Sunday, May 4, 2025

Unexpected Gotchas in Making a Game Deterministic

When aiming for a fully deterministic program, it is common knowledge that you have to deterministically seed your random number generators, be careful with multithreading, and watch out for floating-point computations.

In this post I want to highlight a few less commonly mentioned pitfalls I encountered when making my game deterministic.

Foreshadowing

My game targets iOS/Android/Linux/Windows/WebAssembly, meaning it must remain deterministic across:
  • different compilers and compiler options.
  • different standard library implementations (STLs).
  • different architectures.
Let's get started!

Random numbers

If you want deterministic random numbers, you just have to deterministically seed your generator and you are good to go, right?
Well... the following code contains 2 different errors that break determinism (try and spot them):

#include <random>

int get_random_number() {
    int64_t seed = 9876543210;
    std::mt19937 rng(seed); // a generator based on mersenne twister
    std::uniform_int_distribution<int> dist(0, 10); // to get integers in [0, 10]
    return dist(rng);
}

1. uint_fast32_t

On older Android devices, my game produced different random numbers. I traced the issue to the number generator itself.
The issue? The std::mt19937 implementation uses uint_fast32_t, which becomes uint32_t or uint64_t depending on the architecture:

typedef mersenne_twister_engine<uint_fast32_t, 32, 624, 397, 31,
                                0x9908b0df, 11, 0xffffffff,
                                7,  0x9d2c5680,
                                15, 0xefc60000,
                                18, 1812433253> mt19937;

The fix was replacing the uint_fast32_t with a uint32_t.

In general, if you are targeting 32 bit platforms, beware of uint_fast32_t!
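Rather than patching the STL, one way to sidestep the issue is to instantiate the engine yourself with an explicit uint32_t. A sketch (the alias name is mine):

```cpp
#include <cstdint>
#include <random>

// Same engine parameters as std::mt19937, but with the result type pinned
// to uint32_t so the engine behaves identically on 32-bit and 64-bit
// platforms, regardless of what uint_fast32_t maps to.
using deterministic_mt19937 =
    std::mersenne_twister_engine<std::uint32_t, 32, 624, 397, 31,
                                 0x9908b0df, 11, 0xffffffff,
                                 7,  0x9d2c5680,
                                 15, 0xefc60000,
                                 18, 1812433253>;
```

With the default seed (5489), the first output of this engine is the well-known MT19937 value 3499211612.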

2. Standard Library Distributions

Now you actually have a deterministic generator and everything works well... until one day you update your compiler (and STL) and suddenly determinism is broken again.

You trace it back to the STL's std::uniform_int_distribution. This makes sense! There are multiple ways of generating a uniform distribution from a random number generator.
To be precise, from what I've seen the STLs all use the same algorithm, but the exact implementations can and do vary.

In my case I discovered the problem when porting the game to Linux. I solved the problem by using my own distribution implementation.
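My replacement is not public, but a minimal hand-rolled distribution looks something like this (a sketch; `uniform_u32` is a hypothetical name, and the actual implementation may differ):

```cpp
#include <cstdint>
#include <limits>

// A distribution you own and ship on every platform, so the mapping from
// raw engine output to the final integer never changes between STLs.
// Returns a uniformly distributed value in [lo, hi]. Assumes hi >= lo and
// that the range does not span all of uint32_t. Uses rejection sampling
// to avoid modulo bias.
template <typename Rng>  // Rng::operator() is assumed to yield 32-bit values
uint32_t uniform_u32(Rng& rng, uint32_t lo, uint32_t hi) {
  const uint32_t range = hi - lo + 1;
  const uint32_t max = std::numeric_limits<uint32_t>::max();
  const uint32_t limit = max - max % range;  // reject values past this point
  uint32_t x;
  do {
    x = static_cast<uint32_t>(rng());
  } while (x >= limit);
  return lo + x % range;
}
```

Since you control this code on every platform, two identically seeded generators always produce the same sequence of draws.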

Sorting

std::sort is not deterministic across implementations when you are sorting elements that compare equal: the relative order of equal elements is unspecified.

For example if you sort the strings {"BB", "A", "CC"} according to their length, you'll get either
{"A", "BB", "CC"} or {"A", "CC", "BB"} depending on which implementation of std::sort you are using.

That's why std::stable_sort exists. 
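For the example above, a sort by length that is deterministic across STLs:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// std::stable_sort preserves the relative order of elements that compare
// equal, so equal-length strings keep their original order on every STL.
std::vector<std::string> sort_by_length(std::vector<std::string> v) {
  std::stable_sort(v.begin(), v.end(),
                   [](const std::string& a, const std::string& b) {
                     return a.size() < b.size();
                   });
  return v;
}
// sort_by_length({"BB", "A", "CC"}) yields {"A", "BB", "CC"} everywhere.
```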

By now a pattern becomes clear: you have to be extra careful when using the STL (or any piece of code for which you don't easily control the version).

But the STL is not the only source of gotchas!

Order of evaluation of parameters

The same version of clang evaluates function arguments in a different order on Windows than on the other platforms!
Say you want to generate a random coordinate in a 10x10 square and you write the following code:

  return std::pair<int64_t, int64_t>(game.rng.rand(0, 10), game.rng.rand(0, 10));

You may get {2,8} on Linux, but {8,2} on Windows!

While this family of bugs can happen whenever you call multiple impure functions, in practice I encountered it twice, both times when passing multiple random numbers to a function.
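The fix is to force the order yourself by pulling each call into its own statement, since statements are fully sequenced. A sketch with a stand-in generator (CountingRng is hypothetical, just to make the ordering observable):

```cpp
#include <cstdint>
#include <utility>

// A stand-in impure generator: each call returns the next integer, so the
// evaluation order of the two rand() calls becomes observable.
struct CountingRng {
  int64_t next = 0;
  int64_t rand(int64_t /*lo*/, int64_t /*hi*/) { return next++; }
};

std::pair<int64_t, int64_t> random_coordinate(CountingRng& rng) {
  // Two separate declarations: x is always drawn before y, on every
  // compiler and platform, unlike arguments in a single call expression.
  const int64_t x = rng.rand(0, 10);
  const int64_t y = rng.rand(0, 10);
  return {x, y};
}
```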

Pointers

At one point I was using an std::set of pointers, and I noticed that when iterating over this std::set I was not always visiting the objects in the same order.
In hindsight the reason is obvious: the memory allocator is not deterministic (*), so the std::set will not always contain the same pointer values, and thus the iteration order will not always be the same.
The lesson here is to never compare pointers or do calculations based on their values, which can happen implicitly when storing pointers in the various C++ containers (std::unordered_set, std::map, ...)

(*) Yes, under some circumstances you can create a deterministic allocator.
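One way to apply that lesson is to key containers by a stable, deterministically assigned id instead of by pointer. A sketch (Entity, spawn, and the id counter are illustrative):

```cpp
#include <cstdint>
#include <map>
#include <string>

// Entities get a deterministic id at creation time; containers are keyed by
// that id instead of by pointer, so iteration order no longer depends on
// whatever addresses the allocator happened to return.
struct Entity {
  std::string name;
};

std::map<uint64_t, Entity> entities;  // iterates in id order, every run
uint64_t next_entity_id = 1;          // deterministic id source

uint64_t spawn(const std::string& name) {
  const uint64_t id = next_entity_id++;
  entities[id] = Entity{name};
  return id;
}
```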

Enforcing limits on memory usage

That's the one point where I'm aware that my game is not deterministic.

In my game I've restricted the memory consumption of the Lua interpreter: the interpreter returns an error when a user-provided script uses too much memory.

I would like this error to occur deterministically.

Unfortunately that is not possible because the memory usage of the Lua interpreter is not deterministic:
A given structure will not be the same size on every platform (e.g. because the pointer size is different, because the alignment requirements are different, because the STL is different, because of a compiler flag, etc...).

The lesson here is that if you want to deterministically enforce limits on memory consumption, you need to work at a higher level of abstraction than raw allocation sizes.
Deterministically limiting memory usage is a niche requirement though and I am not aware of any library that does that, let alone any interpreter.


Friday, February 9, 2024

PewPew Live's look in a nutshell

Occasionally someone will ask how I obtained the PPL look.
In a nutshell:

  • Draw everything with lines, including the text and the various icons.
    It's a lot of work, but besides looking unique it creates a consistent appearance, something a lot of indie games struggle with.
    The lines are screen-space projected lines with miter joins.
  • Draw the lines with additive rendering. This means that if a red and green line overlap, the overlap will be yellow.
    There are a few things not drawn with additive rendering (like the background of buttons to improve readability), but they are exceptions.
  • Add bloom.
    There are lots of different bloom implementations. Nowadays I use a bloom that is similar to the one in Blender's Eevee.
    If you see banding, use dithering.
  • Optional: Add even more post-processing like (very slight) chromatic aberration, lens dirt, scan lines, curved monitor, and vignette.
No post-processing, just lines

Bloom! Ignore the missing bloom at the top

All the post-processing turned on


One final thing that is pretty neat in PPL (which is not often seen in mobile games) is that you can configure every single effect, which allows you to see how the individual effects contribute to the final look:

Saturday, February 27, 2021

A general state rollback technique for C++

I wanted to write this post for a while. It describes a C++ technique to implement rollback in the context of multiplayer games that I feel is quite interesting and useful.

The tl;dr is: don't bother serializing individual objects, just rollback all the memory.

Rollback-based multiplayer

I've been working on a multiplayer version of PewPew, and for reasons that are outside of the scope of this post, I chose to implement multiplayer with deterministic lockstep and rollback.

The basic idea behind rollback-based multiplayer is that the inputs of players are replicated to all the players.
Whenever a player receives the inputs of another player, the state of the game is rolled back to the point where the input happened and fast-forwarded back to the present so that the state shown to a player takes into account the inputs of the other players. Because history is being re-computed, some events get undone.
For example, it's possible a player saw themselves taking a bonus, but after receiving the inputs of the other player, the other player ends up taking the bonus first. In practice this is not a big deal: we only roll back as much as the network latency requires, which should be less than 100 milliseconds, meaning incorrect states are only shown for a short time. If this is still a problem, there are mitigation techniques such as delaying the visual or audio feedback of events.
Ars Technica did a great job of talking about this approach and its implications.

My first attempt at implementing the rollback would serialize the state of every object in the game and deserialize them whenever rollback occurred. The code for serializing and deserializing every object was slow but worked well (video of a demo).

Serializing is *the* challenge of rollback-based multiplayer

A little bit later in the development process, I started using a Lua interpreter to run the script that describes what should happen in a level.
The interpreter's state changes based on the input of the players and can trigger events. For instance, the script may count the number of enemies destroyed, and spawn a bonus once that number reaches a certain threshold.

Since the Lua interpreter stores a piece of the Game State, whenever the Game State needs to be rolled back, the state of the Lua interpreter needs to be rolled back as well.

The problem is that a Lua interpreter is not easily rollback-able because there's no simple way to serialize the state of the interpreter (the interpreter's state has pointers to game objects).
I considered dropping the interpreter and implementing the gameplay logic in C++, but even then, serializing the gameplay logic was going to be painful and error-prone.

A solution: Rollback ALL the memory that contains the state of the game

A completely different approach is to treat the game code (including the interpreter) as a black box and only consider the memory that was allocated by the game code.
You can take a snapshot of the Game State by making a copy of all the memory that was allocated for it.
When you want to rollback to a past state, you copy the snapshot back to the location in memory where it was taken.

To achieve that you need to separate the code that updates the state from the other systems, including the rendering and the UI. However, once this separation is in place you don't have to worry about the serialization any more as it happens transparently.

To emphasize why this technique is great, I'll quote Ramón Franco on his experience working on Killer Instinct, a game which uses rollback for multiplayer (source):
Also, any time a new element is added to the game, it has the possibility of affecting your game state and making it so rollbacks could cause a desync. There’s no getting around that. When Keits comes to me and says "hey, I want these projectiles to be random and sometimes four of them come out and they loop around in odd patterns," these are now elements I have to make sure can be rolled back, including adding new elements to the code. I can train programmers to make sure their variables are registered with our rollback system, but it’s unavoidably more work.

Well Ramón, there is getting around that. If you rollback all the memory, you don't have to think about making sure every individual variable gets properly serialized anymore!

Implementation details

Efficient snapshotting

In C++, intercepting the memory allocations can be done by overriding the new operator and its variants. Tracking the allocations of the code to take snapshots is then simple, but taking a snapshot of all the allocations will be slow because the allocations may not be contiguous, and could require as much as one memcpy per allocation.

A much more efficient solution is to use a custom memory allocator that is designed for efficient snapshotting. 
By using a custom memory allocator, you don't have to maintain the tracking of the allocated pieces yourself: you can just read the data structures maintained by your memory allocator. You can also design the memory allocator so that it focuses on keeping a contiguous chunk of allocated memory to reduce the number of memcpys, and have the memcpys done in a cache-friendly way.

Note that because the snapshot contains pointers that point to locations inside the snapshot, when the time comes for the snapshot to be restored, it needs to be copied back to the same memory locations it was taken from. A simple solution is to have the allocator allocate memory from a fixed chunk of memory. This guarantees that we will always be able to copy the snapshot's content back to the memory location it was taken from, ensuring that the pointers it contains will be valid.
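A toy version of the idea, stripped of free lists, alignment, and out-of-memory handling (this Arena is just to show the snapshot/restore mechanics, not the real SnapshotableAllocator):

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// A toy fixed-chunk bump allocator with snapshot/restore. Because the chunk
// never moves, a snapshot can be copied back to the exact addresses it was
// taken from, keeping interior pointers valid.
class Arena {
 public:
  explicit Arena(std::size_t size) : buffer_(size), used_(0) {}

  void* Allocate(std::size_t n) {
    void* p = buffer_.data() + used_;
    used_ += n;  // no alignment or out-of-memory handling in this sketch
    return p;
  }

  std::vector<char> TakeSnapshot() const {
    return std::vector<char>(buffer_.data(), buffer_.data() + used_);
  }

  void Restore(const std::vector<char>& snapshot) {
    // Copy back to the same location the snapshot was taken from.
    std::memcpy(buffer_.data(), snapshot.data(), snapshot.size());
    used_ = snapshot.size();
  }

 private:
  std::vector<char> buffer_;  // fixed chunk; data() never moves
  std::size_t used_;
};
```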

I call an allocator that allows taking snapshots and restoring them a SnapshotableAllocator.
For the record, my implementation of a SnapshotableAllocator in use in PewPew Live is available here.

Coexisting with the rest of the code

Only the memory that needs to be rolled back should be allocated by the SnapshotableAllocator. The rest of the memory (the textures, the 3D models, the audio, the UI, etc...) should be allocated by your usual non-snapshotting allocator.
The new operators must therefore be able to switch allocator at runtime. Moreover, the configuration of which allocator to use must be on a thread-by-thread basis.
In my implementation, a thread_local pointer stores which allocator to use. The new operators use that thread_local pointer to choose which allocator to use.

In order to tell the compiler that a section of the code needs to use a different allocator, I find that using RAII is convenient. When I want to wrap a section of the code with a different allocator, I create a ScopedAllocatorUsage object which changes the thread_local pointer in its constructor. When the ScopedAllocatorUsage goes out of scope, its destructor restores the thread_local pointer to the original value.
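A minimal sketch of that RAII switch (the Allocator interface is reduced to nothing here; the names follow the post, not the actual implementation):

```cpp
// The thread_local pointer consulted by the overridden new operators, and
// the RAII guard that swaps it for the duration of a scope.
struct Allocator {};  // stand-in for the real allocator interface

thread_local Allocator* g_current_allocator = nullptr;

class ScopedAllocatorUsage {
 public:
  explicit ScopedAllocatorUsage(Allocator* allocator)
      : previous_(g_current_allocator) {
    g_current_allocator = allocator;
  }
  ~ScopedAllocatorUsage() {
    // Restore the previous allocator, which also supports nested scopes.
    g_current_allocator = previous_;
  }

 private:
  Allocator* previous_;
};
```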

Example

The following example creates a string, takes a snapshot, changes the content of the string, uses the snapshot to rollback the content of the string, and finally destroys the string.

Notice how the initialization, updates, and destruction of the string are in blocks that start with ScopedAllocatorUsage.

  SnapshotableAllocatorImpl allocator(8);
  std::unique_ptr<std::string> s;

  // Initialize the string with "hello"
  {
    ScopedAllocatorUsage sau(&allocator);
    s = std::make_unique<std::string>();
    *s = "hello";
  }

  // Take a snapshot
  auto snapshot = allocator.TakeSnapshot();

  // Change the value of the string to "world"
  {
    ScopedAllocatorUsage sau(&allocator);
    *s = "world";
  }

  // Verify the value of the string before and after the rollback.
  assert(*s == "world");
  allocator.LoadMemoryFromSnapshot(*snapshot);
  assert(*s == "hello");

  // Free the memory
  {
    ScopedAllocatorUsage sau(&allocator);
    s.reset();
  }


Also notice how the SnapshotableAllocator is completely oblivious to what data structure it is taking snapshots of.

Memory usage

Because the game is deterministic, it can offer replays. Replays store the initial seed of a game and all the player inputs that occurred, making it possible to replay the game.
Those replays are used to verify the validity of scores, but they are also used by players to watch and learn from the best players.
Those replays only allow you to simulate the game forward, but we can re-use the rollback system to provide rewind capabilities.

In PewPew Live, rewind is implemented by taking a snapshot every time an additional 0.25% of the replay has been simulated. This means that by the end of the replay, 400 snapshots are kept in memory.

In my case, the snapshots take around 60KB, and an individual snapshot can be compressed down to 25KB with deflate, assuming the SnapshotableAllocator zeroes the memory before handing it out. If you don't plan on compressing individual snapshots, there's no point in zeroing the memory.
Delta-compressing two successive snapshots would likely yield far better compression ratios, but I haven't tried that.

Persisting the snapshots (disclaimer: I haven't actually tried this.)

The snapshots can be restored only as long as the SnapshotableAllocator is alive, because the application needs to be able to copy the snapshots back to the location where they were taken.
This limitation can be worked around by having the SnapshotableAllocator use memory obtained at a fixed location with mmap.

For example, the following magic incantation allocates 128KiB at the address 0x800000000.

  void* ptr = mmap((void*)0x800000000,
                   4096 * 32,
                   PROT_READ | PROT_WRITE,
                   MAP_ANONYMOUS | MAP_PRIVATE,
                   -1, 0);


Using this trick, the snapshots can be persisted to disk, for instance, to persist the player's progress across launches of the game.
The snapshot could even be exchanged over the network, assuming the receiving side has the same endianness, the same pointer size, is running the same binary, and can mmap the same memory location.
In the general case, having the same representation of floating-point numbers on both sides is vital, but in my case not necessary as I'm not using floating-point numbers to store anything important in the game state.
I wouldn't be surprised if it was possible to exchange snapshots between arm64 and x64!

A side benefit of having an allocator distribute chunks of memory from a fixed location is that it makes the allocator deterministic, which can help make the application deterministic. Determinism is a big topic that I will one day elaborate on in a separate blog post.

Pitfalls

By far the biggest pitfall of this technique is not respecting the separation between the Game State and the rest of the game.
For instance, consider an std::map, instantiated at application launch, that contains the player's achievements. If in the middle of a game the code that updates the Game State tries to insert something into that std::map, the game will crash.
That's because the std::map was allocated with the default allocator, and the game code will try to modify the std::map while the SnapshotableAllocator is active.
Obviously this example was not made up out of thin air, it actually happened to me.

Another manifestation of this bug happens if you declare and use static variables in a function that is called by the code that updates the Game State.
The static variable will be initialized with the SnapshotableAllocator, which means that the static variable is valid only during the lifetime of the SnapshotableAllocator.
This bug is more insidious for two reasons:
1/ Code that does not have any pointers to the "outside" may still create a crash.
2/ The crash won't happen immediately.

Fortunately those errors are detected by ASAN.

Conclusion

Even though I stumbled upon this technique because of a black box that I couldn't easily serialize, it turned out to be extremely convenient. It is also possibly more efficient than regular serialization.
There are some pitfalls to deal with when setting up the game engine, but afterward, it is completely transparent. I don't have to think about the fact that a different allocator is used (except for static variables). Whatever state and logic I add, whether in C++ or Lua, it will be properly rolled back.
And while the technique was developed to implement multiplayer, it is also used to provide rewind capabilities for replay, and I am even considering using it to add time travel elements to the gameplay (video demonstrating time travel).

Monday, October 19, 2020

Abusing constexpr to implement gettext's _() macro with zero overhead.

You usually use the C library gettext with a _() macro that returns a localized version of the text passed in parameter.

For example:

puts(_("Hello"));

will result in a runtime search of the string "Hello", and _() eventually returns the localized version of the string. I assume the search is implemented using some sort of binary search, so it's probably extremely fast.
It's fast, but we can get rid of the search altogether using constexpr functions!

The trick is to use a constexpr function that at compile time gets the ID of the given string. Then it's a matter of doing an array lookup in the table of strings for the desired locale. The array lookup is still unavoidable because we don't know at compile time which locale the app will use.

In my case, _() is implemented using I18NStringIndex which looks like this:

constexpr bool I18N_EQ(char const * a, char const * b) {
 return std::string_view(a)==b;
}
constexpr int I18NStringIndex(const char* s) {
 if (I18N_EQ(s, "off")) { return 0; }
 if (I18N_EQ(s, "on")) { return 1; }
 if (I18N_EQ(s, "ok")) { return 2; }
 if (I18N_EQ(s, "Quick pause")) { return 3; }
 if (I18N_EQ(s, "Quick pause %s")) { return 4; }
 if (I18N_EQ(s, "Continue")) { return 5; }
 if (I18N_EQ(s, "Exit")) { return 6; }
 if (I18N_EQ(s, "Pause")) { return 7; }
 if (I18N_EQ(s, "Restart")) { return 8; }
 if (I18N_EQ(s, "Game Over")) { return 9; }
 if (I18N_EQ(s, "Play again")) { return 10; }
 if (I18N_EQ(s, "Cancel")) { return 11; }
 if (I18N_EQ(s, "Game is full")) { return 12; }
 if (I18N_EQ(s, "Local lobby")) { return 13; }
 if (I18N_EQ(s, "Ready")) { return 14; }
 if (I18N_EQ(s, "Waiting for other players...")) { return 15; }
 etc...

It is brutal, but the compiler still manages to transform I18NStringIndex("Game Over") to 9, and without noticeably slowing down the compilation!

It is used together with static arrays of strings, such as this one:

std::array<const char*, 138> kStrings_FR = {
"off",
"on",
"ok",
"Mini-pause",
"Mini-pause %s",
"Continuer",
"Quitter",
"Pause",
"Recommencer",
"Game Over",
"Rejouer",
"Annuler",
"Partie déjà complète",
"Partie locale",
"Je suis prêt !",
"En attente des autres joueurs...",
etc...

I18NStringIndex is generated by going over the code with a regex looking for patterns like _(".*"), and kStrings_LANG is generated using the .po files the translators filled.

One danger of this technique is that if, for whatever reason, the compiler can't run I18NStringIndex at compile time, you pay a heavy price at runtime. Fortunately, in C++20 you can mark the function consteval to make sure that does not happen.

Another downside of this technique is that whenever I18NStringIndex is re-generated, all the files containing _() need to be recompiled.
It's not a problem for me because I re-generate that function only when I'm about to send the .po files to the translators, which does not happen often.

Thursday, September 10, 2020

Exponential = dangerous

Note: This post was initially written in 2013, right after the release of Pacifism. For some reason I never published it. It's still relevant today, so here it is seven years later.

In PP2's pacifism mode, you can blow up enemies with bombs. I wanted to encourage people to take risks and not just blow up all the bombs they see, so I decided to give players an incentive to explode large amounts of enemies at the same time. To do that, I crafted a formula that gave players bonuses that exponentially grew along the number of simultaneous kills:

x=number of enemies killed.  f(x)=the bonus for x enemies killed.

I made sure that the exponential grew slowly: even if a player managed to be twice as good as me and destroy twice as many enemies as me (200), they would only make around 55000 points, which is high but not absurdly so.
Once I was pleased with the feeling of the game, I released it and waited for the scores to come in.
As the high scores started coming in, I got to see the replays of people playing the game mode for the very first time and figuring out how to play. That was really cool.
Quickly though, you could see in the replays the players getting better and better. 

And eventually, some players realized that by using the fastest ship, you could simply circle around the level for a minute, dodging the bombs, with the enemies never catching up. When they did blow a bomb, several hundred enemies would explode, resulting in bonuses of hundreds of millions of points.


Eventually what had to happen happened, and people started scoring more than 2 billion points, which is a bad thing when you store your scores in 32 bit signed integers.

The lessons I learned:
  • Use 64 bit integers for scores.
  • Put a cap on exponential scores, or at least be very careful. You never know what the players are going to do.
  • Be ready to reset the scores and to prevent people with old versions of the game from sending "bad" scores.
  • Replays are so useful, it's ridiculous. Without them, I would have had no idea what the players were doing, and I probably would have assumed they were cheating and sending fake scores.

Saturday, June 20, 2020

PewPew Live released!

PewPew Live was released today on Android!

Let's go over what's new compared to PewPew 2:

  • LAN Multiplayer

Multiplayer is the main difference with PewPew 2, and the reason PewPew Live exists in the first place. The initial plan was to support online multiplayer, but this proved to be very complex. For now, the game will only support LAN.

  • Support for custom levels

Custom levels are another big new thing. Allowing users to create levels will increase the replayability of the game.
I had to choose where on the spectrum the creation tools should sit: should players be given building blocks they can remix (low barrier to entry, low variety of levels), or should they have to directly write code (high barrier to entry, high variety of levels) like I do when creating levels? I chose the latter because that's what I would have liked as a player, it introduces people to programming, and tools that lower the barrier to entry can always be built on top.

  • New game modes
  • Unlock-able ships, trails, bullets
  • Improved graphics

There are also numerous under-the-hood improvements.
The original PewPew was started around 2009, at the beginning of the smartphone era. Many assumptions made back then do not make any sense in today's landscape. On top of this, the original game's feature set grew organically without much planning. I took PewPew Live as an opportunity to fix everything, including:

  • Support for high-refresh-rate screens

PewPew Live supports refresh rates higher than 60 fps because the rendering system can interpolate between game states.

  • Abstraction of the rendering API

PewPew 2 was using OpenGL directly. PewPew Live has an abstraction over the rendering API that improves portability. For example, using Vulkan directly should be possible.

  • Shader-based rendering

Now that shaders are available on almost all mobile devices, PewPew Live can depend on them for animations.

  • Resizable window support

The UI now dynamically adapts to the size of the window.

  • Deterministic replays

I had retrofitted determinism in PewPew 2, but there were some bugs that I never solved. Now the game is designed from the start to be deterministic. This should make it harder to cheat.

  • An overall better game architecture

It gives me more flexibility to do weird things. E.g. you can run multiple instances of the game simultaneously.

  • Editor

In PewPew 1 I was editing SVGs in Adobe Illustrator and exporting them to PewPew. This time around I invested time into making a tailored editor for graphics and levels. This should lead to higher quality models and levels.

  • Better collision detection for walls

In PewPew, a single wall was a quadrilateral with a width. This made editing levels painful, and if the wall was too thin compared to the velocity of an entity, collisions would be missed. This time around, the walls are actual lines and collisions can't be missed.

  • Sound synthesizing

In PewPew sound was made of .wav files. In PewPew Live, the sounds are synthesized at runtime. This allows more flexibility in the sounds and takes less space. It does limit the kind of sound effects the game can have, but the upside is that it makes the sound effects more consistent.

Tuesday, February 18, 2020

Ridiculously cheap depth of field effect for lines

I'm working on PewPew's sequel, for which I've revamped the graphics.
Instead of drawing lines directly using OpenGL, each individual line segment is made up of two triangles whose vertices are computed with shaders. Getting lines in 3D space to be properly displayed on a 2D screen is not trivial. In PewPew's sequel I use screen-space projected lines, a technique very well described in the Drawing Lines is Hard post.

The upside of drawing the lines yourself is that you are fully in control, which allows you to implement nice things such as joints, perspective, and even simulate depth of field.

https://en.wikipedia.org/wiki/Depth_of_field

Usually depth of field (DoF) in video games is implemented using a post-processing step that blurs the pixels with an intensity that is a function of the depth of the pixels. When we are rendering lines, we can approximate DoF directly when rendering the lines by having the vertex shader increase the width of lines and reduce their opacity proportionally to the distance they are from the plane of focus.
Left: without DoF
Right: with DoF
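The vertex-shader math boils down to something like this, sketched here in C++ (focus_distance and blur_scale are hypothetical tuning parameters, not the game's actual shader code):

```cpp
#include <cmath>

// Per-vertex DoF approximation: the farther the vertex is from the focus
// plane, the wider the line and the lower its opacity, so the total light
// emitted by the line stays roughly constant.
struct DofParams {
  float width_scale;
  float opacity;
};

DofParams ComputeDof(float vertex_depth, float focus_distance,
                     float blur_scale) {
  const float blur = std::fabs(vertex_depth - focus_distance) * blur_scale;
  const float width_scale = 1.0f + blur;     // widen out-of-focus lines
  const float opacity = 1.0f / width_scale;  // and fade them proportionally
  return {width_scale, opacity};
}
```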

On top of that, we can make the lines appear blurry by adding a gradient along the lines in the fragment shader.

Left: DoF with sharp lines
Right: DoF with blurred lines

As long as there are not too many overlapping lines, the whole effect is very cheap in terms of processing power and runs very well on mobile devices.
In addition to that, it also properly handles transparency (unlike standard post-processing based DoF), which in the case of PewPew is very important since almost everything is rendered with additive blending.

Possible improvement

The technique does not work well when one end of a line is in the far plane while the other end is in the near plane. In that case, because the shader interpolates between the two ends of the line, the center will be wide and have a low opacity when it should actually be narrow and crisp.
I reduced the impact of this problem by splitting the geometry.
