Optimizing Conway's 'Game of Life'

Question

To experiment, I've (long ago) implemented Conway's Game of Life (and I'm aware of this related question!).

My implementation worked by keeping 2 arrays of booleans, representing the 'last state', and the 'state being updated' (the 2 arrays being swapped at each iteration). While this is reasonably fast, I've often wondered about how to optimize this.
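For reference, the double-buffered approach described above might look roughly like the following minimal sketch. The function names `step` and `run`, and the choice to treat cells outside the grid as dead, are assumptions made for illustration and are not taken from the original implementation.

```python
def step(cur, nxt):
    """Compute one generation from `cur` into `nxt` (lists of lists of bool)."""
    rows, cols = len(cur), len(cur[0])
    for r in range(rows):
        for c in range(cols):
            n = 0
            # Count the live cells among the 8 neighbours (off-grid counts as dead).
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and cur[rr][cc]:
                        n += 1
            nxt[r][c] = n == 3 or (cur[r][c] and n == 2)


def run(grid, generations):
    """Run several generations, swapping the two buffers after each one."""
    cur = [row[:] for row in grid]
    nxt = [[False] * len(row) for row in grid]
    for _ in range(generations):
        step(cur, nxt)
        cur, nxt = nxt, cur   # the freshly written buffer becomes the 'last state'
    return cur
```

Every cell is visited in every generation, which is the cost the ideas below try to avoid.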

One idea, for example, would be to precompute at iteration N the zones that could be modified at iteration (N+1) (so that if a cell does not belong to such a zone, it won't even be considered for modification at iteration (N+1)). I'm aware that this is very vague, and I never took time to go into the details...

Do you have any ideas (or experience!) of how to go about optimizing (for speed) Game of Life iterations?

See: Hashlife, Golly, and Alan Hensel's Java algorithm. – Johan Mar 23 '16 at 01:46

score 39 · Answer 1 · edited Jun 20 '20 at 09:12

I am going to quote my answer from the other question, because the chapters I mention have some very interesting and fine-tuned solutions. Some of the implementation details are in C and/or assembly, yes, but for the most part the algorithms can work in any language:

Chapters 17 and 18 of Michael Abrash's Graphics Programmer's Black Book are among the most interesting reads I have ever had. They are a lesson in thinking outside the box. The whole book is great really, but the final optimized solutions to the Game of Life are incredible bits of programming.

edited Jun 20 '20 at 09:12

Community


answered Sep 02 '08 at 20:26

Chris Marasti-Georg


@Chris: Links to byte.com are now dead :( I fixed the links to point to gamedev.net. – Juha Syrjälä Jul 04 '11 at 16:34
Fantastic suggestion, just the inspiration I was looking for (thanked you here too https://mastodon.me.uk/@neil_vass/109710541610977570) – Neil Vass Jan 25 '23 at 13:45

score 19 · Answer 2 · answered Sep 02 '08 at 20:26

There are some super-fast implementations that (from memory) represent blocks of 8 or more adjacent cells as bit patterns and use each pattern as an index into a large array of precalculated values, so that a single table lookup determines whether a cell is live or dead.

Check out here:

http://dotat.at/prog/life/life.html

Also XLife:

http://linux.maruhn.com/sec/xlife.html
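The table-lookup idea can be illustrated at a small scale. The sketch below is not the code used by the linked programs; it is a simplified version that assumes each row is stored as an integer bit mask and uses a 512-entry table indexed by a 3x3 neighbourhood. The bit layout and the names `TABLE` and `step` are choices made for this example only.

```python
# Precompute the next state of the centre cell for all 512 possible 3x3 patterns.
# Bit layout (an assumption of this sketch): bits 8..6 = row above,
# bits 5..3 = current row, bits 2..0 = row below; bit 4 is the centre cell.
TABLE = []
for pattern in range(512):
    centre = (pattern >> 4) & 1
    neighbours = bin(pattern).count("1") - centre
    TABLE.append(1 if neighbours == 3 or (centre and neighbours == 2) else 0)


def step(rows, width):
    """rows[r] is an int whose bit i is cell (r, i); cells off the grid are dead."""
    height = len(rows)
    out = []
    for r in range(height):
        # Shift left by one so that columns c-1..c+1 sit in bits c..c+2.
        above = rows[r - 1] << 1 if r > 0 else 0
        here = rows[r] << 1
        below = rows[r + 1] << 1 if r + 1 < height else 0
        new_row = 0
        for c in range(width):
            idx = (((above >> c) & 7) << 6) | (((here >> c) & 7) << 3) | ((below >> c) & 7)
            new_row |= TABLE[idx] << c   # one lookup decides the cell's fate
        out.append(new_row)
    return out
```

The faster implementations mentioned above go further by resolving several cells per lookup with a larger table; this version resolves one cell per lookup to keep the indexing easy to follow.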

score 15 · Answer 3 · answered Oct 01 '08 at 17:45

You should look into Hashlife, the ultimate optimization. It uses the quadtree approach that skinp mentioned.

answered Oct 01 '08 at 17:45

A. Rex


score 5 · Answer 4 · answered Sep 02 '08 at 20:34

As mentioned in Abrash's Black Book, one of the simplest and most straightforward ways to get a huge speedup is to keep a change list.

Instead of iterating through the entire cell grid each time, keep a copy of all the cells that you change.

This will narrow down the work you have to do on each iteration.
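A change list can be sketched as follows. This is an illustration under assumptions, not Abrash's implementation: the world is an unbounded set of `(row, col)` tuples, and the names `neighbours` and `step` are made up for the example. It relies on the observation that a cell can only flip if it, or one of its neighbours, flipped in the previous generation.

```python
def neighbours(cell):
    """The 8 cells surrounding `cell`."""
    r, c = cell
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if dr or dc]


def step(live, changed):
    """live: set of live cells; changed: cells that flipped last generation.
    Returns (new_live, new_changed)."""
    # Only a changed cell or a neighbour of one can flip this generation.
    candidates = set(changed)
    for cell in changed:
        candidates.update(neighbours(cell))
    flips = set()
    for cell in candidates:
        n = sum(1 for nb in neighbours(cell) if nb in live)
        alive = cell in live
        will_live = n == 3 or (alive and n == 2)
        if will_live != alive:
            flips.add(cell)
    # Toggle exactly the cells that flipped; they form the next change list.
    return live ^ flips, flips
```

For the first generation, pass the initial live cells as `changed`. Once a pattern settles into still lifes, the change list becomes empty and each generation costs almost nothing.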

score 5 · Answer 5 · answered Jun 28 '11 at 07:46

The algorithm itself is inherently parallelizable. Using the same double-buffered method in an unoptimized CUDA kernel, I'm getting around 25ms per generation in a 4096x4096 wrapped world.

answered Jun 28 '11 at 07:46

Owen Knight


score 3 · Answer 6 · answered Jun 02 '16 at 14:13

Which algorithm is the most efficient mainly depends on the initial state.

If the majority of cells are dead, you can save a lot of CPU time by skipping empty parts instead of calculating everything cell by cell.

In my opinion it can make sense to check for completely dead spaces first, when your initial state is something like "random, but with chance for life lower than 5%."

I would just divide the matrix in half along each axis and start checking the bigger regions first.

So if you have a field of 10,000 * 10,000, you'd first accumulate the states of the upper left quarter of 5,000 * 5,000.

If the sum of states is zero in the first quarter, you can ignore this first quarter completely and check the upper right 5,000 * 5,000 for life next.

If its sum of states is >0, you divide this second quarter into 4 pieces again and repeat the check for life for each of these subspaces.

You could go down to subframes of 8*8 or 10*10 (not sure what makes the most sense here).

Whenever you find life, you mark these subspaces as "has life".

Only spaces which "have life" need to be divided into smaller subspaces - the empty ones can be skipped.

When you are finished assigning the "has life" attribute to all possible subspaces, you end up with a list of subspaces, which you now simply extend by one cell in each direction (with empty cells) and apply the regular (or modified) Game of Life rules to.

You might think that dividing a 10,000*10,000 space into subspaces of 8*8 is a lot of tasks - but accumulating their state values is in fact much, much less computing work than applying the GoL algorithm to each cell plus its 8 neighbours, plus comparing the count and storing the new state for the next iteration somewhere...

But like I said above, for a random initial state with 30% population this won't make much sense, as there will not be many completely dead 8*8 subspaces to find (let alone dead 256*256 subspaces).

And of course, last but not least, the best optimisation will depend on your language.
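The subdivision step described in this answer could be sketched like this. The sketch assumes a square grid of 0/1 values whose side is a power of two, and the names `region_sum` and `live_tiles` are hypothetical; it covers only the search for tiles that "have life", not the rule application that follows.

```python
def region_sum(grid, r0, c0, size):
    """Sum of cell states in the size x size square whose top-left corner is (r0, c0)."""
    return sum(sum(row[c0:c0 + size]) for row in grid[r0:r0 + size])


def live_tiles(grid, r0, c0, size, min_size=8):
    """Return (row, col, size) for every smallest-level tile that contains life."""
    if region_sum(grid, r0, c0, size) == 0:
        return []                     # completely dead: skip the whole region
    if size <= min_size:
        return [(r0, c0, size)]       # small enough: mark as "has life"
    half = size // 2
    tiles = []
    # Recurse into the four quarters; dead quarters return immediately.
    for dr in (0, half):
        for dc in (0, half):
            tiles += live_tiles(grid, r0 + dr, c0 + dc, half, min_size)
    return tiles
```

Each returned tile would then be padded by one cell on every side before the usual rules are applied to it, as the answer describes.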


score 1 · Answer 7 · answered Sep 02 '08 at 20:39

Two ideas:

(1) Many configurations are mostly empty space. Keep a linked list (not necessarily in order, that would take more time) of the live cells, and during an update, only update around the live cells (this is similar to your vague suggestion, OysterD :)

(2) Keep an extra array which stores the # of live cells in each row of 3 positions (left-center-right). Now when you compute the new dead/live value of a cell, you need only 4 read operations (top/bottom rows and the center-side positions), and 4 write operations (update the 3 affected row summary values, and the dead/live value of the new cell). This is a slight improvement from 8 reads and 1 write, assuming writes are no slower than reads. I'm guessing you might be able to be more clever with such configurations and arrive at an even better improvement along these lines.
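Idea (2) might be sketched as follows. One simplification is assumed here: the summary array (called `triple` in this example) is rebuilt from scratch each generation rather than updated incrementally with the 4 writes described above. Cells off the grid are treated as dead, and all names are illustrative.

```python
def step(grid):
    """grid: list of lists of 0/1. Returns the next generation as a new grid."""
    rows, cols = len(grid), len(grid[0])

    def cell(r, c):
        return grid[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    # triple[r][c] = number of live cells among (r, c-1), (r, c), (r, c+1)
    triple = [[cell(r, c - 1) + cell(r, c) + cell(r, c + 1) for c in range(cols)]
              for r in range(rows)]
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            above = triple[r - 1][c] if r > 0 else 0
            below = triple[r + 1][c] if r + 1 < rows else 0
            # Four reads: the row summaries above and below, plus left and right.
            n = above + below + cell(r, c - 1) + cell(r, c + 1)
            new[r][c] = 1 if n == 3 or (grid[r][c] and n == 2) else 0
    return new
```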

score 1 · Answer 8 · answered Nov 26 '21 at 13:56

If you don't want anything too complex, then you can use a grid to slice it up, and if that part of the grid is empty, don't try to simulate it (please view Tyler's answer). However, you could do a few optimizations:

Set different grid sizes depending on the number of live cells: if there are not many live cells, they are likely concentrated in a small area.
When you randomize the board, don't use the grid code until the user changes the data: I've personally tested randomizing it, and even after a long time live cells still fill most of the board (except with a sufficiently small grid size, at which point it won't help that much anymore).
If you are drawing to the screen, don't use rectangles for pixel sizes 1 and 2: set the pixels of the output directly instead. For any larger pixel size I find it's okay to use the native rectangle-filling code. Also, preset the background so you don't have to fill the rectangles for the dead cells (not live, because live cells disappear pretty quickly).

score 0 · Answer 9 · answered Sep 02 '08 at 20:21

I don't know exactly how this can be done, but I remember some of my friends had to represent this game's grid with a quadtree for an assignment. I'm guessing it's really good for optimizing the space used by the grid, since you basically only represent the occupied cells. I don't know about execution speed though.

score 0 · Answer 10 · answered Sep 02 '08 at 20:21

It's a two dimensional automaton, so you can probably look up optimization techniques. Your notion seems to be about compressing the number of cells you need to check at each step. Since you only ever need to check cells that are occupied or adjacent to an occupied cell, perhaps you could keep a buffer of all such cells, updating it at each step as you process each cell.

If your field is initially empty, this will be much faster. You probably can find some balance point at which maintaining the buffer is more costly than processing all the cells.
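A minimal sketch of this idea, assuming an unbounded world stored as a set of `(row, col)` tuples (the name `step` is illustrative): only cells adjacent to at least one live cell are ever touched, so empty space costs nothing.

```python
from collections import Counter


def step(live):
    """live: set of (row, col) live cells. Returns the next generation's set."""
    # For every live cell, add 1 to the tally of each of its 8 neighbours.
    counts = Counter((r + dr, c + dc)
                     for r, c in live
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if dr or dc)
    # A live cell with no live neighbours never appears in `counts` and so dies.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}
```

As the answer notes, on a densely populated field the bookkeeping can cost more than simply scanning every cell.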

score 0 · Answer 11 · answered Sep 02 '08 at 20:25

There are table-driven solutions for this that resolve multiple cells in each table lookup. A google query should give you some examples.

answered Sep 02 '08 at 20:25

Lasse V. Karlsen


It'd be interesting to use template meta-programming for the pre-computation, instead of coding it explicitly. – Peter - Reinstate Monica Apr 15 '18 at 10:48

score 0 · Answer 12 · edited Dec 31 '18 at 21:30

I implemented this in C#:

All cells have a location, a neighbor count, a state, and access to the rule.

Put all the live cells in array B in array A.
Have all the cells in array A add 1 to the neighbor count of their neighbors.
Have all the cells in array A put themselves and their neighbors in array B.
All the cells in array B update according to the rule and their state.
All the cells in array B set their neighbor counts to 0.
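The five steps above can be sketched in Python (the original was written in C# and is not shown, so this is a reconstruction under assumptions). It assumes a bounded grid, reads the last step as resetting each cell's neighbor count, and uses a set to avoid duplicate entries in array B; the names `tick`, `state`, `count` and `active` are hypothetical.

```python
def tick(state, count, active):
    """state, count: 2D lists (bool and int); active: array B from the last tick.
    Updates state in place and returns the new array B."""
    rows, cols = len(state), len(state[0])

    def around(r, c):
        return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols]

    # 1. Put all the live cells of array B into array A.
    live = [(r, c) for r, c in active if state[r][c]]
    # 2. Every cell in A adds 1 to the neighbor count of its neighbors.
    for r, c in live:
        for nr, nc in around(r, c):
            count[nr][nc] += 1
    # 3. Every cell in A puts itself and its neighbors into array B.
    seen = set()
    for r, c in live:
        seen.add((r, c))
        seen.update(around(r, c))
    active = sorted(seen)
    # 4. Every cell in B updates according to the rule and its state.
    for r, c in active:
        n = count[r][c]
        state[r][c] = n == 3 or (state[r][c] and n == 2)
    # 5. Every cell in B resets its neighbor count to 0.
    for r, c in active:
        count[r][c] = 0
    return active
```

For the first tick, `active` can simply list every cell of the grid; afterwards, pass back the list returned by the previous call.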

Pros:

Ignores cells that don't need to be updated

Cons:

4 arrays: a 2d array for the grid, an array for the live cells, and an array for the active cells.
Can't process rule B0.
Processes cells one by one.
Cells aren't just booleans

Possible improvements:

Cells also have an "Updated" value, they are updated only if they haven't updated in the current tick, removing the need of array B as mentioned above
Instead of array B being the ones with live neighbors, array B could be the cells without, and those check for rule B0.
