
I'm busy coding Conway's Game of Life, and I'm trying to optimise it using a data structure that records which cells should be checked on each life cycle.

I'm using an ArrayList as a dynamic data structure to keep a record of all living cells and their neighbours. Is there a better data structure, or a way of keeping a shorter list, that will increase the game's speed?

I ask this because many cells are often checked but not changed, so I feel my implementation could be improved.
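In outline, the approach looks something like this (a simplified sketch, not my exact code; the Cell class and names are just for illustration):

```
import java.util.ArrayList;
import java.util.List;

class LifeSketch {
    // a cell is just a pair of grid coordinates
    static class Cell {
        final int x, y;
        Cell(int x, int y) { this.x = x; this.y = y; }
    }

    // living cells and their neighbours: the only candidates whose
    // state can change in the next generation
    List<Cell> toCheck = new ArrayList<>();

    void step(boolean[][] grid) {
        List<Cell> next = new ArrayList<>();
        for (Cell c : toCheck) {
            // count the live neighbours of (c.x, c.y), apply the rules,
            // and add any cell that changes (plus its neighbours) to next
        }
        toCheck = next;
    }
}
```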

Gottfried
  • For people not familiar, adding a link to "Conway's Game of Life" might be helpful, or at least interesting. – Jordan Sep 13 '13 at 19:29
  • I saw an implementation with a LinkedList a few years ago that was very fast, but unfortunately I don't have the details on it. It was in an old Sams programming book. – Kon Sep 13 '13 at 19:30
  • I think nodes with 4 pointers to other nodes could be extremely fast – progrenhard Sep 13 '13 at 19:34
  • Whatever else you do, [_give this a try_](http://stackoverflow.com/a/378024/23771). Move beyond what most people do, which is to try things blindly and hope for the best. – Mike Dunlavey Sep 13 '13 at 19:47
  • You can use a BitSet for marking living cells, which would allow you to skip over 64 inactive cells at once (a sketch of this follows the comments below). You can add forward and backward links to your cells and avoid the overhead of the auxiliary List. Or you can use the idea of Hashlife and forget the low-level optimizations. – maaartinus Sep 13 '13 at 19:56
  • @Jordan - How could anyone not be familiar with it? Don't you all remember the original article in Scientific American? (I may still have it somewhere, in a stack of magazines out in the garage.) – Hot Licks Sep 13 '13 at 23:01
  • @HotLicks I am having trouble determining if you are joking or not O.o – Jordan Sep 13 '13 at 23:05
  • @Jordan - I'm always 100% serious. – Hot Licks Sep 14 '13 at 01:51
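As a minimal sketch of the BitSet idea from the comments above (the grid size and the y * width + x indexing are illustrative, not from the question), nextSetBit lets a scan skip whole 64-bit words of dead cells at a time:

```
import java.util.BitSet;

class BitSetScan {
    public static void main(String[] args) {
        int width = 512, height = 512;      // hypothetical grid size
        BitSet alive = new BitSet(width * height);
        alive.set(10 * width + 10);         // seed one live cell

        // visit only live cells; dead regions are skipped a word at a time
        for (int i = alive.nextSetBit(0); i >= 0; i = alive.nextSetBit(i + 1)) {
            int x = i % width, y = i / width;
            System.out.println("live cell at (" + x + ", " + y + ")");
            // a real engine would examine this cell and its 8 neighbours here
        }
    }
}
```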

2 Answers


I believe that the Hashlife algorithm could help you.

The idea is to use a quadtree (a tree data structure in which each internal node has exactly four children) to store the data, and then to use hash tables to store the nodes of the quadtree, so that identical subtrees are stored only once.

For further reading, this post by Eric Burnett gives great insight into how Hashlife works, its performance, and its implementation (albeit in Python). It's worth a read.
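To make the quadtree-plus-hash-table idea concrete, here is a minimal hash-consed quadtree node in Java. This is a sketch of the storage idea only (all names are mine); a real Hashlife implementation additionally memoises the evolved centre of every node, which is where the big speedups come from:

```
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** A 2^level x 2^level block of cells; identical blocks share one instance. */
final class Node {
    final int level;            // 0 = single cell
    final Node nw, ne, sw, se;  // quadrants; null at level 0
    final boolean alive;        // meaningful at level 0 only

    private static final Map<Node, Node> CACHE = new HashMap<>();

    private Node(int level, Node nw, Node ne, Node sw, Node se, boolean alive) {
        this.level = level;
        this.nw = nw; this.ne = ne; this.sw = sw; this.se = se;
        this.alive = alive;
    }

    static Node cell(boolean alive) {
        return intern(new Node(0, null, null, null, null, alive));
    }

    static Node of(Node nw, Node ne, Node sw, Node se) {
        return intern(new Node(nw.level + 1, nw, ne, sw, se, false));
    }

    // hash consing: every distinct subtree exists exactly once in memory
    private static Node intern(Node n) {
        return CACHE.computeIfAbsent(n, k -> k);
    }

    // children are already canonical, so identity comparison suffices
    @Override public boolean equals(Object o) {
        if (!(o instanceof Node)) return false;
        Node n = (Node) o;
        return level == n.level && alive == n.alive
                && nw == n.nw && ne == n.ne && sw == n.sw && se == n.se;
    }

    @Override public int hashCode() {
        return Objects.hash(level, alive,
                System.identityHashCode(nw), System.identityHashCode(ne),
                System.identityHashCode(sw), System.identityHashCode(se));
    }
}
```

With this representation an empty or highly repetitive universe of astronomical size costs almost no memory, because every repeated block is the same object.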

Natan Streppel
  • Why would you need hashtables to store the leaves? Quadtrees can be implemented as "trees" just beautifully, no hashing needed. – Ira Baxter Sep 13 '13 at 19:44
  • @IraBaxter It's easier to iterate through a hash table than a tree. This is sooo funny, I literally thought of the same thing! I didn't realize there was an actual data structure and algorithm for it. – progrenhard Sep 13 '13 at 19:46
  • @progenhard: No. The point is that you can cache the result for whole subtrees, which is a huge gain. – maaartinus Sep 13 '13 at 19:51
  • I think a quadkey and a monster curve are faster, and they also reduce the dimension with good spatial quality. – Micromega Sep 13 '13 at 19:53
  • @HappyYellowFace: well, if the leaves of the quadtree are often identical in content, then yes, you could share them with hashing. Given that a leaf is nominally an NxM rectangular region of bits, with 2**(N*M) possible configurations, I'd guess your odds of "identical subregions" were pretty low, and thus the hashing buys you nothing. (The quadtree takes care of the regions that are all zero bits by leaving them out of the quadtree.) – Ira Baxter Sep 13 '13 at 19:58
  • @progenhard: If you want easy iteration through the leaves, just link the leaves together in a list. You don't need a hashtable for that particular reason. Frankly, recursing up and down a quadtree just isn't that hard, I doubt I'd bother unless it made a real difference in performance. – Ira Baxter Sep 13 '13 at 20:00
  • @Ira Baxter: In some cases you're very wrong with your guess: "It takes ~2 seconds to generate the quintillionth generation with 3x10^17 live cells". Sure, for random patterns the gains are much, much lower, but still many parts tend to repeat themselves (in time or space or both). – maaartinus Sep 13 '13 at 20:17
  • @maaartinus: Where did you get those numbers? Not from me, and certainly not in this comment thread. [Maybe you think you got them from my answer, but I don't think it implies that, either.] I'm pretty unconvinced that *leaves* of the quadtree ("rectangularly nonblank regions") are likely to contain duplicate patterns. If you choose tiny blocks (e.g., 16 bits) you may end up with some repeating patterns. However, I doubt the quadtree in practice sees partitions that small in this application. – Ira Baxter Sep 13 '13 at 21:21
  • @Ira Baxter: No, those numbers are from [the web](https://www.google.com/search?q=seconds+to+generate+the+quintillionth+generation) and (sort of) disprove what you're saying. There's a wonderful explanation of how it works in [Dr. Dobb's](http://www.drdobbs.com/jvm/an-algorithm-for-compressing-space-and-t/184406478). – maaartinus Sep 13 '13 at 21:44
  • @Ira Baxter: The very leaves of the quadtree are 1x1, and all but two are duplicates (as there are only two unique instances). Hashlife looks for already-computed parts top-down, so if you're lucky you can save a lot of work. Moreover, the bigger a block, the further into the future it lets you jump, which may lead to a huge speedup. – maaartinus Sep 13 '13 at 21:50
  • @maaartinus: interesting article. I certainly didn't expect he'd run the quadtree specifically down to 2x2 cells; I assumed one would stop when a partition element was significantly nonempty (e.g., if you have 1000 by 1000 cells that are all live, stop with a 1000x1000 block). Agreed, at 2x2 you can get a lot of sharing. (There are only 16 combinations; a pointer to a hash bucket produces a less dense result.) After a brief reading, I don't quite understand how he gets bigger blocks shared *above*; I'll have to read more carefully when I have more time. It's clear the author has a clever scheme. – Ira Baxter Sep 13 '13 at 21:58
  • @maaartinus: He said "quintillion" in 2 secs? That's 10^18 generations in 2x10^9 clock cycles (assuming a single-core 1 GHz CPU), or roughly 10^9 generations per machine clock. I don't know how to interpret this, but he cannot be representing each generation explicitly; it's hard to represent even *1* generation in one clock cycle. So it isn't clear what he is counting or claiming. And his remark that for some cases it takes 25 seconds for 5000 generations implies he does have trouble with irregular data. – Ira Baxter Sep 13 '13 at 22:06
  • @Ira Baxter: Sure, the algorithm skips generations where possible; see the end of the Dr. Dobb's article. – maaartinus Sep 13 '13 at 22:58

I built a Life engine that operated on 256x512-bit grids directly mapped to screen pixels back in the 1970s, using a 2 MHz 6800 8-bit computer. I worked directly on the display pixels (they were one-bit on/off white/black) because I wanted to see the results and didn't see the point in copying the Life image to the display.

Its fundamental trick was to treat the problem as one of evaluating a Boolean logic formula for "this cell is on", based on the rules of Life, rather than counting live neighbors as is usual. This formula is pretty easy to figure out, so it's left as a homework exercise. What made it fast was that the Boolean formula was computed on a per-bit basis, for 8 bits at a time. If you sweep down the screen and across rows, you can in essence evaluate N bits at once (8 on the 6800, 64 on modern PCs) with very low overhead. If you go nuts, you can likely use the SIMD vector extensions and do 256 bits or more at "once". Over the top would be doing this with a GPU.
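As an illustration of the word-parallel idea (my own reconstruction in Java, not the original 6800 code, and it spoils part of the homework), here is one 64-cell row being advanced at once. Word-wide full adders produce a bit-sliced neighbor count s2 s1 s0 for all 64 cells simultaneously:

```
/** Advance one 64-cell row, given the rows above and below it.
    Each row packs one cell per bit of a long; shifted-in zeros mean
    columns 0 and 63 border permanently dead cells. */
static long nextRow(long above, long row, long below) {
    // eight shifted copies, one per neighbor direction
    long n1 = above << 1, n2 = above, n3 = above >>> 1;
    long n4 = row   << 1,             n5 = row   >>> 1;
    long n6 = below << 1, n7 = below, n8 = below >>> 1;

    // add neighbor bits pairwise: each pair is (2*carry + sum)
    long a0 = n1 ^ n2, c0 = n1 & n2;
    long a1 = n3 ^ n4, c1 = n3 & n4;
    long a2 = n5 ^ n6, c2 = n5 & n6;
    long a3 = n7 ^ n8, c3 = n7 & n8;

    // add the pairs, then add those results: classic full-adder logic,
    // applied to all 64 bit positions of the word at once
    long l01 = a0 ^ a1, k01 = a0 & a1;
    long m01 = c0 ^ c1 ^ k01, h01 = (c0 & c1) | (k01 & (c0 ^ c1));
    long l23 = a2 ^ a3, k23 = a2 & a3;
    long m23 = c2 ^ c3 ^ k23, h23 = (c2 & c3) | (k23 & (c2 ^ c3));

    long s0 = l01 ^ l23, t0 = l01 & l23;          // 1s bit of the count
    long s1 = m01 ^ m23 ^ t0;                     // 2s bit
    long t1 = (m01 & m23) | (t0 & (m01 ^ m23));
    long s2 = h01 ^ h23 ^ t1;                     // 4s bit (count 8 wraps
                                                  // to 0: dead either way)

    // Life rule as a Boolean formula: on iff count == 3,
    // or count == 2 and the cell is already on
    return ~s2 & s1 & (s0 | row);
}
```

Calling nextRow for every row of a long[] world (with a row of zeros above and below the grid) produces one full generation in a handful of bitwise operations per 64 cells.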

The 6800 version would process a complete screen in about 0.5 seconds; you could watch the update ripple down the screen from top to bottom (60 Hz refresh). On a modern CPU with 1000x the clock rate (1 GHz is pretty easy to get), processing 64 bits at a time, it should be able to produce thousands of frames per second. So fast you can't watch it run :-{

A useful observation is that much of the Life world is dead (blank) and processing that part mostly produces more dead cells. This suggests using a sparse representation. Another poster suggested quadtrees, which I think is a very good suggestion. Your quadtree regions don't have to be square, either.

Combining the two ideas (quadtrees for the non-blank regions, with bit-level processing inside the blocks the quadtree designates) is likely to give an astonishingly fast Life algorithm.

Ira Baxter