C# Memory Concerns

Question

I work with cellular automata. My repo for my work is here. The basic structure is

1) A grid of 2) cells, which may have 3) agents.

The agents act according to a set of rules, and typically one designates "states" for the agents (agents of different states have different rules). One (relatively) well-known CA is the game of life.

I'm trying to expand things a bit more and incorporate other types of "properties" in my CAs, mainly to simulate various phenomena (imagine an animal agent that consumes a plant agent, with the plant's "biomass" or what have you decreasing).

To do this I'm incorporating a normal dictionary, with strings as keys and a struct called CAProperty as the value. The struct is as follows:

public struct CAProperty
{
    public readonly string name;
    public readonly dynamic value;
    //public readonly Type type;
    public CAProperty(string name, dynamic value)
    {
        this.name = name;
        this.value = value;
    }
}

(note: previously I had the "type" variable to enable accurate typing at runtime...but in attempts to solve the issue in this post I removed it. Fact is, it'll need to be added back in)

This is well and good. However, I'm trying to do some work with large grid sizes. 100x100. 1000x1000. 5000x5000, or 25 million cells (and agents). That would be 25 million dictionaries.

See the image: a memory snapshot from Visual Studio for a 4000x4000 grid, or 16 million agents (I tried 5000x5000, but Visual Studio wouldn't let me take a snapshot).

On the right, one can clearly see that the debugger is reading 8 GB memory usage (and I tried this in a release version to see 6875 MB usage). However, when I count up everything in the third column of the snapshot, I arrive at less than 4 GB.

Why is there such a dramatic discrepancy between the total memory usage and the size of objects stored in memory?

Additionally: how might I optimize memory usage (mainly the dictionaries - is there another collection with similar behavior but lower memory usage)?

Edit: For each of the three "components" (Grid, Cell, Agent) I have a class. They all inherit from an original CAEntity class. All are shown below.

    public abstract class CAEntity
    {
        public CAEntityType Type { get; }
        public Dictionary<string, dynamic> Properties { get; private set; }

        public CAEntity(CAEntityType type)
        {
            this.Type = type;
        }

        public CAEntity(CAEntityType type, Dictionary<string, dynamic> properties)
        {
            this.Type = type;
            if(properties != null)
            {
                this.Properties = new Dictionary<string, dynamic>(properties);
            }
        }
    }

    public class CAGraph:CAEntity
    {
        public ValueTuple<ushort, ushort, ushort> Dimensions { get; }
        public CAGraphCell[,,] Cells { get;}
        List<ValueTuple<ushort, ushort, ushort>> AgentCells { get; set; }
        List<ValueTuple<ushort, ushort, ushort>> Updates { get; set; }
        public CA Parent { get; private set; }
        public GridShape Shape { get; }
        //List<double> times = new List<double>();
        //System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        public CAGraph (CA parent, ValueTuple<ushort, ushort, ushort> size, GridShape shape):base(CAEntityType.Graph)
        {
            this.Parent = parent;
            this.Shape = shape;
            AgentCells = new List<ValueTuple<ushort, ushort, ushort>>();
            Updates = new List<ValueTuple<ushort, ushort, ushort>>();
            Dimensions = new ValueTuple<ushort, ushort, ushort>(size.Item1, size.Item2, size.Item3);
            Cells = new CAGraphCell[size.Item1, size.Item2, size.Item3];
            for (ushort i = 0; i < Cells.GetLength(0); i++)
            {
                for (ushort j = 0; j < Cells.GetLength(1); j++)
                {
                    for (ushort k = 0; k < Cells.GetLength(2); k++)
                    {
                        Cells[i, j, k] = new CAGraphCell(this, new ValueTuple<ushort, ushort, ushort>(i, j, k));
                    }
                }
            }
        }

        public CAGraph(CA parent, ValueTuple<ushort, ushort, ushort> size, GridShape shape, List<ValueTuple<ushort, ushort, ushort>> agents, CAGraphCell[,,] cells, Dictionary<string, dynamic> properties) : base(CAEntityType.Graph, properties)
        {
            Parent = parent;
            Shape = shape;
            AgentCells = agents.ConvertAll(x => new ValueTuple<ushort, ushort, ushort>(x.Item1, x.Item2, x.Item3));
            Updates = new List<ValueTuple<ushort, ushort, ushort>>();
            Dimensions = new ValueTuple<ushort, ushort, ushort>(size.Item1, size.Item2, size.Item3);
            Cells = new CAGraphCell[size.Item1, size.Item2, size.Item3];
            for (ushort i = 0; i < size.Item1; i++)
            {
                for (ushort j = 0; j < size.Item2; j++)
                {
                    for (ushort k = 0; k < size.Item3; k++)
                    {
                        //if(i == 500 && j == 500)
                        //{
                        //    Console.WriteLine();
                        //}
                        Cells[i, j, k] = cells[i, j, k].Copy(this);
                    }
                }
            }
        }
    }

    public class CAGraphCell:CAEntity
    {
        public CAGraph Parent { get; set; }
        public CAGraphCellAgent Agent { get; private set; }
        public ValueTuple<ushort, ushort, ushort> Position { get; private set; }
        //private Tuple<ushort, ushort, ushort>[][] Neighbors { get; set; }
        //System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        public CAGraphCell(CAGraph parent, ValueTuple<ushort, ushort, ushort> position):base(CAEntityType.Cell)
        {
            this.Parent = parent;
            this.Position = position;
            //this.Neighbors = new Tuple<ushort, ushort, ushort>[Enum.GetNames(typeof(CANeighborhoodType)).Count()][];
        }

        public CAGraphCell(CAGraph parent, ValueTuple<ushort, ushort, ushort> position, Dictionary<string, dynamic> properties, CAGraphCellAgent agent) :base(CAEntityType.Cell, properties)
        {
            this.Parent = parent;
            this.Position = position;
            if(agent != null)
            {
                this.Agent = agent.Copy(this);
            }
        }
    }

    public class CAGraphCellAgent:CAEntity
    {
        // have to change...this has to be a property? Or no, it's a CAEntity which has a list of CAProperties.
        //public int State { get; set; }
        public CAGraphCell Parent { get; set; }
        //System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();

        public CAGraphCellAgent(CAGraphCell parent, ushort state):base(CAEntityType.Agent)
        {
            this.Parent = parent;
            AddProperty(("state", state));
        }

        public CAGraphCellAgent(CAGraphCell parent, Dictionary<string, dynamic> properties) :base(CAEntityType.Agent, properties)
        {
            this.Parent = parent;
        }
    }

so is there a difference between the dictionary key and the CAProperty name? if yes, why? if no, why not a `Dictionary`? — Franz Gleichmann, Feb 19 '20 at 06:55
...no, no difference. Initially I planned for including the Type variable, but perhaps that isn't strictly necessary. I'd intended for this struct to support numeric types as well as bools, but maybe I can limit it to numeric types (for which a dynamic could reliably type them). I'll give that a shot, switch to Dictionary. Thanks. — Eternal Ambiguity, Feb 19 '20 at 23:51
I tried eliminating the CAProperty struct and just going with a Dictionary, but it seems to use about the same amount of memory. — Eternal Ambiguity, Feb 21 '20 at 03:18

Ilmari Karonen · Answer 1 · 2020-02-22T16:21:18.137

It sounds like your problem is that your representation of your agents (using dictionaries) consumes too much memory. If so, the solution is to find a more compact representation.

Since you're working in an object-oriented language, the typical solution would be to define an Agent class, possibly with subclasses for different types of agents, and use instance variables to store the state of each agent. Then your CA grid will be an array of Agent instances (or possibly nulls for vacant cells). This will be a lot more compact than using dictionaries with string keys.

Also, I would recommend not storing the position of the agent on the grid as part of the agent's state, but passing it as a parameter to any methods that need it. Not only does this save a bit of memory just by itself, but it also allows you to place references to the same Agent instance at multiple cells on the grid to represent multiple identical agents. Depending on how often such identical agents occur in your CA, this may save a huge amount of memory.

Note that, if you modify the state of such a reused agent instance, the modification will obviously affect all agents on the grid represented by that instance. For that reason, it may be a good idea to make your Agent objects immutable and just create a new one whenever the agent's state changes.

You might also want to consider maintaining a cache (e.g. a set) of Agent instances already on the grid so that you can easily check if a new agent might be identical with an existing one. Whether this will actually do any good depends on your specific CA model — with some CA you might be able to handle de-duplication sufficiently well even without such a cache (it's perfectly OK to have some duplicate Agent objects), while for others there might simply not be enough identical agents to make it worthwhile. Also, if you do try this, note that you'll need to either design the cache to use weak references (which can be tricky to get right) or periodically purge and rebuild it to avoid old Agent objects lingering in the cache even after they've been removed from the grid.

Addendum based on your comment below, which I'll quote here:

Imagine an environment where the temperature varies seasonally (so a property on the graph). There are land and water cells (so properties on cells), and in low enough temperatures the water cells become frozen so animal agents can use them to cross over between land locations. Imagine those animal agents hunt other animal agents to eat them (so properties on the agents). Imagine the animal agents that get eaten eat trees (so other agents with properties), and tend to eat young saplings (limiting tree growth), thereby limiting their own growth (and that of the carnivore agents).

OK, so let's sketch out the classes you'd need. (Please excuse any syntax errors; I'm not really a C# programmer and I haven't actually tested this code. Just think of it as C#-like pseudocode or something.)

First of all, you will obviously need a bunch of agents. Let's define an abstract superclass for them:

public abstract class Agent {
    public abstract void Act(Grid grid, int x, int y, float time);
}

Our CA simulation (which, for simplicity, I'm going to assume to be stochastic, i.e. one where the agents act one at a time in a random order, like in the Gillespie algorithm) is basically going to involve repeatedly picking a random cell (x, y) on the grid, checking if that cell contains an agent, and if so, calling Act() on that agent. (We'll also need to update any time-dependent global state while we're doing that, but let's leave that for later.)

The Act() methods for the agents will receive a reference to the grid object, and can call its methods to make changes to the state of nearby cells (or even get a reference to the agents in those cells and call their methods directly). This could involve e.g. removing another agent from the grid (because it just got eaten), adding a new agent (reproduction), changing the acting agent's location (movement) or even removing that agent from the grid (e.g. because it starved or died of old age). For illustration, let's sketch a few agent classes:

public class Sapling : Agent {
    private static readonly double MATURATION_TIME = 10;  // arbitrary time delay

    private double birthTime;  // could make this a float to save memory
    public Sapling(double time) => birthTime = time;

    public override void Act(Grid grid, int x, int y, double time) {
        // if the sapling is old enough, it replaces itself with a tree
        if (time >= birthTime + MATURATION_TIME) {
            grid.SetAgentAt(x, y, Tree.INSTANCE);
        }
    }
}

public class Tree : Agent {
    public static readonly Tree INSTANCE = new Tree();

    public override void Act(Grid grid, int x, int y, double time) {
        // trees create saplings in nearby land cells
        (int x2, int y2) = grid.RandomNeighborOf(x, y);
        if (grid.GetAgentAt(x2, y2) == null && grid.CellTypeAt(x2, y2) == CellType.Land) {
            grid.SetAgentAt(x2, y2, new Sapling(time));
        }
    }
}

For the sake of brevity, I'll leave the implementation of the animal agents as an exercise. Also, the Tree and Sapling implementations above are kind of crude and could be improved in various ways, but they should at least illustrate the concept.

One thing worth noting is that, to minimize memory usage, the agent classes above have as little internal state as possible. In particular, the agents don't store their own location on the grid, but will receive it as arguments to the act() method. Since omitting the location in fact made my Tree class completely stateless, I went ahead and used the same global Tree instance for all trees on the grid! While this won't always be possible, when it is, it can save a lot of memory.

Now, what about the grid? A basic implementation (ignoring the different cell types for a moment) would look something like this:

public class Grid {
    private readonly int width, height;
    private readonly Agent?[,] agents;

    public Grid(int w, int h) {
        width = w;
        height = h;
        agents = new Agent?[w, h];
    }

    // TODO: handle grid edges
    public Agent? GetAgentAt(int x, int y) => agents[x, y];
    public void SetAgentAt(int x, int y, Agent? agent) => agents[x, y] = agent;
}

Now, what about the cell types? You have a couple of ways to handle those.

One way would be to make the grid store an array of Cell objects instead of agents, and have each cell store its state and (possibly) an agent. But for optimizing memory use it's probably better to just have a separate 2D array storing the cell types, something like this:

public enum CellType : byte { Land, Water, Ice }

public class Grid {
    private readonly Random rng = new Random();
    private readonly int width, height;
    private readonly Agent?[,] agents;
    private readonly CellType[,] cells;  // TODO: init in constructor?

    private float temperature = 20;  // global temperature in Celsius

    // ...

    public CellType CellTypeAt(int x, int y) {
        CellType type = cells[x,y];
        if (type == CellType.Water && temperature < 0) return CellType.Ice;
        else return type;
    }
}

Note how the CellType enum is byte-based, which should keep the array storing them a bit more compact than if they were int-based.

Now, let's finally look at the main CA simulation loop. At its most basic, it could look like this:

Grid grid = new Grid(width, height);
grid.SetAgentAt(width / 2, height / 2, Tree.INSTANCE);

// average simulation time per loop iteration, assuming that each
// actor on the grid acts once per unit time step on average
double dt = 1 / (width * height);

for (double t = 0; t < maxTime; t += dt) {
    (int x, int y) = grid.GetRandomLocation();
    Agent? agent = grid.GetAgentAt(x, y);
    if (agent != null) agent.Act(grid, x, y, t);
    // TODO: update temperature here?
}

(Technically, to correctly implement the Gillespie algorithm, the simulation time increment between iterations should be an exponentially distributed random number with mean dt, not a constant increment. However, since only the actor on one of the width * height cells is chosen in each iteration, the number of iterations between actions by the same actor is geometrically distributed with mean width * height, and multiplying this by dt = 1 / (width * height) gives an excellent approximation for an exponential distribution with mean 1. Which is a long-winded way of saying that in practice using a constant time step is perfectly fine.)

Since this is getting long enough, I'll let you continue from here. I'll just note that there are plenty of ways to further expand and/or optimize the algorithm I've sketched above.

For example, you could speed up the simulation by maintaining a list of all grid locations that contain a live actor and randomly sampling actors from that list (but then you'd also need to scale the time step by the inverse of the length of the list). Also, you may decide that you want some actors to get more frequent chances to act than others; while the simple way to do that is just to use rejection sampling (i.e. have the actor only do something if rng.Sample() < prob for some prob between 0 and 1), a more efficient way would be to maintain multiple lists of locations depending on the type of the actor there.

See my edit: I actually do have an Agent class. Unfortunately the individual agents cannot be instances because each one is doing independent calculations (based on state-specific rulesets of course), like in [this paper](https://www.researchgate.net/publication/322703355_A_stochastic_cellular_automata_model_of_tautomer_equilibria). — Eternal Ambiguity, Feb 21 '20 at 03:30
I'm not sure *why* you think that "the individual agents cannot be instances because each one is doing independent calculations". In any case, your code (which does not seem complete — you haven't shown what your agents actually _do_) seems way overcomplicated. For one thing, I'm pretty sure that your "CAGraphCell" class is completely useless: just put your agents directly on the grid instead. And I have no idea why you're making all your classes inherit from a common "CAEntity" parent — an agent and a grid are conceptually entirely different things with no meaningful common behavior or state. — Ilmari Karonen, Feb 21 '20 at 03:49
… Also, I took a look at the paper you linked, and as far as I can tell, the "cells" in the "cellular automaton" they describe don't interact in any way at all. If so, all I can say is that the authors are clearly clueless when it comes to cellular automata or stochastic processes in general, and could've obtained equivalent (but much more accurate) results for a tiny fraction of the computational cost just by modelling their system as a three-state [Markov chain](https://en.wikipedia.org/wiki/Markov_chain). — Ilmari Karonen, Feb 21 '20 at 03:57
You're right, that's not all of the code. It would be several hundred more lines over half a dozen classes (abstract and inherited) and structs. I figured it would get things stuck in the weeds. I can give a more detailed overview, but... — Eternal Ambiguity, Feb 22 '20 at 05:16
Imagine an environment where the temperature varies seasonally (so a property on the graph). There are land and water cells (so properties on cells), and in low enough temperatures the water cells become frozen so animal agents can use them to cross over between land locations. Imagine those animal agents hunt other animal agents to eat them (so properties on the agents). Imagine the animal agents that get eaten eat trees (so other agents with properties), and tend to eat young saplings (limiting tree growth), thereby limiting their own growth (and that of the carnivore agents). — Eternal Ambiguity, Feb 22 '20 at 05:19
...ultimately though, the problems I'm facing are: 1. Visual Studio (and the task manager for a release version) shows one amount of memory usage, but the memory snapshot only shows objects for roughly half of that amount, and 2. How would I optimize the objects themselves that I'm using? — Eternal Ambiguity, Feb 22 '20 at 05:21
Okay. I genuinely appreciate your help (and have an upvote for it), but I've clearly done a poor job of explaining myself. I'm interested in simulating Isle Royale, and neuron activity, and chemical equilibria, and diffusion-limited aggregation, and random walks, ALL in the same application. ALL with the same set of classes. — Eternal Ambiguity, Feb 24 '20 at 06:10
See [here](https://github.com/GABowers/CellularAutomataLibrary). The thing is...I already have this working. I haven't gotten into some of these systems yet, but I have the building blocks. My concern was streamlining what I have. — Eternal Ambiguity, Feb 24 '20 at 06:11
So you want to have all the types and properties of your agents (and cells etc.) definable at runtime? Unfortunately, you're going to find that there's no simple way to do that in a memory-efficient manner. Your best bet, if you really want to go that route, is probably to [create classes at runtime](https://stackoverflow.com/questions/3862226/how-to-dynamically-create-a-class) via reflection. — Ilmari Karonen, Feb 24 '20 at 07:29
Okay, well thanks for the help. I guess in that case my only remaining concern is Visual Studio's not showing the usage for half of my memory usage. — Eternal Ambiguity, Feb 25 '20 at 08:50

C# Memory Concerns

1 Answers1