4

I'm creating a playspace in Unity (which is only marginally relevant) in C#, which is ostensibly .NET since it's in Unity. The playspace is two-dimensional and is generated via cellular automata, with refining methods that make the shape of the playspace "usable" for my intent.

Currently, data about the base playspace (alive cells or dead, open or closed, pathway or wall) is stored as a binary value in an int[,] array. Eventually, additional data will be stored in the array as well, or potentially in a secondary collection mapped back to that int[,] array.

The additional data will include information such as: whether a player has visited this cell, whether the player can currently see it (line of sight, LoS), and whether there is an enemy or an object/item in it.

The LoS, hasVisited, and enemy data will be dynamic, obviously, as a player navigates and as enemies move along their paths.
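
To make that layout concrete, here is a minimal sketch of what I mean; the names, sizes, and coordinates are placeholders rather than real code from the project:

int[,] map = new int[80, 80];          // base playspace: 0 = dead/wall, 1 = alive/pathway

// Dynamic per-cell data kept in parallel arrays of the same shape:
bool[,] hasVisited    = new bool[80, 80];
bool[,] inLineOfSight = new bool[80, 80];

// Example update as the player enters a cell (placeholder coordinates):
int px = 10, py = 12;
hasVisited[px, py] = true;
inLineOfSight[px, py] = true;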

I would like to design and develop this with performance in mind from the beginning, as an early misstep could cause performance bottlenecks later that require significant refactoring, wasting precious development time as I near the end of the project.

My question is this:

Does updating an array or multiple arrays to track data become resource-expensive in a linear or exponential manner? In other words, if mapA is [80,80] and mapB is [160,160], is mapB going to consume roughly 4x the resources to maintain, or some greater or lesser amount? Are there benefits to using something like a Dictionary or a custom collection once the grid grows beyond a certain size?

My initial thought was to use something like Dictionary<int[,], Dictionary<string, string>>. I use dictionaries pretty extensively in a utility I maintain for work, but performance is something I don't consider for that since it's internal and very use-specific, so I'm not incredibly familiar with the performance of Dictionary<> in general. The benefit to the Dictionary over multiple arrays of the same grid with different data is that I can have keys that make sense rather than just having additional values that would need to be tracked. I suspect, however, that the array might simply be faster, while less readable in the long run.
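
For concreteness, something in this direction is what I was picturing, though sketched here with a per-cell key rather than the whole int[,] as the key; every name and key scheme below is a placeholder:

using System.Collections.Generic;

// Per-cell metadata keyed by a flattened cell index (y * width + x).
int width = 80;
Dictionary<int, Dictionary<string, string>> cellData =
    new Dictionary<int, Dictionary<string, string>>();

// Record some data for the cell at (10, 12):
int x = 10, y = 12;
Dictionary<string, string> cell = new Dictionary<string, string>();
cell.Add("hasVisited", "true");
cell.Add("inLoS", "false");
cellData[y * width + x] = cell;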

Thanks!

Jesse Williams
  • Well, I would do some tests with both of them. In a real scenario, get a timer and measure the times they consume. – Christoph K Jan 15 '16 at 15:39
  • try jagged arrays too. http://www.dotnetperls.com/jagged-array – M.kazem Akhgary Jan 15 '16 at 15:39
  • Put it like this: I've stored 4 similar sized boxes in a 2x2 square. I now want to store 16 of the aforementioned boxes in a 4x4 square. How much is the increase? In any case: Nothing wrong with keeping performance in mind, but I would urge you to put comprehensibility and maintainability first. Then see if you actually *have* a performance problem to begin with. Then see where you can improve. A Dictionary with a two dimensional int array as a key, and a dictionary of string - string pairs, doesn't sound like something that you're really going to like debugging later on. – Willem van Rumpt Jan 15 '16 at 15:42
  • ...and keep in mind that doing hash and equality calculations for the multidimensional dictionary key will have a price too. – Willem van Rumpt Jan 15 '16 at 15:46
  • Back in the dark ages when I was a student, I had a professor tell us when we were talking about program optimization: "During development, don't do it. After you get the program working, don't do it, yet." Granted, getting the best performance can require lots of refactoring, but it's hard to know what to optimize until you've figured out where your program is spending its time. And you can't do that until you've got working code. – Tony Vitabile Jan 15 '16 at 15:54
  • @WillemvanRumpt - that's why my base assumption was the 4x increase in computational time dealing with the increase from 80x80 to 160x160. I just wasn't sure if it scaled quite so linearly. – Jesse Williams Jan 15 '16 at 16:03
  • @TonyVitabile: And then the next step: Is the performance gain actually worth the loss in readability (that does tend to come with it)? – Willem van Rumpt Jan 15 '16 at 16:03
  • @JesseWilliams: In general in computing it does. Reserving space for x items costs x space. Reserving it for 4x items costs 4x space. Whether it costs as much to process data in said space depends entirely on the algorithms you access it with. Indexed access to x[0,1] is as fast as x[875,232]; search performance (for instance) will always deteriorate with increased space (but will stay within its big-O characteristics, of course): searching for your cat takes less time in your house than in your neighborhood. – Willem van Rumpt Jan 15 '16 at 16:13

3 Answers

3

According to this answer, a Dictionary is internally an array of structs. Using a Dictionary therefore adds a layer on top of simply storing the values in a multidimensional array yourself. If you're looking for micro-optimization, you'll never beat the plain array by piling more on top of it.

Also, the int[,] will at the very least help readability, since it accurately mirrors how the data is laid out visually. So before you optimize, make sure you actually need to; favor readability over premature optimization.

I would stick with the int[,] for what you're doing. It will give you both better performance and better readability.
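
If you do end up needing the extra per-cell data later, here is a minimal sketch of the direction I would try first, assuming a second array parallel to the base map; the enum and its members are illustrative only:

using System; // for [Flags]

[Flags]
enum CellState
{
    None     = 0,
    Visited  = 1 << 0,
    InSight  = 1 << 1,
    HasEnemy = 1 << 2
}

// A second grid of the same shape as the base int[,] map:
CellState[,] state = new CellState[80, 80];

// Mark a cell visited and currently visible:
state[10, 12] |= CellState.Visited | CellState.InSight;

// Clear visibility when line of sight is lost:
state[10, 12] &= ~CellState.InSight;

// Test a single flag:
bool seen = (state[10, 12] & CellState.InSight) != 0;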

krillgar
  • Thank you - that makes sense. And I suppose so long as it's documented, even an additional "duplicate" array with secondary values would be fairly readable. – Jesse Williams Jan 15 '16 at 16:02
  • @JesseWilliams: Go for "very readable", not "fairly readable". "Readable" usually means "readable, because I'm working on the stuff right now so I already spent the time of being aware of all the other dependencies". Readable auto-degrades to "fairly readable" in a month. – Willem van Rumpt Jan 15 '16 at 16:22
  • Hahah yes, I wasn't really talking in degrees so much as comparatively. In my day to day job as a QA Analyst, I get to read a lot of code that is fairly readable at best, and often nearly unreadable, so I'm a big proponent of things making sense. I'm also a commentaholic: if some variable or method or loop function isn't immediately obvious, it will have comment(s), usually plural. – Jesse Williams Jan 15 '16 at 16:34
  • @JesseWilliams: It's one of my pet peeves, so I rarely miss out on an opportunity to get the message through ;). I see a lot of code on a daily basis, and the "creative solutions", "smart implementations" and "optimizations" in 90% of the cases actually fail to introduce *any* benefit, while also in 90% of cases guaranteeing a maintenance nightmare along the road. What got delivered was not bad, not faulty, not functionally wrong, just not an improvement warranting the added complexity. – Willem van Rumpt Jan 15 '16 at 16:46
0

It might depend on the size of the play space and how sparse it is, but I would consider using the dictionary or some other managed structure.

If you have a 1000x1000 play space and you use an array, it is going to allocate 1,000,000 cells up front even if you only "visit" 10% of them. The dictionary method uses an internal array too, but that array starts out very small and only grows (roughly doubling) as entries are actually added.

I am assuming that the percentage of visited space may be substantially less than 100%. The dictionary will keep the memory footprint small until it needs to grow, and the dictionary's overhead becomes negligible once it gets big.
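
A rough sketch of what I mean, assuming data is only ever stored for cells the player has actually touched; the key scheme and the names are placeholders:

using System.Collections.Generic;

// An entry exists only once a cell is actually touched.
class CellInfo
{
    public bool HasVisited;
    public bool InLineOfSight;
}

Dictionary<int, CellInfo> touched = new Dictionary<int, CellInfo>();

int width = 1000;
int x = 42, y = 17;
int key = y * width + x;              // flattened cell index

CellInfo info;
if (!touched.TryGetValue(key, out info))
{
    info = new CellInfo();            // first visit: allocate on demand
    touched[key] = info;
}
info.HasVisited = true;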

sm14
  • Well, all of the playspace must be "known" at initialization, but the data about any given cell may be sparse, and nonexistent for dead cells with a Manhattan distance > 1 from a living cell. – Jesse Williams Jan 15 '16 at 17:29
-1

UPDATE: As people pointed out in the comments, C#'s int[,] is quite different from the C++-style int[][] I assumed it was equivalent to; I'm leaving the answer below as a reference. Since you should always prefer readability when performance isn't impaired, use int[,] instead of what I propose below.

I would also add that it is probably more efficient to go all the way down to a simple int[] to represent a two-dimensional area. You can do that easily this way:

int width = 80, height = 80; // example dimensions
int[] myArray = new int[width * height];

// Suppose you want to reach the cell at (x, y) coordinates:
int x = 3, y = 5;
int cell = myArray[x + y * width];

// Suppose you want to iterate through the whole array:
for (int i = 0; i < myArray.Length; ++i)
{
    int cx = i % width; // recover the coordinates from the flat index
    int cy = i / width; // integer division gives the auto-flooring we want

    int value = myArray[cx + cy * width]; // like above ;)
    int same  = myArray[i];               // or, of course, for simplicity
}

As I write this I'm not quite sure [] is more efficient than [,] in C#, but I know it is in C++.

Complete tutorial on the matter: Using Single Dimensional Array to Represent Two Dimensional Data
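
For comparison, and as a rough sketch only, the int[,] version of the same access (reusing width, height, x, and y from the snippet above); this is what the update at the top recommends, since the runtime does the flattening arithmetic and bounds checking for you:

int[,] grid = new int[height, width];

int cellValue = grid[y, x]; // the runtime computes the flat offset internally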

Sharundaar
  • I looked at the link, but I'm failing to see how this is accurate in Euclidean space? Wouldn't [1, 3], [2, 2], and [3, 1] all be calculated as the same cell? – Jesse Williams Jan 15 '16 at 16:26
  • Oh, actually I do see it. That's interesting, and I definitely never would have thought to do that. The efficiency might be good, but that would not work so well in Unity when actually trying to draw to the grid without writing extension methods to how Unity sees the world space itself. For the raw data, it would be completely usable, but trying to tell the system where things were in that space would be a nightmare, I think. – Jesse Williams Jan 15 '16 at 16:28
  • I use this representation in almost all my projects, so I can assure you it works ;) And you can use it exactly the same way you used your two-dimensional array. I could make a one-to-one use-case match between the two representations if you want. – Sharundaar Jan 15 '16 at 16:35
  • @Sharundaar: Then stop (*) and start using `int[,]`, since it is a single-memory-block array just like `int[]`, but it grants you all the `+ y * width` arithmetic automatically and makes all operations simpler to write and read. (*) Of course, unless you have very good performance tests proving that you really need to do it by hand, which I guess and doubt you have. – quetzalcoatl Jan 15 '16 at 16:36
  • @quetzalcoatl Is it? Isn't it equivalent to an int[][] in C++? I can't find anything related to the subject in C#. – Sharundaar Jan 15 '16 at 16:38
  • @quetzalcoatl - good point. I'm wondering if the performance gain wouldn't be offset by the loss of having to mathematically determine the value from an x-y pair when you need to get information back from a single cell. – Jesse Williams Jan 15 '16 at 16:39
  • No, it is not. int[x,y] is something very different from int[x][y]. The first is an array with two dimensions. The second is a "jagged array", an "array of arrays": that is, an array of objects of type int[]. – quetzalcoatl Jan 15 '16 at 16:39
  • @Quetzalcoatl Yes it is. A C++ `int[][]` is the same thing as a C# `int[,]`; a C# `int[][]` is a C++ `(int[])[]`. However, @Sharundaar, a C++ `int[][]` is also a contiguous block of memory; your comments make it seem you think it is not. Doing a C++ `[4][5]` or a C# `[4,5]` is the same as doing a `[4 * width + 5]` in both languages. – Scott Chamberlain Jan 15 '16 at 16:41
  • @Sharundaar: It's my understanding that since the memory space inside a computer is linear, the calculations you've provided in your code are exactly the same calculations the array indexer performs, so I don't think you're buying anything at all, but you're losing readability. – Tony Vitabile Jan 15 '16 at 16:41
  • Correction on my last comment: a C# `int[][]` is most like a C++ `*arr[]`, but a C++ `int[][]` is still the same as a C# `int[,]`. – Scott Chamberlain Jan 15 '16 at 16:47
  • For all practical purposes, this is never going to be an improvement. – Willem van Rumpt Jan 15 '16 at 16:48
  • Updated my answer to all intents and purposes – Sharundaar Jan 15 '16 at 16:53
  • @ScottChamberlain: Yes, C#'s int[][] is `int* arr[]` (easy to mistake for `int(*arr)[]`, which is actually not quite representable in C#, except as int[,] which is not a 100% match), but there's not much sense in bringing the C++ type system in here. When I wrote that comment, Sharundaar didn't claim that code was in C++, and the **question** is about C#, not a word about C++. If you look carefully, there's `int[]`, so that code is C#. Therefore, I was speaking about C# only, and I don't assume the OP knows the intricacies of C++, which are much harder to follow than C#'s. – quetzalcoatl Jan 16 '16 at 08:52
  • @TonyVitabile: there actually is one very small performance gain (or loss, depending on things) in "linearizing" it yourself. The C# compiler injects checks that throw IndexOutOfRangeException. Doing it manually lets you skip those checks, hence a small gain. However, in many cases you'd want them and would write them anyway. Also, I've read that in some cases the C# JIT is able to detect a safe loop and eliminate these checks, so that tiny gain is not guaranteed to happen. That's why I said it's not worth the effort unless one has a very good cause and proof. – quetzalcoatl Jan 16 '16 at 09:01
  • @Quetzalcoatl: Taking out the check for index out of bounds can actually make your code even more prone to crashes and failures. Sorry, but I'd stick with the built-in indexers. – Tony Vitabile Jan 18 '16 at 14:49
  • @TonyVitabile: I was only referring to your comment, `the calculations you've provided in your code are exactly the same calculations the array indexer performs, so I don't think you're buying anything at all`, that's all. I don't advise doing that; I have no idea where you got that impression. – quetzalcoatl Jan 18 '16 at 15:04