
I am using a Dictionary<int,int> to store the frequency of colors in an image, where the key is the color (as an int), and the value is the number of times the color has been found in the image.

When I process larger / more colorful images, this dictionary grows very large. I get an out of memory exception at just around 6,000,000 entries. Is this the expected capacity when running in 32-bit mode? If so, is there anything I can do about it? And what might be some alternative methods of keeping track of this data that won't run out of memory?

For reference, here is the code that loops through the pixels in a bitmap and saves the frequency in the Dictionary<int,int>:

Bitmap b; // = something...
Dictionary<int, int> count = new Dictionary<int, int>();
System.Drawing.Color color;

for (int i = 0; i < b.Width; i++)
{
    for (int j = 0; j < b.Height; j++)
    {
        color = b.GetPixel(i, j);
        int argb = color.ToArgb();
        if (!count.ContainsKey(argb))
        {
            count.Add(argb, 0);
        }
        count[argb] = count[argb] + 1;
    }
}

Edit: In case you were wondering what image has that many different colors in it: http://allrgb.com/images/mandelbrot.png

Edit: I should also mention that this is running inside an ASP.NET web application using .NET 4.0. So there may be additional memory restrictions.

Edit: I just ran the same code inside a console application and had no problems. The problem only happens in ASP.NET.

Rafe
  • What is the range of the keys? If the populated keys are sufficiently dense, then why not just use the key as an index into an int array, whose values are the counts? If the counts are small enough, then an array of int16 might suffice. – John Saunders Dec 17 '13 at 19:44
  • Do you actually need to store all colors? Perhaps cleaning up and storing only the most frequent would suffice? I don't remember off the top of my head, but I think the limit you found is right. Typically, however, if you've outgrown ~10000 entries, you need to be using a different methodology instead. – Victor Zakharov Dec 17 '13 at 19:49
  • I agree with Christopher: why not round the colors to the nearest five or ten? That being said, are you using tasks by any chance? Each task is about a meg in memory. You can easily run out of memory if you spin up a task for each pixel. – Dimitri Dec 17 '13 at 19:49
  • Where are you actually receiving the OOM exception? Is it on a specific pixel? GDI will throw this occasionally for other unrelated issues... – Reed Copsey Dec 17 '13 at 19:53
  • @ReedCopsey I initially thought it was coming from b.GetPixel(), but then I noticed the debugger had the exception at the count.Add(). I don't think it is a GDI thing. – Rafe Dec 17 '13 at 19:57
  • @JohnSaunders I believe the range would be 0 to 4 billion. That's a big int array! – Rafe Dec 17 '13 at 20:18
  • @Rafe There are 4 billion possible 32-bit ARGB values, but there are only 16 million possible 24-bit RGB values. You probably don't want to represent alpha in the histogram anyway. – Chris Culter Dec 17 '13 at 20:29
  • I have edited your title. I recognize that this problem is specific to ASP.NET, but please keep that metadata out of the title. Please see, "[Should questions include “tags” in their titles?](http://meta.stackexchange.com/questions/19190/)", where the consensus is "no, they should not". – John Saunders Dec 17 '13 at 20:37
  • Not related to your dictionary problem, but you might consider looking into using [Bitmap.LockBits](http://msdn.microsoft.com/en-us/library/5ey6h79d(v=vs.110).aspx) to get a pointer to the raw image data. That will be a whole lot faster than calling `GetPixel` for each pixel in the bitmap. – Jim Mischel Dec 17 '13 at 21:09
  • @JimMischel Thanks. I did look into that. I have a medium trust requirement though, which rules out using LockBits to access the memory data directly. – Rafe Dec 17 '13 at 21:11
  • This answer is also good: http://stackoverflow.com/questions/3521532/how-is-the-c-net-3-5-dictionary-implemented – Ahmet Kakıcı Dec 22 '13 at 20:17

5 Answers


Update: Given the OP's sample image, it seems that the maximum number of items would be over 16 million, and apparently even that is too much to allocate when instantiating the dictionary. I see three options here:

  • Resize the image down to a manageable size and work from that (see the sketch after this list).
  • Try to convert to a color scheme with fewer color possibilities.
  • Go for an array of fixed size as others have suggested.
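
For the first option, here is a minimal sketch. It assumes b is the original Bitmap from the question, and the 256x256 target size is an arbitrary choice; note that anything other than nearest-neighbor resampling blends pixels and invents colors that do not exist in the original:

using (var small = new Bitmap(256, 256))
using (var g = Graphics.FromImage(small))
{
    // Nearest-neighbor picks existing pixels instead of blending new colors.
    g.InterpolationMode = System.Drawing.Drawing2D.InterpolationMode.NearestNeighbor;
    g.DrawImage(b, 0, 0, small.Width, small.Height);
    // ...then run the counting loop over 'small' instead of 'b'.
}

The counts become approximate, of course: you are sampling at most 65,536 of the original pixels.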

Previous answer: the problem is that you don't allocate enough space for your dictionary up front. At some point, while it is expanding, it runs out of memory for the expansion itself, even though the final dictionary would have fit.

Example: this code runs out of memory at nearly 24 million entries (in my machine, running in 32-bit mode):

Dictionary<int, int> count = new Dictionary<int, int>();
for (int i = 0; ; i++)
     count.Add(i, i);

because during the last expansion the dictionary is still holding the space for the entries already there while it tries to allocate a new array for several million more, and that combined footprint is too much.

Now, if we initially allocate space for, say, 40 million entries, it runs without problem:

Dictionary<int, int> count = new Dictionary<int, int>(40000000);

So try to indicate how many entries there will be when creating the dictionary.

From MSDN:

The capacity of a Dictionary is the number of elements that can be added to the Dictionary before resizing is necessary. As elements are added to a Dictionary, the capacity is automatically increased as required by reallocating the internal array. If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the Dictionary.

Julián Urbano
  • Thanks for the suggestion. I tried this, but unfortunately it runs out of memory when it initially tries to instantiate the dictionary: Dictionary<int, int> count = new Dictionary<int, int>(b.Width * b.Height); – Rafe Dec 17 '13 at 19:54
  • +1. Good research, thanks for sharing. Even though I would advocate for using a different methodology. Otherwise you risk hitting the memory bounds very soon again, with no available solution. Based on previous experience with Project Euler tasks, an efficient solution is a dozen orders of magnitude faster and uses some 1MB-10MB of memory. Remember, old computers did not have as much memory or CPU power, and people still managed somehow. – Victor Zakharov Dec 17 '13 at 19:55
  • @Rafe only with 6 million entries? That is very odd, because that doesn't take much memory. What system are you running on? In any case, a solution with arrays seems better. – Julián Urbano Dec 17 '13 at 19:56
  • @ScottChamberlain Because that's the maximum number of possible different colors in the image. In this case that value turns out to be 16777216. – Rafe Dec 17 '13 at 20:06
  • Even if each entry used 16 bytes... (let's say two ints and two random pointers used for god knows what) -- at 24 million entries the object would only be 384 MB... so, you go to expand, that allocates another 768 MB (even together, that's only 1152 MB -- why would that run out of memory? -- And .NET allows up to 2 GB per object -- and modern Windows versions can map memory so it doesn't need to be contiguous... – BrainSlugs83 Dec 17 '13 at 20:58
  • @Rafe you'll tend to have a lot of repeated colors. I suspect the six million failure point is almost large enough. Try starting at just 8 million. – Joel Coehoorn Dec 17 '13 at 21:21
  • @Rafe suggestion: why not resize down the image to something more manageable? – Julián Urbano Dec 18 '13 at 00:27
  • @JuliánUrbano I think the resizing idea is the most practical solution so far. I don't know why I didn't think of it. You people are so smart! – Rafe Dec 18 '13 at 15:06
  • @Rafe just happy ideas :-) Glad to help! – Julián Urbano Dec 18 '13 at 15:15
  • Absolutely fantastic! Thank you! – ProxyTech Jul 18 '14 at 07:36

Each dictionary entry holds two 4-byte integers: 8 bytes total. 8 bytes * 6 million entries is only about 48 MB, +/- some space for object overhead, alignment, etc. There's plenty of room in memory for this: .NET provides up to 2 GB of virtual address space per 32-bit process. 48 MB or so shouldn't cause a problem.

I expect what's actually happening here is related to how the dictionary auto-expands and how the garbage collector handles (or doesn't handle) compaction.

First, the auto-expanding part. Last time I checked (back around .NET 2.0*), collections in .NET tended to use arrays internally. They would allocate a reasonably-sized array in the collection constructor (say, 10 items), and then use a doubling algorithm to create additional space whenever the array filled up. All the existing items would have to be copied to the new array, but then the old array could be garbage collected. The garbage collector is pretty reliable about this, and so it means you're left using space for at most 2n - 1 items in the collection.
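
You can watch this growth happen with List<T>, which (unlike Dictionary<TKey, TValue>) exposes its current capacity; a quick sketch:

var list = new List<int>();
int lastCapacity = -1;
for (int i = 0; i < 1000; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        // Typically prints capacities 4, 8, 16, 32, ... The exact values are an
        // implementation detail (and Dictionary grows to primes rather than
        // powers of two), but the doubling pattern is the same.
        Console.WriteLine("Count={0}, Capacity={1}", list.Count, list.Capacity);
        lastCapacity = list.Capacity;
    }
}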

Now the Garbage Collector compaction part. After a certain size, these arrays end up in a section of memory called the Large Object Heap. Garbage collection still works here (though less often). What doesn't work well here is compaction (think memory defragmentation). The physical memory used by the old object will be released, returned to the operating system, and made available for other processes. However, the virtual address space in your process (the table that maps program memory offsets to physical memory addresses) will still have the (empty) space reserved.

This is important, because remember: we're working with a rapidly growing object. It's possible for such an object to take up address space far larger than the final size of the object itself. An object grows enough, fast enough, and suddenly you get an OutOfMemoryException, even though your app isn't really using all that much RAM.

The first solution here is to allocate enough space in the initial collection for all of your data. This allows you to skip all those re-allocations and copying. Your data will live in a single array, and use only the space you actually asked for. Most collections, including the Dictionary, have an overload for the constructor that allows you to give it the number of items you want the first array to use. Be careful here: you don't need to allocate an item for every pixel in your image. There will be a lot of repeated colors. You only need to allocate enough to have space for each color in your image. If it's only large images that give you problems, and you're almost handling them with six million records, you might find that 8 million is plenty.

My next suggestion is to group your pixel colors. A human can't tell and doesn't care if two colors are just one bit apart in any of the RGB components. You might go as far as to look at the separate RGB values for each pixel and normalize the pixel so that you only care about changes of more than 5 or so for an R, G, or B value. That would get you from 16.8 million potential colors all the way down to only about 132,000, and the data will likely be more useful, too. That might look something like this:

var colorCounts = new Dictionary<Color, int>(132651);

// GetImagePixels() stands in for however you enumerate the bitmap's pixels.
foreach (Color c in GetImagePixels()
    .Select(p => Color.FromArgb((p.R / 5) * 5, (p.G / 5) * 5, (p.B / 5) * 5)))
{
    // TryGetValue avoids a KeyNotFoundException the first time a color is seen.
    int n;
    colorCounts.TryGetValue(c, out n);
    colorCounts[c] = n + 1;
}

* IIRC, somewhere in a recent or upcoming version of .NET both of these issues are being addressed: one by allowing you to force compaction of the LOH, and the other by using a set of arrays for collection backing stores, rather than trying to keep everything in one big array.
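
For the LOH-compaction half of that footnote (it shipped in .NET 4.5.1, per the comment below), forcing a compaction is a short sketch:

// Request that the next blocking full collection also compact the Large Object Heap.
System.Runtime.GCSettings.LargeObjectHeapCompactionMode =
    System.Runtime.GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();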

Joel Coehoorn
  • To address your `*` [.NET 4.5.1 added the feature](http://msdn.microsoft.com/en-us/library/system.runtime.gcsettings.largeobjectheapcompactionmode%28v=vs.110%29.aspx) to compact on the next blocking generation 2 garbage collection – Scott Chamberlain Dec 17 '13 at 21:24
  • Good info. The doubling algorithm, as best as I can determine, selects the next prime number that's at least double the current value. So the allocated array could be `(2n + x)`, where `x` is some relatively small number. Also, pre-sizing the dictionary is almost certainly the best solution. One note: `HashSet<T>` doesn't have a constructor that lets you set the initial size. You can, however, get around that by populating a `List<T>` with unique items, calling `new HashSet<T>(list)`, and then clearing the `HashSet<T>`. See http://www.informit.com/guides/content.aspx?g=dotnet&seqNum=666 – Jim Mischel Dec 17 '13 at 21:28
  • Don't think this is the issue here. The OP commented that he couldn't even instantiate the dictionary when allocating all the space necessary, and got the OOM exception right there, so no expansion was involved yet: http://stackoverflow.com/questions/20643222/net-dictionaryint-int-out-of-memory-exception-at-around-6-000-000-entries/#comment30902049_20643387 – Julián Urbano Dec 18 '13 at 00:26

The maximum size of a single object allowed by the CLR is 2 GB:

When you run a 64-bit managed application on a 64-bit Windows operating system, you can create an object of no more than 2 gigabytes (GB).

You may be better off using an array.

You may also check "BigArray<T>, getting around the 2GB array size limit".
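
As a sketch of the array approach, assuming you only care about 24-bit RGB (alpha discarded) and reusing the question's Bitmap b: one int counter per possible color is a single 64 MB allocation, made once and well under the 2 GB object limit.

// 2^24 possible RGB values, one 4-byte counter each: 64 MB, no resizing ever.
int[] counts = new int[1 << 24];

for (int x = 0; x < b.Width; x++)
{
    for (int y = 0; y < b.Height; y++)
    {
        // Mask off the alpha byte so the low 24 bits index the array directly.
        counts[b.GetPixel(x, y).ToArgb() & 0xFFFFFF]++;
    }
}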

Rahul Tripathi
  • 6000000 ints should only use about 24 MB of space, though - nowhere near the 2 GB limit... – Reed Copsey Dec 17 '13 at 19:46
  • 2 GB is the entire memory space of your program, all objects. – Joe Dec 17 '13 at 19:46
  • I think int[] is the way to go. – peter.petrov Dec 17 '13 at 19:48
  • @ReedCopsey: Yes, I agree with that, and I just thought that using an array would be a better idea. Do correct me if I am missing something. – Rahul Tripathi Dec 17 '13 at 19:49
  • @ReedCopsey: What about 4 billion ints (number of colors in a 32-bit palette)? :) – Victor Zakharov Dec 17 '13 at 19:50
  • @ReedCopsey: `Dictionary` creates arrays of the `Entry` struct. This struct is 8 bytes + sizeof(TKey) + sizeof(TValue). This is 16 bytes in this case. So the biggest array that Dictionary will reference is 16 * 6M = 96 MB. – Steven Dec 17 '13 at 19:51
  • @Steven Yes - but still shouldn't be enough to be causing the problem directly. It creates multiple arrays like that, but I do kind of suspect something else is going on here... – Reed Copsey Dec 17 '13 at 19:51
  • @Steven In addition, the Dictionary doesn't use a single array, so it's multiple arrays which should add up to near 100mb of usage, but no single array requiring a 100mb segment, which would actually help prevent OOM exceptions in general... – Reed Copsey Dec 17 '13 at 19:52
  • @ReedCopsey: I think you are right. There's something else going on. – Steven Dec 17 '13 at 19:52
  • Another issue is that OOM exceptions do not mean there is not enough space; they mean there is not enough ***contiguous*** space. Those many small arrays may be hurting instead of helping, as you may be re-allocating frequently, leading to memory fragmentation, which can cause OOM on a relatively small object because you no longer have any large open gaps in your memory until the GC decides to compact the large object heap. – Scott Chamberlain Dec 17 '13 at 19:56
  • @ReedCopsey: True. There are two arrays, the buckets and the entries. Multiple small arrays are less likely to cause problems. – Steven Dec 17 '13 at 19:57

In the 32 bit runtime, the maximum number of items you can have in a Dictionary<int, int> is in the neighborhood of 61.7 million. See my old article for more info.

If you're running in 32 bit mode, then your entire application, plus whatever bits of ASP.NET and the underlying machinery are required, all have to fit within the memory available to your process: normally 2 GB in the 32-bit runtime.

By the way, a really wacky way to solve your problem (but one I wouldn't recommend unless you're really hurting for memory) would be the following (assuming a 24-bit image):

  1. Call LockBits to get a pointer to the raw image data
  2. Compress out the per-scan-line padding by moving the data for each scan line down to fill the previous row's padding. You end up with an array of 3-byte values followed by a bunch of empty space (equal in size to the removed padding).
  3. Sort the image data. That is, sort the 3-byte values. You'd have to write a custom sort, but it wouldn't be too bad.
  4. Go sequentially through the array and count the number of unique values.
  5. Allocate a 2-dimensional array: int[count,2] to hold the values and their occurrence counts.
  6. Go sequentially through the array again to count occurrences of each unique value and populate the counts array.

I wouldn't honestly suggest using this method. Just got a little laugh when I thought of it.
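
Still, for the curious, here is a rough sketch of steps 3 through 6. It cheats by assuming steps 1 and 2 have already produced an int[] of packed 3-byte pixel values, so Array.Sort can stand in for the custom 3-byte sort:

static int[,] CountBySorting(int[] pixels)
{
    Array.Sort(pixels);                          // step 3

    int unique = 0;                              // step 4: count the unique values
    for (int i = 0; i < pixels.Length; i++)
        if (i == 0 || pixels[i] != pixels[i - 1])
            unique++;

    var counts = new int[unique, 2];             // step 5: [value, occurrence count]
    int row = -1;
    for (int i = 0; i < pixels.Length; i++)      // step 6: fill in the counts
    {
        if (i == 0 || pixels[i] != pixels[i - 1])
        {
            row++;
            counts[row, 0] = pixels[i];
        }
        counts[row, 1]++;
    }
    return counts;
}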

Jim Mischel

Try using an array instead. I doubt it will run out of memory. 6 million int array elements is not a big deal.

peter.petrov
  • However with 6M values I think there will be a huge performance gap vs a dictionary. – Pierre-Luc Pineault Dec 17 '13 at 19:46
  • @Pierre-LucPineault How come? Access to an array element is O(1). Theoretically it is the same with Dictionary. But it will be better with an array in fact. – peter.petrov Dec 17 '13 at 19:47
  • Yeah, as long as you're aware of the maximum number of possible colors, there's a very specific limit. The disadvantage is that you'd need to allocate a large chunk of memory even when someone is analyzing a solid yellow square. – Katana314 Dec 17 '13 at 19:49
  • @Katana314 Yes, that is indeed so. – peter.petrov Dec 17 '13 at 19:50
  • The problem with using an array is that he doesn't know what values will go into it, or how many values there are. He would have to maintain a sorted array, which means O(log n) lookup and O(n) insert. That's going to be pretty expensive. – Jim Mischel Dec 17 '13 at 21:03