
I have a New Dictionary(Of String, Long()) with 3,125,000 unique (string) keys. I am distributing close to 1 billion (935,984,413) values (all Longs) among the keys, populating a Long() array for each key.

This works fine and very fast for average-sized datasets; for example, 1,500,000 string keys with 500,000,000 Long values to distribute is done in about 2 hours.

However, for the above-mentioned dataset, once I get about halfway through the data the process slows down dramatically, and at the current rate it may never finish.

I think I am running out of memory: the application is using 5 GB, and I believe it is now limited by my system (8 GB of RAM).

How can I calculate the amount of memory I need for the above situation? The string keys average around 5 characters in length.
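For reference, a minimal sketch of the kind of structure involved (the names, and the use of a List(Of Long) as an intermediate buffer before producing the final Long() arrays, are only illustrative assumptions, not necessarily how the actual code works):

```vb
Imports System.Collections.Generic

Module DistributionSketch
    ' One entry per string key, holding the Long values assigned to it so far.
    ' A List(Of Long) is used as an intermediate buffer (an assumption); the
    ' final Dictionary(Of String, Long()) is produced by Materialize().
    Private ReadOnly buffer As New Dictionary(Of String, List(Of Long))()

    Public Sub Add(key As String, value As Long)
        Dim bucket As List(Of Long) = Nothing
        If Not buffer.TryGetValue(key, bucket) Then
            bucket = New List(Of Long)()
            buffer.Add(key, bucket)
        End If
        bucket.Add(value)
    End Sub

    ' After distribution, materialize the Long() array for each key.
    Public Function Materialize() As Dictionary(Of String, Long())
        Dim result As New Dictionary(Of String, Long())(buffer.Count)
        For Each kvp As KeyValuePair(Of String, List(Of Long)) In buffer
            result(kvp.Key) = kvp.Value.ToArray()
        Next
        Return result
    End Function
End Module
```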

Thanks!


1 Answer


The Long data type takes 8 bytes each. For strings it is more complicated; check out this post by Jon Skeet.

Quote:

In the current implementation at least, strings take up 20+(n/2)*4 bytes (rounding the value of n/2 down)

(Note: in his blog post, he has some updates on this string calculation)

In your case, each of your 5-character string keys would take around:

20 + (5/2) * 4 = 20 + 8 = 28 bytes

Nevertheless, you can simplify the calculation by considering only the dominant contributor. In your case that is the Long values, since there are far more of them than there are strings, and your string keys are small (5 chars).

Thus, if you have roughly 1 billion Longs, you need around 8 GB of memory for the Longs alone. The other overheads plus the strings are less significant, but at least almost 8 GB (935,984,413 x 8 = 7,487,875,304 bytes) is needed.

The strings, in your example, would take:

28 * 3,125,000 = 87.5 MB

Thus the total is about 7.5 to 7.6 GB just for the strings and the Long() arrays.
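As a sanity check, here is a minimal sketch of that back-of-the-envelope estimate in code (it uses the figures above and the quoted x86 string formula; actual sizes depend on the runtime and platform):

```vb
Module MemoryEstimate
    Sub Main()
        ' Figures from the question and the calculation above.
        Dim keyCount As Long = 3125000L
        Dim valueCount As Long = 935984413L
        Dim avgKeyLength As Long = 5L

        ' Quoted x86 approximation: 20 + (n \ 2) * 4 bytes per string.
        Dim bytesPerString As Long = 20L + (avgKeyLength \ 2L) * 4L   ' = 28 bytes
        Dim stringBytes As Long = keyCount * bytesPerString           ' ~87.5 MB
        Dim longBytes As Long = valueCount * 8L                       ' ~7.49 GB

        Console.WriteLine("Strings: {0:N0} bytes", stringBytes)
        Console.WriteLine("Longs:   {0:N0} bytes", longBytes)
        Console.WriteLine("Total:   {0:N0} bytes (~{1:F1} GB)",
                          stringBytes + longBytes,
                          (stringBytes + longBytes) / 1000000000.0)
    End Sub
End Module
```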

  • Thanks Ian, I was hoping the calculation would be this (somehow) simple. The previous run on 500,000,000 longs would have taken around 4GB, which is correct; it peaked at around 3.9GB as far as I could check. – Yeahson Mar 24 '16 at 09:34
  • @Yeahson you mean the calculation above is simple? or is it complex? – Ian Mar 24 '16 at 09:35
  • No, it is simple. I was expecting a much more complex approach, and it not being this straightforward. – Yeahson Mar 24 '16 at 09:37
  • @Yeahson ah I see.. it is because the dominant items here are the `Long`... ;) we do not count the `Dictionary` itself for instance, because it is not as significant... – Ian Mar 24 '16 at 09:38
  • Now I need to find a method to minimize the memory footprint. The 1 billion size dataset is not even the maximum for what I am planning to do. – Yeahson Mar 24 '16 at 09:38
  • @Yeahson it could be a lot more complex than that if your object is more complex – Ian Mar 24 '16 at 09:38
  • You are quoting the formula for calculating string sizes on x86, and then conclude that you'll need nearly 8 gig of memory (which you can't possibly have with x86). Why not stick with x64? Am I missing something? – Kirill Shlenskiy Sep 21 '17 at 05:31
  • @KirillShlenskiy Since there isn't explicit info given whether it is x86 or x64, I gave calculation for x86 based on my assumption since x86 would need less memory.. the point is to show that even with x86, the amount of memory needed is huge... on side note, I think with PAE, x86 can go up to 64 GB... (I might be wrong, though) – Ian Sep 21 '17 at 05:52
  • @Ian, fair enough. Thanks for taking the time to explain. – Kirill Shlenskiy Sep 21 '17 at 06:33