
I have 600 million records in my database, and I want to load them all with this query:

var query = (from p in behzad.test
             select p).ToArray();

Can I load all of that?

elnaz irani
  • http://stackoverflow.com/questions/1391672/what-is-the-maximum-size-that-an-array-can-hold – Tasos K. Apr 26 '15 at 06:21
  • Improbable, unless you have 32gb of ram and **very** small records (just the `int ID` to be clear) – xanatos Apr 26 '15 at 06:23
  • @Alexei I don't see how the max size of an array is correlated to how much memory will be needed to keep some records in memory – xanatos Apr 26 '15 at 06:25
  • @Alexei my RAM size is 20 GB and HDD 2 TB – elnaz irani Apr 26 '15 at 06:26
  • @xanatos I assume OP already found the answer to "how to find how much memory a single C# object is using", so the remaining part, "can an array hold 600m items", is definitely answered in that question (yes for the number of items; yes for small items like bytes; requires .NET 4.5 for larger items). – Alexei Levenkov Apr 26 '15 at 06:31
  • Why do you believe you need to load them all in memory? – John Saunders Apr 26 '15 at 06:37
  • To be useful, a collection that size will also need to be indexed - e.g. `.ToDictionary(p => p.SomeKey)`, at the cost of more memory. – StuartLC Apr 26 '15 at 06:38
  • @StuartLC binary search on sorted `struct` values may be an option... (Also, 600m records in memory, especially on such a small machine, is a very strange requirement; probably OP is just wondering and not planning to use it in practice.) – Alexei Levenkov Apr 26 '15 at 06:46

1 Answer


Highly improbable... There are multiple "walls" you'll encounter.

First... Let's say that each record is just an ID... Just to store 600 million of your IDs in the most compact way possible takes 600 million * 4 bytes = 2.4 GB. But I don't think your objects are that small. Then there are the references to those objects... Each reference is 8 bytes at 64 bits...

http://codeblog.jonskeet.uk/2011/04/05/of-memory-and-strings/ here Skeet calculated that the minimum memory used by an object at 64 bits is 24 bytes... so 14.4 GB (the figure doesn't include the reference to the object, because the new object[size] happens before the GC.GetTotalMemory(true)). In that space you can fit up to two int fields without making the object any bigger (the ID and one other int field, for example) (on the same page, search for the table with TwoInt32s).
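You can reproduce that 24-byte figure yourself with the measurement technique from Skeet's post; this is a rough sketch of it (the TwoInt32s class is from his post, the rest is illustrative):

using System;

class TwoInt32s { public int A; public int B; }

class Program
{
    static void Main()
    {
        const int count = 1000000;
        // Allocate the reference array first, so only the objects themselves
        // show up in the before/after difference (as in Skeet's post).
        var refs = new object[count];
        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < count; i++)
            refs[i] = new TwoInt32s();
        long after = GC.GetTotalMemory(true);
        // Prints roughly 24 (bytes per object) on 64 bits.
        Console.WriteLine((after - before) / (double)count);
        GC.KeepAlive(refs);
    }
}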

Then there is the problem that LINQ-to-SQL duplicates the records when they are loaded (one copy goes to its object tracker). You can disable this with https://msdn.microsoft.com/en-us/library/system.data.linq.datacontext.objecttrackingenabled.aspx
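For a read-only load that looks something like this (a sketch, using the behzad context from the question; tracking must be disabled before the first query runs):

// Turn off change tracking so LINQ-to-SQL doesn't keep a second
// copy of every materialized row in its object tracker.
behzad.ObjectTrackingEnabled = false;

var query = (from p in behzad.test
             select p).ToArray();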

Then there is the limit of a maximum of 2 GB of memory for each array... With an array of 8-byte references (64 bits), that caps you at around 268 million records per array. The limit can be lifted at 64 bits starting with .NET 4.5: https://msdn.microsoft.com/en-us/library/hh285054.aspx
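That page describes the gcAllowVeryLargeObjects switch, which you opt into in the application config file:

<configuration>
  <runtime>
    <!-- allow arrays larger than 2 GB in 64-bit processes (.NET 4.5+) -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>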

Then there is another problem with ToArray(): to build the array it grows a buffer by doubling, using powers of 2 (4 elements, 8 elements, 16, 32, 64, 128... 1024... 1024*1024 elements, ...). So before producing the final array of 600 million references (4.8 GB on its own, on top of the 24 bytes/element of the objects), it will build a temporary array of 512 million elements (4 GB), then one of 1024 million elements (8 GB, of which about 3.4 GB is unused), and then it will copy the 1024-million-element array down into one of exactly 600 million elements.
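Schematically, the behavior is something like this sketch (not the real implementation, just the growth pattern):

using System;
using System.Collections.Generic;

static class EnumerableSketch
{
    public static T[] ToArraySketch<T>(IEnumerable<T> source)
    {
        T[] buffer = new T[4];
        int count = 0;
        foreach (T item in source)
        {
            // Full? Allocate a buffer twice as big and copy everything over.
            if (count == buffer.Length)
                Array.Resize(ref buffer, buffer.Length * 2);
            buffer[count++] = item;
        }
        // For 600 million items the buffer has 1024 million slots here;
        // this final step allocates yet another, exactly-sized array.
        Array.Resize(ref buffer, count);
        return buffer;
    }
}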

So just before the end of the ToArray() you will have:

  • 14.4 GB of rows
  • 8.0 GB for a 1024-million-element array of references
  • 4.8 GB for a 600-million-element array of references

So at a minimum you'll need 14.4 + 8.0 + 4.8 = 27.2 GB... For 2 ints... And this is a "theoretical minimum". I would increase it by at least 50-100% to get a "practical minimum". And you are still only handling 2 ints per record :-)

Now... Alexei suggested using a struct. structs have the advantage that they don't require a reference, and their size is nearly exactly the size of the data they hold.

You could change the query to:

var query = (from p in behzad.test select p)
               .Select(p => new MyStruct { ID = p.ID, Foo = p.Foo })
               .ToArray();
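where MyStruct is a small value type; a minimal definition matching that projection could be (ID and Foo stand in for whatever your columns are):

// 8 bytes per element when stored in an array: two Int32 fields,
// no object header and no reference.
public struct MyStruct
{
    public int ID;
    public int Foo;
}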

Now each element is 8 bytes big, so you need

  • 8.0 GB for a 1024-million-element array of 8-byte structs
  • 4.8 GB for a 600-million-element array of 8-byte structs

Total 12.8 GB.

Note that using a reference type like a string inside your element is still a big no-no (it would eat too much memory).

xanatos
  • I think if you can get the results as `struct`s some more data can fit in memory... obviously a couple of extra `int`s is not much more / not really more useful, but it rounds out the answer :) – Alexei Levenkov Apr 26 '15 at 07:02
  • @AlexeiLevenkov You are forgetting how `ToArray()` works. If you use `struct`s, you'll create 1 billion `struct`s before reducing them to 600 million `struct`s. But instead of working with an array whose element size is 8 bytes (a reference), you'll be working with an array whose element size is that of your element. For the same memory you can probably keep 3x `int` (you could theoretically add an extra `short`, but then there is the problem of memory alignment... the struct must be aligned for the `int` it contains, so it can't end with a `short`) – xanatos Apr 26 '15 at 07:09
  • good point... since you need about 2600m items allocated in total (1600m at the same time) while growing the array, you can't fit much more in memory... so about 12 bytes (3 `int`s) when using `struct` is the max... 50% more compared to a reference type :) with a few more copy operations. – Alexei Levenkov Apr 26 '15 at 07:21
  • @AlexeiLevenkov You were right... Skeet's calculation didn't include the reference to the object... so it is 24 bytes + 8 bytes for a reference. `struct`s are the winner :-) – xanatos Apr 26 '15 at 07:52