3

I need to work with very large arrays of small types (int or float arrays). I'm targeting x64 only, on machines with a lot of RAM; physical memory is never the issue in my scenarios. While looking at the documentation for gcAllowVeryLargeObjects I noticed this point:

• The maximum index in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0X7FEFFFFF) for other types.

Now my issue is that I actually "need" to work with larger arrays than that. What would be the appropriate workaround here? Creating arrays of arrays, or some other abstraction?

Knowing that I mostly need to access those arrays sequentially (never random reads, but often different segments being read sequentially by different threads, potentially 100+ threads at once), what would my best bet be?

I may need to hold arrays of up to 65,536,000,000 elements or more.

Ronan Thibaudau
  • Physical RAM is not an issue for 65.536 millions of elements? You are talking terabytes here... – David Brabant Aug 18 '14 at 09:59
  • Why don't you use a database? – Tim Schmelter Aug 18 '14 at 10:00
  • I'm not talking terabytes but around 256 gigabytes, and yes, that's not an issue. And I don't use a database because this has nothing to do with database-centric work; this is actual math work on the full set at once, with no storage needed, just input => run => output – Ronan Thibaudau Aug 18 '14 at 10:00
  • Would using sparse arrays be possible for your work? – Dirk Aug 18 '14 at 10:01
  • @Dirk no, there's absolutely no pattern in the values in the array, could be there's not even 1 duplicate value at times in the whole array. – Ronan Thibaudau Aug 18 '14 at 10:02
  • Do you need index level access, you mention this will be processed in sequential chunks. I'm wondering if you're better with a stream if a database is out of scope. – kidshaw Aug 18 '14 at 10:06
  • I guess the question is: do you really need to store that data in memory? Where does it come from? Is it normally stored somewhere else and you only read it all to process it? I think you'd have to be far more specific if you want a good answer. – Dirk Aug 18 '14 at 10:06
  • I think you're looking for a database? ever heard of one of them things? – Brunis Aug 18 '14 at 10:07
  • @Dirk the data is generated procedurally in memory and then modified in memory as a sequence of algorithms applied one after the other to the whole dataset. It's not coming from anywhere but the generation algorithm, so there's no physical source for it outside of runtime, and it's not stored anywhere (only the output needs to be stored once all the algorithms are done). Runtime can be long, and serialization/deserialization to disk with that amount of data would make it much longer; the solution has to be fully in memory at all times, with RAM not being an issue – Ronan Thibaudau Aug 18 '14 at 10:10
  • @Brunis i already answered that, no a database is not a fit for what i do at all – Ronan Thibaudau Aug 18 '14 at 10:11
  • @kidshaw i don't need index level access as long as i can process chunks (i'm fine with not being able to access a particular index at a given time) but i do need access level information (i need to know the index i'm processing) – Ronan Thibaudau Aug 18 '14 at 10:12
  • How about creating an array-of-arrays wrapper class that uses multiple individual arrays of the right size, and calculates the correct index? In other words you could create a system where you have many arrays at max size and intelligently switch between them. You would need a bit over 30 arrays if you maximize their size, or more if you don't create the individual arrays at max size. – Lasse V. Karlsen Aug 18 '14 at 10:14
  • @LasseV.Karlsen That would work, but I'm just wondering if an array of arrays is the optimal solution or if I'm missing something. While I guess working with that large an amount of such small data items is uncommon, I was hoping someone would have come up with a tested & fast solution – Ronan Thibaudau Aug 18 '14 at 10:15
  • Well, do you have to use "arrays"? What about simply allocating 256GB of sequential memory and processing using x86 instructions? – Lasse V. Karlsen Aug 18 '14 at 10:16
  • @LasseV.Karlsen I don't have to use arrays, no, but I'm not sure how to allocate 256 GB of sequential memory in C# without using arrays. Or should I go down to unsafe code with pointers, and would I not face similar issues doing that? (if so, that sounds like the "right" solution there actually) – Ronan Thibaudau Aug 18 '14 at 10:17
  • "The maximum index in any single dimension is 2,147,483,591 (0x7FFFFFC7) for byte arrays and arrays of single-byte structures, and 2,146,435,071 (0X7FEFFFFF) for other types." So the limit is smaller for multiple byte structure as in Jagged arrays? – Paul Zahra Aug 18 '14 at 10:31
  • P.S. Objects can have a nasty overhead of allocating, deallocating and referencing... look at structs instead. – Paul Zahra Aug 18 '14 at 10:34
  • @PaulZahra not sure what you're talking about, as was said from the start these are arrays of simple types (float or int), can't get any simpler than that. – Ronan Thibaudau Aug 18 '14 at 10:35
  • @RonanThibaudau When you allocate an int to an array element, it occupies four bytes, therefore it isn't a "single-byte structure", therefore you will have a significantly lower index limit? – Paul Zahra Aug 18 '14 at 10:37
  • @PaulZahra I know, but that's not an issue here; I was referring to your second comment (the one about objects and allocation) in my answer. The difference between single and multi-byte structures is irrelevant (barely any difference in the number of elements you can store, and I need 1-2 orders of magnitude more) – Ronan Thibaudau Aug 18 '14 at 10:38
  • @RonanThibaudau In that case look into the difference between classes and structs... "Whenever you have a need for a type that will be used often and is mostly just a piece of data, structs might be a good option." http://msdn.microsoft.com/en-us/library/aa288471%28v=vs.71%29.aspx – Paul Zahra Aug 18 '14 at 10:43
  • @PaulZahra Still don't see what this has to do with anything, int and float aren't classes. Using classes was never mentioned as an option so i'm really confused about where this is coming from – Ronan Thibaudau Aug 18 '14 at 10:44
  • @RonanThibaudau Yes, but in C# arrays are objects... i.e. could you replace the array with a custom struct for better suitability / performance? – Paul Zahra Aug 18 '14 at 10:52
  • An array is an object, but the array elements will be allocated as a sequential block of memory, and with the case of ints and floats, the block of memory will contain the actual ints or floats, so there shouldn't be "array of objects" in play here, but yes, the array itself is an object, but this should have no bearing on this problem. Array access is already very fast in .NET, not sure you can mimic an array faster than arrays already are. – Lasse V. Karlsen Aug 18 '14 at 11:12
  • @LasseV.Karlsen can you post an answer with a code sample of how you'd allocate that? – Ronan Thibaudau Aug 18 '14 at 11:13
  • Not sure I know how either without just resorting to unmanaged code. I know how I would make the array-of-array solution, but the allocate-256GB-block is just a theory from my end; not sure it's doable or practical, perhaps someone else knows if, or how, to do that (see the sketch below). – Lasse V. Karlsen Aug 18 '14 at 11:15
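
Following up on the unmanaged-allocation idea in the comments above, here is a rough, untested sketch of what that might look like in C#, assuming an x64 process with unsafe code enabled. The class name is made up for illustration; Marshal.AllocHGlobal takes an IntPtr-sized count, so on x64 it is not limited to 2 GB, but nothing here bounds-checks accesses and the OS still has to be willing to commit that much memory.

using System;
using System.Runtime.InteropServices;

// Sketch only: a huge float "array" backed by a single unmanaged block,
// accessed through pointer arithmetic instead of a managed array.
public unsafe sealed class UnmanagedFloatBuffer : IDisposable
{
    private readonly IntPtr block;
    private readonly float* ptr;
    public readonly long Length;

    public UnmanagedFloatBuffer(long length)
    {
        Length = length;
        // IntPtr is 64-bit on x64, so sizes far beyond 2 GB are representable here.
        block = Marshal.AllocHGlobal(new IntPtr(length * sizeof(float)));
        ptr = (float*)block;
        // Note: AllocHGlobal does not zero the memory it returns.
    }

    public float this[long index]
    {
        get { return ptr[index]; }   // no bounds checks: the caller must stay in range
        set { ptr[index] = value; }
    }

    public void Dispose()
    {
        Marshal.FreeHGlobal(block);
    }
}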

7 Answers

5

If you really must break the array length limit then you'll have to split the array into chunks of suitable size. You can wrap those chunks together in a container that has the appropriate semantics, like the BigArrayOfLong object that James McCaffrey blogged about a while back. There are numerous others like it.

The basic idea is you use a jagged array to allocate the space you're going to use. Note that a multi-dimensional array won't give you any advantage since it is still a single object, while a jagged array is a smaller array of arrays, each of which is its own object in (probably not contiguous) memory.

Here's a very simple (and not particularly optimal) implementation:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

public class HugeArray<T> : IEnumerable<T>
    where T : struct
{
    // Elements per chunk, scaled by the element size so each chunk stays well
    // below the per-object limits (Marshal.SizeOf<T>() requires .NET 4.5.1+).
    public static int arysize = (Int32.MaxValue >> 4) / Marshal.SizeOf<T>();

    public readonly long Capacity;
    private readonly T[][] content;

    public T this[long index]
    {
        get
        {
            if (index < 0 || index >= Capacity)
                throw new IndexOutOfRangeException();
            int chunk = (int)(index / arysize);
            int offset = (int)(index % arysize);
            return content[chunk][offset];
        }
        set
        {
            if (index < 0 || index >= Capacity)
                throw new IndexOutOfRangeException();
            int chunk = (int)(index / arysize);
            int offset = (int)(index % arysize);
            content[chunk][offset] = value;
        }
    }

    public HugeArray(long capacity)
    {
        Capacity = capacity;
        int nChunks = (int)(capacity / arysize);
        int nRemainder = (int)(capacity % arysize);

        if (nRemainder == 0)
            content = new T[nChunks][];
        else
            content = new T[nChunks + 1][];

        for (int i = 0; i < nChunks; i++)
            content[i] = new T[arysize];
        if (nRemainder > 0)
            content[content.Length - 1] = new T[nRemainder];
    }

    public IEnumerator<T> GetEnumerator()
    {
        return content.SelectMany(c => c).GetEnumerator();
    }

    IEnumerator System.Collections.IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

This one is statically allocated, but it's not too hard to make one that grows to fit demand. Just make sure that the block size you specify isn't completely out of range. I've gone with a calculation based on the item size just in case.
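
For illustration only (not part of the original answer), here is a short usage sketch showing the kind of per-segment sequential access described in the question, with a much smaller capacity so it can actually run; the segment count and fill values are arbitrary.

using System;
using System.Threading.Tasks;

class HugeArrayDemo
{
    static void Main()
    {
        // Much smaller than the real target size, purely for demonstration.
        var data = new HugeArray<float>(100000000L);

        // Fill sequentially.
        for (long i = 0; i < data.Capacity; i++)
            data[i] = i * 0.5f;

        // Different segments read sequentially by different threads,
        // mirroring the access pattern described in the question.
        const int segments = 8;
        long segmentSize = data.Capacity / segments;

        Parallel.For(0, segments, s =>
        {
            long start = s * segmentSize;
            long end = (s == segments - 1) ? data.Capacity : start + segmentSize;
            double sum = 0;
            for (long i = start; i < end; i++)
                sum += data[i];
            Console.WriteLine($"segment {s}: sum = {sum}");
        });
    }
}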

Corey
  • Sounds like a nice wrapper you took time to do around my workaround, unless someone can come up with something better (that allows me to actually allocate all that data at once on a single item without wrapper) i'll accept this as the solution, thanks! – Ronan Thibaudau Aug 18 '14 at 11:26
  • I would also consider retrieving direct access to the underlying arrays, since this sounds like a special case that warrants a special solution, and not something general and reusable. In other words, something that would give me one of the arrays out of the wrapper, so that I can use normal array access for a portion of the huge array before having to switch to the next array. Does complicate the outside code slightly, but would give a performance boost (should measure this to be certain though) since we can avoid a division and a remainder per element. – Lasse V. Karlsen Aug 18 '14 at 11:29
  • I probably will go with doing X chunks (with X being arr/maxsize) and hand each one to a thread so using jagged arrays isn't really an issue (i can create a jagged array and handle each X dimension to a thread that will handle the whole of the Y dimension). Shouldn't be a performance issue since i'm not dereferencing once per index but once per thread so once per 2+bil elements – Ronan Thibaudau Aug 18 '14 at 11:31
  • @LasseV.Karlsen Wouldn't be difficult to achieve. Would also help indexer performance if the `arysize` were a power of 2 with a corresponding mask. Bit shifts and masks are _much_ faster than arithmetic... but this was just an example after all :P (see the sketch after these comments) – Corey Aug 18 '14 at 11:34
  • This is a great solution, but it is still not enough for my needs. There is a way to scale up this solution? – Omri Jan 17 '18 at 19:04
  • @Omri Not sure how much bigger you can go than this. Each slot in the content array is ~1/4GB of memory - you could maybe make that 1GB per slot if you're brave, giving you 4 times the potential maximum space. But this should already be capable of containing many terabytes of memory. What's your use case? – Corey Jan 17 '18 at 22:48
  • @Corey, I'm trying to define a 20M by 20M 2-dimensional array of ints. Every method that I've tried failed except using files, but that is not practical (slow). I've tried HugeArray<…> but eventually I get an out of memory exception... – Omri Jan 18 '18 at 08:12
  • @Omri you are trying to allocate ~1.5 exabytes worth of memory, and wondering why this isn't working for you? Do you *have* 1.5 exabytes worth of RAM? I'm guessing not. Do you have an OS that can handle 1.5 exabytes of RAM? Also unlikely. Maybe you need to revisit your problem and find an alternative path. – Corey Jan 19 '18 at 04:52
  • Very useful when processing images larger than 50,000 × 50,000 pixels or very large point-cloud data. As of 2023, Microsoft allows the memory size of an array to be larger than 2GB, but its index implementation is still UInt32. It's funny that people a decade ago thought such an array size was enough, just as people two decades ago thought the capacity of a 32-bit memory address was more than enough. – g11de Aug 23 '23 at 09:08
  • @g11de If you're dealing with that much data it can be restrictive, yeah. There are other ways around it like memory-mapped files and stream processing, but processing extremely large images will always be *much* faster when you can load it all into memory. We're approaching the point where we're routinely doing stuff that needs much larger arrays, and the limits are starting to chafe. – Corey Aug 23 '23 at 23:16
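
To illustrate the power-of-two suggestion from the comment thread above, here is a hedged sketch (not part of the original answer) of a chunked indexer that replaces the division and modulo with a shift and a mask. The chunk size of 2^24 elements is arbitrary, and bounds checks are omitted for brevity.

using System;

// Sketch: power-of-two chunk size, so chunk and offset come from a shift and a mask.
public class HugeArrayPow2<T> where T : struct
{
    private const int ChunkBits = 24;               // 2^24 = 16,777,216 elements per chunk
    private const int ChunkSize = 1 << ChunkBits;
    private const long ChunkMask = ChunkSize - 1;

    private readonly T[][] content;
    public readonly long Capacity;

    public HugeArrayPow2(long capacity)
    {
        Capacity = capacity;
        int chunks = (int)((capacity + ChunkSize - 1) >> ChunkBits);
        content = new T[chunks][];
        for (int i = 0; i < chunks; i++)
        {
            long remaining = capacity - ((long)i << ChunkBits);
            content[i] = new T[Math.Min(remaining, ChunkSize)];
        }
    }

    public T this[long index]
    {
        get { return content[index >> ChunkBits][index & ChunkMask]; }
        set { content[index >> ChunkBits][index & ChunkMask] = value; }
    }
}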
0

You could just avoid using real arrays and simulate them via a stream.

If you want it seekable (which you do), you're limited to a long offset (2^63 − 1 bytes, since Stream.Position is a signed 64-bit value). Then you simply seek to index * n bytes and read n bytes.

With 4-byte elements such as Int32 or Single (n = 4), that leaves room for roughly 2.3e+18 positions.
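
As a rough illustration of the seek-and-read arithmetic (using a FileStream backing, which, as the comments below point out, would be far too slow for the asker's in-memory workload), here is a hedged sketch; the class and method names are made up for the example.

using System;
using System.IO;

// Sketch: treat a seekable stream as a huge virtual float array,
// where element i lives at byte offset i * sizeof(float).
class StreamBackedFloats : IDisposable
{
    private readonly FileStream stream;
    private readonly byte[] scratch = new byte[sizeof(float)];

    public StreamBackedFloats(string path, long elementCount)
    {
        stream = new FileStream(path, FileMode.Create, FileAccess.ReadWrite);
        stream.SetLength(elementCount * sizeof(float));   // pre-size the backing file
    }

    public float Read(long index)
    {
        stream.Seek(index * sizeof(float), SeekOrigin.Begin);
        stream.Read(scratch, 0, scratch.Length);
        return BitConverter.ToSingle(scratch, 0);
    }

    public void Write(long index, float value)
    {
        stream.Seek(index * sizeof(float), SeekOrigin.Begin);
        byte[] bytes = BitConverter.GetBytes(value);
        stream.Write(bytes, 0, bytes.Length);
    }

    public void Dispose()
    {
        stream.Dispose();
    }
}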

WhoIsJohnDoe
  • How would that work in memory? Using a memorystream? Wouldn't that use an array internally & be faced with the same limitation? – Ronan Thibaudau Aug 18 '14 at 10:12
  • Arrays (as sadly do other objects) have a hard limit (both on 32bit and 64bit machines) of 2GB. – Paul Zahra Aug 18 '14 at 10:22
  • @PaulZahra there is no 2gb limit in recent .net version with gcAllowVeryLargeObjects, this is not the issue here, the issue is there still is a limit to the number of elements in them – Ronan Thibaudau Aug 18 '14 at 10:25
  • Ah yeah doh!, in .Net 4.5 the 2GB restriction is lifted. – Paul Zahra Aug 18 '14 at 10:27
  • To be honest, I'd use a FileStream but you're right, MemoryStream probably would not work. – WhoIsJohnDoe Aug 18 '14 at 10:27
  • A FileStream wouldn't work for me; disk access is way too slow, and I don't read the data just once. There can be dozens of functions each updating the data each time; this would mean writing and reading terabytes of data from the disk, maybe tens of terabytes. We're talking probably 3-4 orders of magnitude slower than doing this in RAM, at least – Ronan Thibaudau Aug 18 '14 at 10:29
0

This sounds like a problem for distributed computing, something like Google Map Reduce.

When it gets too big for your current infrastructure, scale it to more boxes.

Jeff Watkins
  • This isn't something I want to distribute; it has to run locally on a single (very large) computer. I don't want it node-based, and my question was really "is there something cleaner than nesting arrays"; I'm not looking for alternative architectures. I do want a multi-hundred-gigabyte, and potentially terabyte, data set in RAM at once on a single computer – Ronan Thibaudau Aug 18 '14 at 10:14
  • Read the question, it's not too big for my infrastructure at all; I'm facing a language limitation, nothing to do with the hardware – Ronan Thibaudau Aug 18 '14 at 10:14
  • If you're using sequential access, you could implement your own version of a LinkedList, maybe with bookmarks/chapter stops if you've got significant areas you need to read from. The problem with that is you have large overheads for pointers. – Jeff Watkins Aug 18 '14 at 10:17
  • A LinkedList wouldn't help either; I don't necessarily want to traverse the whole list at once, but big chunks. Traversing 40 billion elements just to read from 40 to 41 billion doesn't sound like it would scale at all – Ronan Thibaudau Aug 18 '14 at 10:20
0

Well, I am pretty sure you can't have an array of 6,500,000,000 elements, because usually that is bigger than the computer's memory (no operating system would give a piece of software that much memory), and probably for other reasons as well. If for some reason you believe you can get that much RAM but you think an array is too small, you can try working with objects based on a linked list (like a Stack, or even LinkedList itself). A linked list is not limited by a number of indexes (as long as it fits within your RAM).

Amit Klein
  • I do have access to computers with 1TB+ ram, linked list would mean a lot of wasted data / cpu power since my scenario isn't mostly to go from start to finish but to process chunks that may not be at the start at all – Ronan Thibaudau Aug 18 '14 at 10:21
0

I'm writing this as a solution, but hoping someone will have something better to offer that I can mark as the accepted answer.

Since the limit is on a single dimension of an array, a solution would be to use a multidimensional array and simply index into it by calculating positions as if it were a 1D array:

//pseudocode
long index = someLargeNumber;
int index1 = (int)(index / sizeOfArrays);
int index2 = (int)(index % sizeOfArrays);
var data = myVeryLargeMultidimensionalArray[index1, index2];
Ronan Thibaudau
0

My suggestion is to use native code (e.g. C++ x64), as C# is just not good at looping through that many elements. Think carefully about what information you need to extract from that data before attempting to load that much of it into RAM.

Darien Pardinas
  • That's pretty random, you'd have to explain why you think it's "just not good to loop through such amount of elements" – Ronan Thibaudau Aug 18 '14 at 10:34
  • It means that if you are using arrays of several GBs of data, C# is very slow to process such arrays; e.g. calculating the average of your array could take minutes or hours! What sort of processing are you doing? Whatever it is, if you are dealing with arrays of more than 1GB, C# is not the language for that. – Darien Pardinas Aug 18 '14 at 10:38
  • That does seem random; as far as I know C# is no slower than C++. Not starting a flame war here, but no, I don't feel like switching language makes sense. For the arrays I could allocate (2 billion elements), C# performance was just fine, and regardless of the language I'm talking multi-week workloads here. You have a very skewed view of C# if you think 1GB is out of its reach; I work at realtime performance with larger data than that daily. – Ronan Thibaudau Aug 18 '14 at 10:41
  • You can do some magic with the garbage collector as ILNumerics.net do to get some performance out of the garbage collector, but please don't compare the speed of native code with interpreted languages otherwise all C++ developers would have lost their jobs by now. I do both C# and C++ so it won't be my case :) – Darien Pardinas Aug 18 '14 at 10:45
  • As I said, please don't start a flame war here; besides, you're not answering the question at all (my question was never about performance to begin with, but about a cleaner way to allocate arrays beyond the language's size limit without resorting to nested arrays). As for your statement, no one will lose their job, because there is a lot of legacy code and there are non-.NET use cases, but there have been equally skilled C++ & C# programmers working separately on tests with C# coming out on top (not talking synthetic one-liner tests but actual working, functionally usable tests). Both have different performance – Ronan Thibaudau Aug 18 '14 at 10:53
  • PROFILES, but neither can really be considered "faster" anymore (and maybe C# will finally be, if they decide to get vectorisation done & shipped one day) – Ronan Thibaudau Aug 18 '14 at 10:54
  • Ronan, by no means do I want to start a C# vs. C++ war. I work with both languages on a daily basis and they are both great and complement each other. Best of luck with your implementation. – Darien Pardinas Aug 18 '14 at 11:36
0

Sounds like you should be using a stream to me. A MemoryStream should be fine as long as you dispose of chunks after you've read them.

My guess is that whatever populates your array is running way faster than what consumes it? If that is the case, you could use the stream simply as a buffer. When the buffer reaches a critical mass, block new entries whilst you clear the backlog. Sounds like you've got enough memory for that not to be an issue.

Your buffer contents can be handed off in chunks to the parallel library with an index being maintained to give you the current index.

Pseudo-code being:

  • receive new item and add to mega-memory stream (Memory will copy over to pagefile here so RAM even less of an issue if you've also got crazy amounts of disk!)

TASK THREAD (replicated for each Algorithm) :

  • while buffer has items
  • read object from buffer
  • process object

If you want to leverage parallel processing within each task, stream a block of objects first and pass them to your method as a collection alongside a starting index so you can still deduce the current item index.
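
A minimal sketch (not from the original answer) of that last point: handing chunks to the Task Parallel Library together with a base index, so each worker can still recover the absolute item index. The buffer type and chunk size are placeholders, and this only demonstrates the indexing pattern, not the large-array storage itself.

using System;
using System.Threading.Tasks;

class ChunkedProcessing
{
    // Process a sequence in chunks; each chunk carries its starting (absolute)
    // index so the worker still knows which element it is on.
    static void ProcessInChunks(float[] buffer, int chunkSize)
    {
        int chunkCount = (buffer.Length + chunkSize - 1) / chunkSize;

        Parallel.For(0, chunkCount, chunk =>
        {
            long baseIndex = (long)chunk * chunkSize;
            int length = (int)Math.Min(chunkSize, buffer.Length - baseIndex);

            for (int offset = 0; offset < length; offset++)
            {
                long absoluteIndex = baseIndex + offset;   // the current item index
                buffer[absoluteIndex] *= 2f;               // placeholder for the real algorithm
            }
        });
    }
}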

kidshaw
  • No i actually need all the data at once, as it is a multi step process (generate once, run algo one, run algo two ... run algo 70 etc) so i can't dispose of anything, as i mentioned ram is not a concern, i WANT all the data in ram at once, i'm perfectly fine with using 256GB 512GB 1TB or even (but i need an upgrade then and that's costly) 2TB – Ronan Thibaudau Aug 18 '14 at 11:00
  • I guess that makes it easier. You don't need to clear the 'buffer'. As long as you don't edit the contents of the stream after it has been loaded, you should be OK. I've edited my answer. – kidshaw Aug 18 '14 at 11:43
  • But a MemoryStream is likely backed by arrays, so I'll hit the same software limitation – Ronan Thibaudau Aug 18 '14 at 12:30
  • Also it's purely sequential (there's no generator producing individual items that have to be processed as they are made available; the generation as a whole is one step, then each pass is one step, and it's not possible to generate a bit, consume a bit) – Ronan Thibaudau Aug 18 '14 at 12:31
  • Yes - I updated the answer already to reflect that data is supplied first and then you begin processing. I don't think the memory stream does rely on integer arrays internally - I haven't proven the theory but suspect it is more akin to a linked list, so it grows dynamically but is essentially pointers in memory to more memory. – kidshaw Aug 18 '14 at 12:33
  • Doesn't look like it would work anyway; there's no way to write anything to it except byte and byte[], and converting from float or int to that on a per read/write basis would be prohibitively expensive, I think – Ronan Thibaudau Aug 18 '14 at 12:37
  • Probably not too bad, especially just for string - but anything more complex will have to first go through BinaryFormatter which will add header bits to each object. We are talking micro-optimisations now so the only way to be sure would be to implement and compare. Interesting problem - good luck with it :) – kidshaw Aug 18 '14 at 12:40
  • It's not for strings, it's for floats, but seeing how small the calculations are per step and per item, the iteration is currently the bottleneck in the lower-scale implementation; converting from a float to 4 bytes and back doesn't sound workable at all, and would be very hard to maintain :( – Ronan Thibaudau Aug 18 '14 at 12:43
  • I can see where you're coming from - http://msdn.microsoft.com/en-us/library/yhwsaf3w(v=vs.110).aspx shows how it would work in case you get bored and want to give it a try. – kidshaw Aug 18 '14 at 12:45