
I have huge transient arrays created rapidly. Some are kept, some are GC'd. This fragments the heap, and the app consumes approx. 2.5x more memory than it truly needs, resulting in an OutOfMemoryException.

As a solution, I would prefer to have one gigantic array (PointF[]) and handle the allocation and management of segments myself. But I wonder how I could make two (or more) arrays share the same memory space.

PointF[] giganticList = new PointF[100];
PointF[] segment = ???; 
// I want the segment length to be 20 and starting e.g at position 50 
// within the gigantic list

I am thinking of a trick like the winning answer of this SO question. Would that be possible? The problem is that the lengths and the number of the segment arrays are known only at runtime.

user256890
  • Do you currently have memory/CPU usage issues? If not, you probably shouldn't care about that. – ken2k Jun 14 '13 at 08:44
  • @ken2k. From the OP question: "_the app consumes approx. 2.5x more memory than it would truly need_ ***resulting OutOfMemoryException***" (sic) – Andy Brown Jun 14 '13 at 08:45
  • @AndyBrown Wow, sorry, didn't even notice that part :) – ken2k Jun 14 '13 at 08:46
  • There are other ways to combat fragmentation. Depends on the actual algorithms. – H H Jun 14 '13 at 09:01
  • Henk is right, this depends very much on your specific problem. Before you get too deep into this, read the link to limits on object size in my answer and make sure you understand the memory limits for large objects. – Andy Brown Jun 14 '13 at 09:18
  • Update: .NET 4.5 64-bit can create huge arrays, and you can "query" for memory before using it, see my recent answer edit for details on `gcAllowVeryLargeObjects` and `MemoryFailPoint`. You should check your `OutOfMemoryException` is not due to single objects growing too large rather than the number of disposed arrays you are using, and solve your issue accordingly. 32-bit processes won't go larger than 3 GB anyway, so if you have more than this you need 64 bit. – Andy Brown Jun 14 '13 at 10:39

2 Answers


Assuming you are sure that your OutOfMemoryException can be avoided, and that your approach of having it all in memory isn't the actual issue (the GC is pretty good at stopping this happening if memory is available) ...

  • Here is your first problem. I'm not sure the CLR supports any single object larger than 2 GB.
    • Crucial Edit - gcAllowVeryLargeObjects changes this on 64-bit systems; try this before rolling your own solution (see the sketch after this list).
  • Secondly, you are talking about "some are kept, some are GC'd", i.e. you want to be able to reallocate elements of your array once you are done with a "child array".
  • Thirdly, I'm assuming that the PointF[] giganticList = new PointF[100]; in your question is meant to be more like PointF[] giganticList = new PointF[1000000];?
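
If you want to test whether gcAllowVeryLargeObjects rescues you before building your own allocator, a minimal check might look like this (a sketch assuming a 64-bit process and .NET 4.5; the element count is only an illustration that crosses the normal 2 GB object limit):

// Requires app.config to contain:
//   <configuration><runtime><gcAllowVeryLargeObjects enabled="true" /></runtime></configuration>
// PointF is 8 bytes, so 300,000,000 elements is ~2.4 GB - beyond the usual single-object limit
PointF[] reallyGigantic = new PointF[300000000];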

Also consider using MemoryFailPoint, as this allows you to "demand" memory and check for exceptions instead of crashing with an OutOfMemoryException.
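
For example (a minimal sketch; the 800 MB figure is an assumed size for your data, not something from the question):

try {
  // Ask the CLR whether ~800 MB is likely to be available before committing to the allocation
  using (new System.Runtime.MemoryFailPoint(800)) { // size in megabytes
    giganticList = new PointF[100000000]; // 100 million PointF = ~800 MB
  }
}
catch (InsufficientMemoryException) {
  // Degrade gracefully (smaller array, flush caches, warn the user) instead of
  // crashing with an unpredictable OutOfMemoryException mid-allocation
}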

EDIT Perhaps most importantly, you are now entering a land of trade-offs. If you do this you could start losing the advantages of things like the JIT eliminating per-iteration array bounds checks in loops (for (int i = 0; i < myArray.Length; i++) gets optimised; int length = 5; for (int i = 0; i < length; i++) doesn't). If you have computation-heavy code, this could hurt you. You will also have to work far harder to process different child arrays in parallel with each other. And creating copies of the child arrays, or sections of them, or even items inside them, will still allocate more memory, which will be GC'd.

This is possible by wrapping the array, and tracking which sections are used for which child arrays. You are essentially talking about allocating a huge chunk of memory, and then reusing parts of it without putting the onus on the GC. You can take advantage of ArraySegment<T>, but that comes with its own potential issues like exposing the original array to all callers.

This is not going to be simple, but it is possible. Likely as not each time you remove a child array you will want to defragment your master array by shifting other child arrays to close the gaps (or do that when you have run out of contiguous segments).

A simple example would look something like the (untested, don't blame me if your computer leaves home and your cat blows up) pseudocode below. There are two other approaches, I mention those at the end.

using System;
using System.Collections.Generic;
using System.Drawing; // PointF
using System.Linq;    // for Last()

public class ArrayCollection {
  List<int> startIndexes = new List<int>();
  List<int> lengths = new List<int>();
  const int OneBeeellion = 100;
  PointF[] giganticList = new PointF[OneBeeellion];

  public ArraySegment<PointF> this[int childIndex] {
    get {
      // Care with this method, ArraySegment exposes the original array, which callers could then
      //  do bad things to
      return new ArraySegment<PointF>(giganticList, startIndexes[childIndex], lengths[childIndex]);
    }
  }

  // returns the index of the child array
  public int AddChild(int length) {

    // TODO: needs to take account of lists with no entries yet
    int startIndex = startIndexes.Last() + lengths.Last();

    // TODO: check that startIndex + length is not more than giganticList.Length
    //   If it is then
    //     find the smallest unused block which is larger than the length requested
    //     or defrag our unused array sections
    //   otherwise throw out of memory

    startIndexes.Add(startIndex); // will need inserts for defrag operations
    lengths.Add(length); // will need inserts for defrag operations
    return startIndexes.Count - 1; // inserts will need to return inserted index
  }

  public ArraySegment<PointF> GetChildAsSegment(int childIndex) {
    // Care with this method, ArraySegment exposes the original array, which callers could then
    //  do bad things to
    return new ArraySegment<PointF>(giganticList, startIndexes[childIndex], lengths[childIndex]);
  }

  public void SetChildValue(int childIndex, int elementIndex, PointF value) {
    // TODO: needs to take account of lists with no entries yet, or invalid childIndex
    // TODO: check and PREVENT buffer overflow (see warning) here and in other methods
    //    e.g.
    if (elementIndex >= lengths[childIndex]) throw new ArgumentOutOfRangeException("elementIndex"); // you are an evil caller
    int falseZeroIndex = startIndexes[childIndex];
    giganticList[falseZeroIndex + elementIndex] = value;
  }

  public PointF GetChildValue(int childIndex, int elementIndex) {
    // TODO: needs to take account of lists with no entries yet, bad child index, element index
    int falseZeroIndex = startIndexes[childIndex];
    return giganticList[falseZeroIndex + elementIndex];
  }

  public void RemoveChildArray(int childIndex) {
    startIndexes.RemoveAt(childIndex);
    lengths.RemoveAt(childIndex);

    // TODO: possibly record the unused segment in another pair of start, length lists
    // to allow for defragging in AddChild
  }
}

Warning The above code effectively introduces buffer overflow vulnerabilities if, for instance, you don't check the requested elementIndex against the child array's length in methods like SetChildValue. You must understand this and prevent it before trying to do this in production, especially if combining these approaches with use of unsafe.

Now, this could be extended with pseudo-index public PointF this[int index] methods for child arrays, enumerators for the child arrays, etc., but as I say, this is getting complex and you need to decide if it really will solve your problem. Most of your time will be spent on the reuse (first), defrag (second), expand (third), throw OutOfMemory (last) logic; a sketch of such a wrapper follows.
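
For instance, a per-child wrapper with a pseudo-indexer might look like this (illustrative only; ChildArrayView is my name, not part of the class above):

// A hypothetical view over one child array, letting callers use it like a normal array
public struct ChildArrayView {
  private readonly ArrayCollection owner;
  private readonly int childIndex;

  public ChildArrayView(ArrayCollection owner, int childIndex) {
    this.owner = owner;
    this.childIndex = childIndex;
  }

  // Pseudo-indexer: reads and writes go through the master array,
  // keeping the bounds checks in GetChildValue/SetChildValue
  public PointF this[int elementIndex] {
    get { return owner.GetChildValue(childIndex, elementIndex); }
    set { owner.SetChildValue(childIndex, elementIndex, value); }
  }
}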

This approach also has the advantage that you could allocate many 2GB subarrays and use them as a single array, if my comment about the 2GB object limit is correct.
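
A rough sketch of that idea (the names and block size are illustrative; each block stays comfortably under the limit while the wrapper exposes one long-indexed array):

// Many ordinary arrays presented as a single logical array addressed by a long index
public class BigPointFArray {
  const int BlockSize = 1 << 20; // ~1M elements per block
  private readonly PointF[][] blocks;
  public readonly long Length;

  public BigPointFArray(long length) {
    Length = length;
    int blockCount = (int)((length + BlockSize - 1) / BlockSize);
    blocks = new PointF[blockCount][];
    for (int i = 0; i < blockCount; i++) {
      long remaining = length - (long)i * BlockSize;
      blocks[i] = new PointF[(int)Math.Min(BlockSize, remaining)]; // last block may be short
    }
  }

  public PointF this[long index] {
    get { return blocks[(int)(index / BlockSize)][(int)(index % BlockSize)]; }
    set { blocks[(int)(index / BlockSize)][(int)(index % BlockSize)] = value; }
  }
}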

This assumes you don't want to go down the unsafe route and use pointers, but the effect is the same: you would just create a wrapper class to manage child arrays in a fixed block of memory.

Another option is the chunked, hashset/dictionary approach, sketched below. Allocate your entire (massive 2GB) array and break it into chunks (say 100 array elements). A child array then has multiple chunks allocated to it, with some wasted space in its final chunk. This wastes some space overall (depending on your average "child length vs. chunk length" predictions), but in exchange you can increase and decrease the size of child arrays, and remove and insert child arrays, with less impact on your fragmentation.
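
A minimal sketch of that chunked layout (the chunk size, names, and the free-chunk stack are assumptions for illustration; it reuses the usings from the earlier sample):

public class ChunkedArrayCollection {
  const int ChunkSize = 100;                 // elements per chunk
  PointF[] master = new PointF[1000000];     // the single master allocation
  Stack<int> freeChunks = new Stack<int>();  // indexes of chunks not currently in use
  Dictionary<int, List<int>> childChunks = new Dictionary<int, List<int>>();
  Dictionary<int, int> childLengths = new Dictionary<int, int>();
  int nextChildId;

  public ChunkedArrayCollection() {
    for (int i = master.Length / ChunkSize - 1; i >= 0; i--)
      freeChunks.Push(i);
  }

  public int AddChild(int length) {
    int chunksNeeded = (length + ChunkSize - 1) / ChunkSize; // round up; the last chunk may waste space
    var chunks = new List<int>(chunksNeeded);
    for (int i = 0; i < chunksNeeded; i++)
      chunks.Add(freeChunks.Pop()); // TODO: handle running out of free chunks
    int id = nextChildId++;
    childChunks[id] = chunks;
    childLengths[id] = length;
    return id;
  }

  public PointF GetChildValue(int childId, int elementIndex) {
    if (elementIndex >= childLengths[childId]) throw new ArgumentOutOfRangeException("elementIndex");
    int chunk = childChunks[childId][elementIndex / ChunkSize];
    return master[chunk * ChunkSize + elementIndex % ChunkSize];
  }

  public void RemoveChild(int childId) {
    // chunks go straight back onto the free stack: no shifting, so no defrag pass
    foreach (int chunk in childChunks[childId]) freeChunks.Push(chunk);
    childChunks.Remove(childId);
    childLengths.Remove(childId);
  }
}

Because the chunks of a child need not be contiguous, removing or resizing a child never forces the copy-and-shift defragmentation that the contiguous version needs.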


Noteworthy references:

  • Other examples of accessing arrays as a different kind of array or structure; the implementations of these might help you develop your own solution
  • Array optimisation
  • Parallel arrays and use of unsafe

Andy Brown
  • Added some references that are relevant to performance considerations, and other examples of classes that provide behaviour by encapsulating array manipulation – Andy Brown Jun 14 '13 at 10:08
  • Added crucial edit with reference to `gcAllowVeryLargeObjects` – Andy Brown Jun 14 '13 at 10:22
  • Waaaoooo! Impressive answer! Thank you Andy. – user256890 Jun 14 '13 at 10:24
  • @user256890. Be careful with the pseudo code, `SetChildValue` effectively has a [buffer overflow](https://www.owasp.org/index.php/Buffer_overflow) vulnerability as it doesn't check `length` for the child array. If you combine this with unsafe you will have a big problem. This exists elsewhere in the code sample as well. – Andy Brown Jun 14 '13 at 12:30

Your best bet here is probably to use multiple ArraySegment<PointF> values, all on the same PointF[] instance but at different offsets, and have your calling code take note of the relative .Offset and .Count. Note that you would have to write your own code to allocate the next block, look for gaps, etc. - essentially your own mini-allocator.

You can't treat the segments just as a PointF[] directly.

So:

PointF[] giganticList = new PointF[100];
// I want the segment length to be 20 and starting e.g at position 50 
// within the gigantic list
var segment = new ArraySegment<PointF>(giganticList, 50, 20);
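
To then read or write through the segment you index the underlying array yourself via .Array and .Offset (a short usage sketch; the segment exposes no indexer of its own here):

for (int i = 0; i < segment.Count; i++) {
  PointF p = segment.Array[segment.Offset + i]; // element i of the child array
  // ... work with p, or assign back with segment.Array[segment.Offset + i] = p;
}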

As a side note: another approach might be to use a pointer to the data - either from an unmanaged allocation, or from a managed array that has been pinned (note: you should try to avoid pinning) - but while a PointF* can convey its own offset information, it cannot convey length, so you'd need to always pass both a PointF* and Length. By the time you've done that, you might as well have just used ArraySegment<T>, which has the side-benefit of not needing any unsafe code. Of course, depending on the scenario, treating the huge array as unmanaged memory may still be tempting; a sketch of the pointer version follows.
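
A minimal sketch of that pointer approach (compile with /unsafe; the 50/20 numbers reuse the question's example, and pinning is shown only for brevity - for long-lived data consider unmanaged allocation instead):

unsafe {
  fixed (PointF* basePtr = giganticList) {
    PointF* segmentPtr = basePtr + 50; // the offset travels with the pointer...
    int segmentLength = 20;            // ...but the length must be passed alongside it
    for (int i = 0; i < segmentLength; i++) {
      PointF p = segmentPtr[i];
      // ... work with p
    }
  }
}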

Marc Gravell