
I've written a 32-bit program that uses a dynamic array to store a list of triangles whose count isn't known in advance. My current strategy is to estimate a very large number of triangles and then trim the list once all the triangles have been created. In some cases I'll only allocate memory once; in others I'll need to grow the allocation.
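
Roughly what I'm doing today, in case it matters (TTriangle, EstimatedMaxCount and BuildTriangles are just illustrative names):

type
  TTriangle = record
    A, B, C: Integer; // placeholder fields; my real record is larger
  end;

const
  EstimatedMaxCount = 10 * 1000 * 1000; // generous guess at the triangle count

var
  Triangles: array of TTriangle;
  ActualCount: Integer;

procedure BuildTriangles;
begin
  // One very large contiguous allocation up front, based on the estimate.
  SetLength(Triangles, EstimatedMaxCount);
  ActualCount := 0;
  // ... create the triangles, incrementing ActualCount for each one added ...
  // Trim the array once the real count is known.
  SetLength(Triangles, ActualCount);
end;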

With a very large data set I'm running out of memory when my application's memory usage is about 1.2 GB, and since each allocation step is so large I feel like I may be fragmenting memory.

Looking at FastMM (the memory manager) I see these constants, which would suggest one of them as a good size to increment by.

ChunkSize = 64 * 1024;
MaximumSmallBlockSize = 32752;
LargeBlockGranularity = 64 * 1024;

Would one of these be an optimal size for increasing the size of an array?

Eventually this program will become 64bit but we're not quite ready for that step.

Mitch
  • Forgot to add that allocating smaller blocks may be more efficient but will hurt performance. – Mitch Feb 16 '15 at 17:18
  • http://stackoverflow.com/questions/4933730/working-with-large-arrays-outofram – David Heffernan Feb 16 '15 at 17:34
  • The *optimal* memory allocation is to allocate exactly how much you need once. That requires you to know the right size in advance. Is there really no way to know that? – Rob Kennedy Feb 16 '15 at 17:44
  • @RobKennedy That's only true if you neglect the issue of address space. In a 32 bit process you run out of address space before you run out of memory, when allocating large objects. The solution is to allocate smaller objects, but many of them. – David Heffernan Feb 16 '15 at 17:45

2 Answers


Your real problem here is not that you are running out of memory, but that the memory allocator cannot find a large enough block of contiguous address space. Some simple things you can do to help include:

  1. Execute the code in a 64 bit process.
  2. Add the LARGEADDRESSAWARE PE flag so that your process gets a 4GB address space rather than 2GB (a one-line way to set this in Delphi is shown just below).
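
For example, in a Delphi project the flag can be set with a compiler directive in the .dpr file (IMAGE_FILE_LARGE_ADDRESS_AWARE is declared in the Windows unit, so that unit must be in the uses clause; the project name here is just a placeholder):

program MyTriangleApp;

uses
  Windows;

{$SETPEFLAGS IMAGE_FILE_LARGE_ADDRESS_AWARE} // mark the EXE as large address aware

begin
  // ... normal program start-up ...
end.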

Beyond that the best you can do is allocate smaller blocks so that you avoid the requirement to store your large data structure in contiguous memory. Allocate memory in blocks. So, if you need 1GB of memory, allocate 64 blocks of size 16MB, for instance. The exact block size that you use can be tuned to your needs. Larger blocks result in better allocation performance, but smaller blocks allow you to use more address space.

Wrap this up in a container that presents an array-like interface to the consumer, but internally stores the memory in non-contiguous blocks.
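
A minimal sketch of such a container, assuming a simple triangle record and a fixed chunk size (both are placeholders to adapt to your real types):

type
  TTriangle = record
    A, B, C: Integer; // placeholder fields
  end;
  TTriangleChunk = array of TTriangle;

const
  TrianglesPerChunk = 1024 * 1024; // tune: larger chunks allocate faster, smaller chunks need less contiguous space

type
  TChunkedTriangleList = class
  private
    FChunks: array of TTriangleChunk; // each chunk is a separate, non-contiguous allocation
    FCount: Integer;
    function GetItem(Index: Integer): TTriangle;
    procedure SetItem(Index: Integer; const Value: TTriangle);
  public
    procedure Add(const Value: TTriangle);
    property Count: Integer read FCount;
    property Items[Index: Integer]: TTriangle read GetItem write SetItem; default;
  end;

procedure TChunkedTriangleList.Add(const Value: TTriangle);
var
  Chunk: Integer;
begin
  Chunk := FCount div TrianglesPerChunk;
  if Chunk > High(FChunks) then
  begin
    // Grow the chunk table and allocate one more chunk; no single huge block is needed.
    SetLength(FChunks, Chunk + 1);
    SetLength(FChunks[Chunk], TrianglesPerChunk);
  end;
  FChunks[Chunk][FCount mod TrianglesPerChunk] := Value;
  Inc(FCount);
end;

function TChunkedTriangleList.GetItem(Index: Integer): TTriangle;
begin
  Result := FChunks[Index div TrianglesPerChunk][Index mod TrianglesPerChunk];
end;

procedure TChunkedTriangleList.SetItem(Index: Integer; const Value: TTriangle);
begin
  FChunks[Index div TrianglesPerChunk][Index mod TrianglesPerChunk] := Value;
end;

Consumers can then call List.Add(T) and read List[I] as if it were one big array, while no chunk ever needs more than TrianglesPerChunk * SizeOf(TTriangle) bytes of contiguous address space.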

David Heffernan
  • My tests with a smaller allocation are allowing me to address more memory but I'm still falling short. I forgot about the LARGEADDRESSAWARE flag and will add that, but are there any tradeoffs to using this flag? – Mitch Feb 16 '15 at 18:06
  • You might run up against bugs in library code for addresses >2GB. For instance GetCursorPos on Vista can't handle such addresses. To put this bluntly, all your viable options can be found in my answer, assuming you want to avoid the disk. That said, a memory mapped file can help since you get access to the 64 bit disk cache. You should stop skirting the issue and make a 64 bit app. – David Heffernan Feb 16 '15 at 18:13
  • You're right about switching to 64 bit, but this is one part of a very large program and it will take more than just recompiling to get it right. A more efficient storage structure that stores indices to shared vertices or segments would help but would take longer to build. Thanks for the help. – Mitch Feb 16 '15 at 18:39

As far as I know, dynamic arrays in Delphi are stored in contiguous memory (at least in the virtual address space).

Since you are running out of memory at 1.2 GB, I guess that's the point where the memory manager can't find a block of contiguous memory large enough to fit a larger array.

One way to work around this limitation would be to implement your array as a collection of smaller arrays of (let's say) 200 MB each. That should give you some more headroom before you hit the memory cap.

From the 1.2 GB value, I would guess your program isn't compiled to be "large address aware". You can see here how to compile your application that way.

One last trick would be to actually store the array data in a file. I use this trick in one of my applications where I needed to load a few GB of images to be displayed in a grid. What I did was create a file with the attributes FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE and save/load the images from the resulting file. From the CreateFile documentation:

A file is being used for temporary storage. File systems avoid writing data back to mass storage if sufficient cache memory is available, because an application deletes a temporary file after a handle is closed. In that case, the system can entirely avoid writing the data. Otherwise, the data is written after the handle is closed.

Since it makes use of cache memory, I believe it allows an application to use memory beyond the 32-bit limitation, since the cache is managed by the OS and (as far as I know) not mapped inside the process's virtual address space. After making this change, performance was still pretty good, but I can't say whether it would be good enough for your needs.
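
A rough Delphi sketch of how such a file gets created (the path is just a placeholder; real code should build a proper temporary file name, e.g. with GetTempPath/GetTempFileName):

// Assumes Windows and SysUtils are in the unit's uses clause.
function CreateScratchFile: THandle;
begin
  Result := CreateFile('C:\Temp\triangles.tmp',            // placeholder path
    GENERIC_READ or GENERIC_WRITE,
    0,                                                     // no sharing
    nil,                                                   // default security attributes
    CREATE_ALWAYS,
    FILE_ATTRIBUTE_TEMPORARY or FILE_FLAG_DELETE_ON_CLOSE,
    0);
  if Result = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  // Read and write records with ReadFile/WriteFile/SetFilePointer;
  // the file is deleted automatically when the last handle is closed.
end;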

Ken Bourassa
  • Storing on disk was considered but the performance would suffer. Only a few customers use data sets large enough to cause problems, so it doesn't make sense to go this way. Setting the large address aware flag seems to solve the problem for now. – Mitch Feb 16 '15 at 18:50
  • Using FILE_ATTRIBUTE_TEMPORARY, the data could potentially stay permanently in memory if enough system memory is available; in that case, performance would barely be impacted. It's not very different from using memory, actually. Reserving "memory" doesn't guarantee the data will actually reside in RAM. Though I admit modern systems usually have more than enough RAM to avoid making use of the swap file. – Ken Bourassa Feb 16 '15 at 19:52
  • Thanks, I missed that. Now I'm curious to test the performance of the cache. – Mitch Feb 16 '15 at 21:57
  • Performed a simple test and the cached file was much slower: for 1,000,000 records the file took 3000 ms and the array 60 ms. Still a good technique to have up your sleeve. – Mitch Feb 16 '15 at 22:23
  • @Mitch I think if you use a memory mapped file, and you do everything right, then the perf should be good. But it's a lot of effort. Just make a 64 bit port. And for 32 bit, mitigate using the techniques that Ken and I describe. – David Heffernan Feb 16 '15 at 22:26
  • For the sake of documentation: I did some performance tests of my own for the file technique. If I read items from the file one by one, I was close to 10 times slower. If, on the other hand, I read 10,000 items at a time from the file, I was only about 35-40% slower. So for random access this technique might not be very good (at least for very small structures), but for sequential access the performance hit isn't that bad. – Ken Bourassa Feb 18 '15 at 14:46