
I am currently having a problem with declaring or filling a large array with data, because I get a dialog box saying "Out of memory", originating from a CMemoryException.

I am trying to create an array or vector (I have tried both) of around 50000 objects, where sizeof(MyObjectClass) returns around 37000 bytes.

If I try to just fill up a vector or a CArray element by element, I manage to store somewhere near 16000 elements before getting the Out of Memory exception. That should be close to 600 MB?

I have 8GB of RAM on the machine and only 4GB is being used according to Windows Task Manager, so the amount of physical RAM should not be the problem. I am running C++ MFC in Visual Studio 2010, 32-bit.

Also if I try to write

MyObjectClass* heaparray = new MyObjectClass[50000];

then I immediately get that very same Out of Memory error, on that very line.

Any ideas? Thank You in advance!

UPDATE: I have also tried to simply create a TestStruct with the fields:

struct TestStruct
{
  long long field1;
  GUID field2;
  GUID field3;
  GUID field4;
  TCHAR field5[256];
  TCHAR field6[4];
  TCHAR field7[258];
  TCHAR field8[1026];
  TCHAR field9[258];
  TCHAR field10[16386];
  TCHAR field11[258];
};

TestStruct* heapArr = new TestStruct[50000];

Still the same... I get an "Out of Memory" exception when executing the last line of code. Isn't one of the great things about the heap supposed to be that it is (more or less) limited only by RAM when handling big data? And yet... since it crashes already at around 600MB of allocated space, I cannot call that very big data either... or should I? :/
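
As a quick sanity check (just a sketch, assuming a UNICODE build where TCHAR is 2 bytes wide, and using the TestStruct defined above), printing the sizes works out to roughly 37 KB per object and roughly 1.8 GB for the whole 50000-element array:

#include <windows.h>  // GUID, TCHAR
#include <iostream>

// TestStruct exactly as defined above

int main()
{
  std::cout << "sizeof(TestStruct) = " << sizeof(TestStruct) << " bytes\n";
  std::cout << "50000 elements     = " << 50000ULL * sizeof(TestStruct) << " bytes\n";
  return 0;
}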

10100111001
  • What does MyObjectClass look like? – paulm May 11 '14 at 01:00
  • It's unlikely you will be able to allocate more than 4GB on a 32-bit compiler – M.M May 11 '14 at 01:01
  • If it's really 37000 bytes per obj then you'll need at least 1764.16 MB – paulm May 11 '14 at 01:02
  • Also, you're requesting a large amount of contiguous memory, which may not be available even if you have a large amount of physical memory – Banex May 11 '14 at 01:03
  • Thanks for the fast comments! @paulm: MyObjectClass is used with a CCommand<...> cmd... to fetch lots of data from the database. So it has fields mapping to columns in database tables... – 10100111001 May 11 '14 at 01:08
  • @MattMcNabb I know of this but I do not need 4GB, I was merely pointing out that I have lots of RAM to take from on the machine... – 10100111001 May 11 '14 at 01:09
  • @paulm 1.8GB yes, but it crashes at around 16000 stored objects in the array if I do not specify a fixed size but instead fill the array element-by-element until I get the out of memory exception. So around 560MB is stored in the array before it crashes. – 10100111001 May 11 '14 at 01:12
  • @Banex Yes, I have also thought of contiguous memory, but how can I verify (easily) that this in fact is the problem and what can I do about it (easily without having to split into pieces)? Can this be managed internally and automatically somehow so that I don't have to fiddle around with small array chunks? – 10100111001 May 11 '14 at 01:15
  • @user2506124 :: as I said in my answer, to verify that it **is** a problem of contiguous memory, use a `std::list` instead of the array. – Massa May 11 '14 at 02:32
  • Even if you're using a 64-bit OS and have 8GB of RAM, a _32-bit user-mode process_ can only access 2GB of memory by default. See: [this question](http://stackoverflow.com/questions/639540/how-much-memory-can-a-32-bit-process-access-on-a-64-bit-operating-system). Can't you just compile your program as an x64 build? – Blastfurnace May 11 '14 at 02:32
  • @Massa Yes, I have done that and it seems to be the issue, thank you for your answer! – 10100111001 May 11 '14 at 10:24
  • @Blastfurnace I am aware of the restrictions of 32-bit programs and yes, an x64 conversion project is running simultaneously. The thing was that it crashed after allocating just 600MB when using an array. – 10100111001 May 11 '14 at 10:28

2 Answers


This is a fun one. Both vectors and arrays are stored contiguously in memory, as stated here.

You are not only looking for 1850000000 bytes (about 1.72 GB) of memory, but for one unbroken chunk of memory that big. That will be hard to find. If you switch to a data structure that does not use contiguous storage (say, a linked list) then you may be able to store that much.

Note: that will also make each object just a bit bigger.
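
A minimal sketch of that switch (it assumes MyObjectClass is default-constructible and copyable; the extra per-node pointers are the "bit bigger" overhead mentioned above):

#include <list>

std::list<MyObjectClass> objects;  // each element lives in its own heap node,
                                   // so no single ~1.7 GB contiguous block is needed

for (int i = 0; i < 50000; ++i)
{
  MyObjectClass obj;
  // ... fill obj from the current database row ...
  objects.push_back(obj);          // copies obj into a freshly allocated node
}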

What would be best would be to see if there is any way to just buffer the objects: load only the ones you will update, and load the others on the fly when you need them. I have my doubts that you are doing CPU operations on more than one of them at a time. If you do it right (most likely with threading) you won't even suffer any slowdowns from reading/writing them.
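
A rough sketch of that buffering idea, with a hypothetical fetchBatch() helper standing in for whatever the real database API offers (the name and signature are placeholders, not an actual API):

#include <cstddef>
#include <vector>

// Hypothetical helper: fills 'batch' with up to maxRows rows and
// returns how many rows it actually read (0 once the result set is exhausted).
std::size_t fetchBatch(std::vector<MyObjectClass>& batch, std::size_t maxRows);

void processAll()
{
  std::vector<MyObjectClass> batch;
  batch.reserve(1000);                 // at ~37 KB per object, about 37 MB in memory at once

  while (fetchBatch(batch, 1000) > 0)
  {
    for (std::size_t i = 0; i < batch.size(); ++i)
    {
      // process batch[i] here
    }
    batch.clear();                     // reuse the same storage for the next batch
  }
}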

More information about what you are working on would be helpful. There may even be a way to just have an array filled with type identifiers, if your object has fewer than 2,147,483,647 (the maximum value of a signed 32-bit int) variations. You could store an array of integers that the class could be generated from (a toHash and fromHash; that would be 50000 * 4 bytes = about 195 kilobytes), which may work for you too. Again, it depends on what you are working on.

Cory-G
  • Thanks for the reply! I need to fetch a lot of rows from a DB (over the network), perform some operation on each, and then flush it all down to a local DB table. So I am all OK with allocating lots of memory over having to do many round-trips to the network DB. So I have a CCommand<...> cmd. I call cmd.Open() with an SQL query that fetches lots of data (around 50000 rows) and then loop until cmd.MoveNext() fails. Every loop round I add the current element to a vector or array. After 16000 rows I get the out of memory exception. Linked list... perhaps I should try that. I just did > – 10100111001 May 11 '14 at 01:41
  • > not believe I could not allocate that amount of RAM on the heap more easily with an array/vector... since it crashes after only 600MB allocated :/ (See also the update section in my first post) – 10100111001 May 11 '14 at 01:42
  • The contiguous storage is only in the sense of *virtual memory*. It requires only a contiguous range of virtual memory addresses, not contiguous physical storage. – Siyuan Ren May 11 '14 at 02:38
  • Ok, now I've had the time to test with a std::list and yes, it does work! Contiguous memory seems to be the cause. Thank you for your answer! – 10100111001 May 11 '14 at 10:16

I will try to expand on @user1884803's answer:

  1. Don't use a pointer to an array. Even Visual Studio 2010 has <vector>. But see next point.

  2. Don't use a vector either... Especially if you really want to read all your MyObjectClass objects into RAM. As the other answer said, even if you have 4 GB free, you probably don't have 1.7 GB of contiguous free memory.

  3. So, if you really, really want to read all your objects into RAM (because the processing you want to do on them is non-linear, or needs many records in memory at the same time), use a std::list<MyObjectClass> or, if you need a "key" to access each record, a std::map<KeyType, MyObjectClass> (see the sketch after this list). BUT...

  4. You really should try not to read 1.8 GB of objects into RAM. Even if you have that much RAM lying around unused, it's just not good practice. If you can, read each object from the database, process it, and write it back to the database, discarding the used object instead of accumulating the whole thing in RAM. If you need to, and if it improves your speed, you can keep part of the data in a std::list, std::map, or even a std::vector, and refresh other parts of the objects from the database on demand.
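
For option 3, the container swap itself is small. A sketch, reusing the same hypothetical cmd.Read() placeholder as the code below, and assuming MyObjectClass has some key field (here called id, purely as an illustration) for the map variant:

#include <list>
#include <map>

std::list<MyObjectClass> records;           // node-based: no contiguous ~1.7 GB block required
std::map<long, MyObjectClass> recordsById;  // also node-based, if keyed access is needed

if( cmd.Open() ) {
  do {
    MyObjectClass obj = cmd.Read();         // placeholder for the real fetch, as below
    records.push_back(obj);
    // or: recordsById[obj.id] = obj;       // 'id' is a hypothetical key field
  } while( cmd.MoveNext() );
}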

Following option 4, your program would go from:

if( cmd.Open() ) {
  do {
    MyObjectClass obj = cmd.Read(); // whatever is needed to read the object from the db
    vectorOfObjects.push_back(obj); // or list, or map...
  } while( cmd.MoveNext() );
}

for( std::vector<MyObjectClass>::iterator p = vectorOfObjects.begin(), e = vectorOfObjects.end(); p != e; ++p ) {
  // process *p
}

for( std::vector<MyObjectClass>::iterator p = vectorOfObjects.begin(), e = vectorOfObjects.end(); p != e; ++p ) {
  cmd.Save(*p); // see reading above, but for saving...
}

to something like

if( cmd.Open() ) {
  do {
    MyObjectClass obj = cmd.Read();
    // JUST PROCESS obj here and go to next

    cmd.Save(obj); // or whatever
  } while( cmd.MoveNext() );
}
Massa
  • Thank you for your reply! I see what you mean, but the thing is that in the end all that data needs to be transferred in some form from one network DB to a local DB table. Doing 50000 round-trips to a network DB far away by fetching row-by-row, and then also saving row-by-row (another 50000 round-trips, to the local server this time) after processing each one, takes too much time (which is the problem today). So I don't know what other handy options I have within this time-frame... – 10100111001 May 11 '14 at 10:22
  • @user2506124 You know you'll have to fetch and save row-by-row anyway, even with `CCommand<...>`, and that `CCommand` will cache your records much more efficiently than you trying to put them in an array... – Massa May 11 '14 at 15:06
  • Yes, I know that the fetch will be row-by-row, but I see a 10-times increase in network utilization by calling cmd.MoveNext() compared to the other, current method of fetching; perhaps more data is cached when the SQL query for the Command fetches more than one row at a time? Also, when saving back to the other DB I get significantly higher performance than the currently utilized solution when I create a large SQL string with Insert-statements which is sent to the SQL Server. – 10100111001 May 11 '14 at 22:30
  • I am fairly new to CCommand and all that, so I might not be able to explain in detail why everything is happening; I am just observing the effects of the different approaches to the problem as I've seen them. Thanks for your help! – 10100111001 May 11 '14 at 22:32
  • @user2506124 :: what is the _"current method of fetching"_ ? maybe I can adjust my answer to help you more. – Massa May 12 '14 at 14:34
  • @Massa Aah, the current mode of fetching and writing is: fetch one row at a time from the source DB, and write one row at a time to the other DB (plus structure checking, indexes, keys etc...). But it is the data transfer part which takes time when doing it one row at a time (both reading and writing). – 10100111001 May 13 '14 at 10:39
  • @user2506124 I think you didn't understand my question: you said _"I see a 10-time increase in network utilization by calling cmd.MoveNext() than the other current method of fetching"_ and I was curious about that is that _"other current method of fetching"_ ... – Massa May 13 '14 at 11:50
  • @Massa Well, it still uses a CCommand, but in a different way, where the SQL query fetches one row at a time. Then every column is identified by DB type, the value for each column is individually fetched from the network-DB command object and processed if needed, and then set on the destination-DB command object by cmd.setValue(). Finally cmd.Insert() is run to save. Then the next row from the source DB is fetched and this is repeated many times... I believe that by having specified an SQL query which now fetches all rows, the CCommand object can have a sliding-window cache and that is >> – 10100111001 May 13 '14 at 14:07
  • @Massa >> perhaps why I see a much higher network utilization. Only when I have all the data can I quickly process it all and save to the other DB in proper chunks instead of one row at a time. That's my theory, but since I am not used to CCommand and all that I might be mistaken. – 10100111001 May 13 '14 at 14:10
  • Just note that you _can_ use it in the same way without accumulating the whole dataset in memory. But since you reported that using `list` solved your problem, I wouldn't fret too much over it. Good luck! HTH! – Massa May 13 '14 at 14:21