
I have been trying to figure out why my program is consuming so much system RAM. I'm loading a file from disk into a vector of structs, each holding several dynamically allocated arrays. A 16MB file ends up consuming 280MB of system RAM according to Task Manager. The types in the file are mostly chars, with some shorts and a few longs. There are 331,000 records in the file, containing on average about 5 fields each. I converted the vector to a struct and that reduced the memory to about 255MB, but that still seems very high. With the vector taking up so much memory the program is running out of it, so I need to find a way to get the usage down to something reasonable.

I wrote a simple program to just stuff a vector (or array) with 1,000,000 char pointers. I would expect it to allocate 4+1 bytes for each (a 4-byte pointer plus the 1-byte char), giving 5MB of memory required for storage, but in fact it uses 64MB (array version) or 67MB (vector version). When the program first starts up it only consumes 400K, so why is an additional 59MB (array) or 62MB (vector) being allocated? This extra memory seems to be per container: if I create a size_check2, copy everything into it, and run that, the program uses 135MB for 10MB worth of pointers and data.
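Back-of-the-envelope, the per-element overhead works out to far more than the payload itself (a quick sanity check, assuming 4-byte pointers on a 32-bit build):

#include <cstdio>

int main()
{
    const unsigned n = 1000000;      // allocations in the test
    const unsigned payload = 4 + 1;  // 4-byte pointer + 1-byte char
    // Expected: ~5 MB of raw storage.
    printf( "expected: %.1f MB\n", n * payload / ( 1024.0 * 1024.0 ) );
    // Observed: 64 MB for the array version, i.e. ~67 bytes per element,
    // so the heap is adding roughly 60 bytes of overhead per allocation.
    printf( "observed per element: ~%u bytes\n", 64u * 1024 * 1024 / n );
    return 0;
}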

Thanks in advance,

size_check.h

#pragma once

#include <vector>

class size_check
{
public:
    size_check(void);
    ~size_check(void);

    typedef unsigned long   size_type;

    void stuff_me( unsigned int howMany );

private:
    size_type**         package;
//  std::vector<size_type*> package;
    size_type*          me;
};

size_check.cpp

#include "size_check.h"

size_check::size_check(void)
{
}

size_check::~size_check(void)
{
}

void size_check::stuff_me( unsigned int howMany )
{
    package = new size_type*[howMany];
    for( unsigned int i = 0; i < howMany; ++i )
    {
        size_type *me = new size_type;
        *me = 33;
        package[i] = me;
//      package.push_back( me );
    }
}

main.cpp

#include "size_check.h"
#include <cstdio>   // for printf

int main( int argc, char * argv[ ] )
{
    const unsigned int buckets = 20;
    const unsigned int size = 50000;

    size_check* me[buckets];

    for( unsigned int i = 0; i < buckets; ++i )
    {
        me[i] = new size_check();
        me[i]->stuff_me( size );
    }
    printf( "done.\n" );
}
Mark
  • I haven't read your code in detail, but keep in mind that pointers take up space too. – slartibartfast Jun 11 '11 at 02:48
  • You also need to take into account data structure alignment and padding. – feathj Jun 11 '11 at 02:52
  • @myrkos is correct - four bytes, if I remember correctly, on common hardware. Your class definitions take space as well. Try doing a sizeof() of a size_check object that you make, and multiply that by the number of instances you create, and see if that doesn't make up the difference. –  Jun 11 '11 at 02:55
  • ~60MB worth? I would expect that to maybe double the amount of memory used. Do you know what the padding would be? Is it allocating a minimum of 64 bytes for each new? – Mark Jun 11 '11 at 02:55
  • The size of size_check is 24 bytes and there are 20 of them, so that should be 480 bytes, negligible. The dynamic array versions are 8 bytes. – Mark Jun 11 '11 at 03:03

4 Answers


In my test using VS2010, a debug build had a working set size of 52,500 KB, but a release build had a working set size of only 20,944 KB.

Debug builds will usually use more memory than optimized builds due to the debug heap manager doing things like creating memory fences.

In release builds, I suspect that the heap manager reserves more memory than you are actually using as a performance optimization.
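As a side note, if you want numbers more precise than Task Manager, you can query the working set directly with GetProcessMemoryInfo. A minimal sketch (Windows-only; link against psapi.lib):

#include <windows.h>
#include <psapi.h>
#include <cstdio>

// Print the current working set size in KB.
void print_working_set( const char* label )
{
    PROCESS_MEMORY_COUNTERS pmc;
    if( GetProcessMemoryInfo( GetCurrentProcess(), &pmc, sizeof( pmc ) ) )
        printf( "%s: %lu KB\n", label,
                (unsigned long)( pmc.WorkingSetSize / 1024 ) );
}

int main()
{
    print_working_set( "before" );
    for( int i = 0; i < 1000000; ++i )
        new unsigned long( 33 );    // deliberately leaked, as in the test
    print_working_set( "after" );
}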

sean e
    @Mark: When you ask a question and someone answers, don't edit the answer to add your own comments; that's very confusing. Use the Add Comment link below the answer you want to comment on. – Ned Deily Jun 12 '11 at 05:53

Memory Leak

package = new size_type[howMany]; // instantiate 50,000 size_type's
for( unsigned int i = 0; i < howMany; ++i )
{
    size_type *me = new size_type; // Leak: results in an extra 50k size_type's being instantiated
    *me = 33;
    package[i] = *me;  // Copies the value pointed to by "me"; the allocation itself is never freed
    // Would package[i] = 33; not suffice?
}

Furthermore, make sure you've compiled in release mode.
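For reference, a leak-free version of that loop might look like this (a sketch, assuming package is declared as a plain size_type* to match the snippet above):

void size_check::stuff_me( unsigned int howMany )
{
    package = new size_type[howMany];   // one allocation for all values
    for( unsigned int i = 0; i < howMany; ++i )
        package[i] = 33;                // store the value directly; no per-element new
}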

Brian Webster

There might be a couple of reasons why you're seeing such a large memory footprint from your test program. Inside your

void size_check::stuff_me( unsigned int howMany )
{

This method is always called with howMany = 50000.

package = new size_type[howMany];

Assuming this is a 32-bit setup, the above statement will allocate 50,000 * 4 bytes.

for( unsigned int i = 0; i < howMany; ++i )
{
    size_type *me = new size_type;

The above will allocate new storage on each iteration of the loop. Since this loops 50,000 times and the allocations never get deleted, that effectively takes up another 50,000 * 4 bytes by loop completion.

        *me = 33;
        package[i] = *me;
    }
}

Lastly, since stuff_me() gets called 20 times from main(), your program will have allocated at least ~8 MB upon completion. If this is a 64-bit system then the footprint will likely double, since sizeof(long) == 8 bytes.

The increase in memory consumption could have something to do with the way VS implements dynamic allocation. For performance reasons, it's possible that, because of the multiple calls to new, your program is reserving extra memory so as to avoid hitting up the OS every time it needs more.

FYI, when I ran your test program on mingw-gcc 4.5.2, the memory consumption was ~20 MB -- much lower than what you were seeing, but still a substantial amount. If I changed the stuff_me method to this:

void size_check::stuff_me( unsigned int howMany )
{
    package = new size_type[howMany];
    size_type *me = new size_type;
    for( unsigned int i = 0; i < howMany; ++i )
    {
        *me = 33;
        package[i] = *me;
    }
    delete me;
}

memory consumption goes down quite a bit, to ~4-5 MB.

greatwolf

I think I found the answer by delving into the new statement. In debug builds there are two extra items created for each new. One is the _CrtMemBlockHeader, which is 32 bytes in length. The other is noMansLand (a memory fence) with a size of 4 bytes, which gives an overhead of 36 bytes for each new. In my case each individual new for a char was costing me 37 bytes. In release builds the memory usage is reduced to about half, but I can't tell exactly how much is allocated for each new since I can't step into the new/malloc routine.

So my workaround is to allocate one large block of memory to hold the file contents. I then parse that memory image, filling in a vector of pointers to the beginning of each record. On demand, I build a record from the memory image using the pointer to the start of the selected record, as sketched below. Doing this reduced the memory footprint to <25MB.
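A rough sketch of that approach, in case it helps anyone else (the newline-delimited record format and the file name records.dat are just placeholders; the real parsing depends on the actual file layout):

#include <cstdio>
#include <vector>

// Load the whole file into a single block. One large allocation replaces
// hundreds of thousands of small ones, so the per-allocation heap
// overhead (header + fence + granularity) is paid only once.
char* load_file( const char* path, size_t& size )
{
    FILE* f = fopen( path, "rb" );
    if( !f ) return 0;
    fseek( f, 0, SEEK_END );
    size = ftell( f );
    fseek( f, 0, SEEK_SET );
    char* image = new char[size];
    if( fread( image, 1, size, f ) != size ) { delete [] image; image = 0; }
    fclose( f );
    return image;
}

int main()
{
    size_t size = 0;
    char* image = load_file( "records.dat", size );  // placeholder file name
    if( !image ) return 1;

    // Index the start of each record. A record is assumed here to end
    // with '\n' -- substitute the real record-length logic.
    std::vector<char*> records;
    for( size_t i = 0; i < size; ++i )
        if( i == 0 || image[i - 1] == '\n' )
            records.push_back( &image[i] );

    printf( "%u records indexed\n", (unsigned)records.size() );

    // Build individual records on demand from records[k] as needed...
    delete [] image;
    return 0;
}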

Thanks for all your help and suggestions.

Mark