
I have an auto-generated C++ source file, around 40 MB in size. It largely consists of push_back commands for some vectors and the string constants to be pushed.

When I try to compile this file, g++ exits and says that it couldn't reserve enough virtual memory (around 3 GB). Googling this problem, I found that using the command line switches

--param ggc-min-expand=0 --param ggc-min-heapsize=4096

may solve the problem. They, however, only seem to work when optimization is turned on.
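For reference, the full command line looks something like this (the file name is just a placeholder):

g++ -c -O1 --param ggc-min-expand=0 --param ggc-min-heapsize=4096 generated_data.cpp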

1) Is this really the solution that I am looking for?

2) Or is there a faster, better way to do this? (Compiling takes ages with these options activated.)

Best wishes,

Alexander

Update: Thanks for all the good ideas. I tried most of them. Using an array instead of several push_back() operations reduced memory usage, but as the file I was trying to compile was so big, it still crashed, only later. In a way, this behaviour is really interesting, as there is not much to optimize in such a setting -- what does GCC do behind the scenes that costs so much memory? (I compiled with all optimizations deactivated as well and got the same results.)

The solution that I switched to now is reading in the original data from a binary object file that I created from the original file using objcopy. This is what I originally did not want to do, because creating the data structures in a higher-level language (in this case Perl) was more convenient than having to do this in C++.

However, getting this running under Win32 was more complicated than expected. objcopy seems to generate files in the ELF format by default, and some of the problems I had disappeared once I manually set the output format to pe-i386. The symbols in the object file are named after the file name by default, e.g. converting the file inbuilt_training_data.bin results in these two symbols: binary_inbuilt_training_data_bin_start and binary_inbuilt_training_data_bin_end. I found some tutorials on the web claiming that these symbols should be declared as extern char _binary_inbuilt_training_data_bin_start;, but this does not seem to be right -- only extern char binary_inbuilt_training_data_bin_start; worked for me.
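For the record, the whole setup looks roughly like this (a sketch only -- the exact objcopy flags and whether the symbol names carry a leading underscore vary by platform and binutils version, and the function name is made up):

objcopy -I binary -O pe-i386 -B i386 inbuilt_training_data.bin inbuilt_training_data.obj

// In the C++ code, the generated symbols mark the bounds of the data:
extern char binary_inbuilt_training_data_bin_start;
extern char binary_inbuilt_training_data_bin_end;

void parse_training_data() {
    const char *begin = &binary_inbuilt_training_data_bin_start;
    const char *end = &binary_inbuilt_training_data_bin_end;
    // the raw bytes live in [begin, end); their count is end - begin
}

The object file is then passed to the linker along with the other object files.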


6 Answers


You may be better off using a constant data table instead. For example, instead of doing this:

void f() {
    a.push_back("one");
    a.push_back("two");
    a.push_back("three");
    // ...
}

try doing this:

const char *data[] = {
    "one",
    "two",
    "three",
    // ...
};

void f() {
    for (size_t i = 0; i < sizeof(data)/sizeof(data[0]); i++) {
        a.push_back(data[i]);
    }
}

The compiler will likely be much more efficient generating a large constant data table, rather than huge functions containing many push_back() calls.

Greg Hewgill

Can you solve the same problem without generating 40 MB worth of C++? That's more than some operating systems I've used. A loop and some data files, perhaps?

György Andrasek
  • A loop and data files... could you expand on your answer? Maybe with a simple example. It's possible that he won't be able to get anywhere with a one-sentence hint... – A. Levy Nov 30 '09 at 00:14

It sounds like your autogenerated app looks like this:

push_back(data00001);
...
push_back(data99999);

Why don't you put the data into an external file and let the program read this data in a loop?
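A minimal sketch of that idea, assuming one entry per line in a hypothetical data file (the file name and vector are placeholders):

#include <fstream>
#include <string>
#include <vector>

std::vector<std::string> data;

void load_data() {
    std::ifstream in("training_data.txt");  // hypothetical data file
    std::string line;
    while (std::getline(in, line))
        data.push_back(line);               // one entry per line
}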

codymanix

If you're just generating a bunch of calls to push_back() in a row, you can refactor it into something like this:

// Old code:
v.push_back("foo");
v.push_back("bar");
v.push_back("baz");

// Change that to this:
{
    static const char *stuff[] = {"foo", "bar", "baz"};
    v.insert(v.end(), stuff, stuff + ARRAYCOUNT(stuff));
}

Where ARRAYCOUNT is a macro defined as follows:

#define ARRAYCOUNT(a) (sizeof(a) / sizeof(a[0]))

The extra level of braces is just to avoid name conflicts if you have many such blocks; alternatively, you can just generate a new unique name for the stuff placeholder.

If that still doesn't work, I suggest breaking your source file up into many smaller source files. That's easy if you have many separate functions; if you have one enormous function, you'll have to work a little harder, but it's still very doable.
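For example, the generator could emit one function per chunk into separate files (all names here are made up):

// data_part1.cpp -- one of N generated files, each holding a bounded number of entries
#include <string>
#include <vector>

void init_part1(std::vector<std::string> &v) {
    v.push_back("foo");
    // ...
}

// init_all.cpp -- a small hand-written driver that calls each piece in turn
#include <string>
#include <vector>

void init_part1(std::vector<std::string> &v);
void init_part2(std::vector<std::string> &v);

void init_all(std::vector<std::string> &v) {
    init_part1(v);
    init_part2(v);
    // ...
}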

Adam Rosenfield

To complement some of the answers here, you may be better off generating a binary object file and linking it directly -- as opposed to compiling files consisting of const char arrays.

I had a similar problem working with gcc lately. (Around 60 MB of PNG data split into some 100 header files.) Including them all is the worst option: The amount of memory needed seems to grow exponentially with the size of the compilation unit.

aib
  • You should have kept the PNG data in source files, not headers. Header files should just have `extern const char img_data[]; extern const size_t img_data_size;` and the source files should have `char img_data[] = {...}; const size_t img_data_size = sizeof(img_data);` It's much easier for the compiler to handle, and files using the image data don't need to be recompiled when the images change. (See the sketch after this thread.) – Adam Rosenfield Nov 30 '09 at 00:28
  • @Adam Rosenfield: That would have worked, yes, but would have been a hack in that it would not have solved the actual problem, which is the binary stream going through the compiler in the first place. (Binary data -> C source -> compiler -> binary data -- doesn't really sound right, does it?) By the way, the 'linker' solution ended up looking exactly like yours: with headers just containing extern char* + extern size. – aib Nov 30 '09 at 01:51
  • ...and I think I did that when compiling on MacOS X, whose linker was different and the compiler suite had no obvious way of converting binary data into an object file. But as long as you have an object file containing the two symbols for data start + data size (or data start + data end, it might have been) it doesn't matter who created it and how, does it? – aib Nov 30 '09 at 02:02
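Spelled out, the pattern from Adam Rosenfield's comment looks roughly like this (names are illustrative):

// img_data.h
#include <cstddef>
extern const char img_data[];
extern const std::size_t img_data_size;

// img_data.cpp -- the only file that needs regenerating when the images change
#include "img_data.h"
const char img_data[] = { 0x89, 0x50, 0x4e, 0x47 /* ... */ };
const std::size_t img_data_size = sizeof(img_data);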

If you cannot refactor your code, you could try to increase the amount of swap space you have, provided your operating system supports a large address space. This should work for 64-bit machines, but 3 gigabytes of virtual address space may be too much for a 32-bit system.

Anycorn