24

I am trying to include a huge string in my C++ program. It is 20598617 characters long, and I am using #define to do it. I have a header file that contains this statement:

#define "<huge string containing 20598617 characters>"

When I try to compile the program, I get this error: fatal error C1060: compiler is out of heap space

I tried the following command-line options with no success:

/Zm200
/Zm1000
/Zm2000

How can I get this program to compile successfully?

Platform: Windows 7

Xinus
  • 15
    It's really interesting that this is getting upvoted: other people must want to do this as well, and I have no idea why... – egrunin May 23 '10 at 05:16
  • 4
    @egrunin: Up votes just mean "this is a good question" not "this is affecting me as well". – Dean Harding May 23 '10 at 05:19
  • 7
    This is not a good question. – Snake Plissken May 23 '10 at 05:22
  • 1
    Sure it is. I've seen this sort of question many times from beginners in other places. It's a good resource as to why you don't use internal strings too much. Heck, we've had compilers fail if we have too many data tables in code - especially if those tables accidentally contain non-POD data. – Michael Dorgan May 23 '10 at 05:29
  • 10
    @Snake: just because the answer is "you shouldn't do that" doesn't mean the *question* is bad. Sometimes the best way to learn what not to do is to try it first. – Dean Harding May 23 '10 at 05:33
  • 2
    Yes, it does. A 20 megabyte long C character string doesn't make sense. I'm sure the questioner is using a very long string to store some other kind of data. – Snake Plissken May 23 '10 at 05:42
  • 4
    `#define "foo"` does not make sense. `#define FOO "foo"` might. – ndim May 23 '10 at 08:16
  • Have you tried breaking the string into pieces and simply #define several pieces? What alternative solutions have you attempted? – Carlos May 23 '10 at 09:36
  • @Snake: According to your logic, the only questions that are "good" ones, and therefore the only questions that should be allowed on SO are where the questioner asks, "Should I do XYZ?" and the answer is "Yes." – John Dibling May 24 '10 at 15:33
  • 1
    @John Dibling: I recommend the following fine article from one of our much-loved "founding fathers", Joel Spolsky, about C strings: http://www.joelonsoftware.com/articles/fog0000000319.html. I am sorry but storing data in 20 megabyte long C strings in header files is just **not a good idea**. – Snake Plissken May 24 '10 at 16:01
  • @Snake: Whether or not storing giant strings in a header is a good or bad idea is irrelevant. – John Dibling May 24 '10 at 16:15
  • 1
    @Snake: FWIW, I agree with you. Storing giant strings in a header probably is a bad idea. *That's not the point!* The point is just because it is a bad idea doesn't mean the question should be downvoted. In fact if it is a bad idea, that might be an argument to not downvote the question, in order to give people the opportunity to tell the OP *why* what they are doing is bad. – John Dibling May 24 '10 at 16:18

7 Answers

19

You can't, not reliably. Even if it compiles, it's liable to break the runtime library, or the OS's assumptions, and so forth.

If you tell us why you're trying to do it, we can offer lots of alternatives. Deciding how to handle arbitrarily large data is a major part of programming.

Edited to add:

Rather than guess, I looked into MSDN:

Prior to adjacent strings being concatenated, a string cannot be longer than 16380 single-byte characters.

A Unicode string of about one half this length would also generate this error.

The page concludes:

You may want to store exceptionally large string literals (32K or more) in a custom resource or an external file.

What do other compilers say?

Further edited to add:

I created a file like this:

char s[] = {'x','x','x','x'};

I kept doubling the occurrences of 'x', testing each one as an #include file.

An 8388608 byte string succeeded; 16777216 bytes failed, with the "out of heap space" error.

egrunin
  • 1
    -1. The question has merit in and of itself. There may be viable alternatives, but it's also interesting to know the best way to directly incorporate large data into a program. – Eric May 23 '10 at 06:36
  • 2
    IMO "Why" is not important. I believe that when someone asks a precise question (like this one), it is better to give precise answer. When people want to do something weird, they (sometimes) have their reasons for that (if they don't, they'll learn something useful later). – SigTerm May 23 '10 at 06:47
  • 3
    +1. I think "incorporating large data into a program." is generally preferred "not done like this." The question has merit, but only as far as "What's the best workaround." – Daniel Harms May 23 '10 at 07:04
  • 2
    @Eric: My answer, though less useful than @Ira Baxter, was still correct. Downvote if the answer was factually incorrect, but I don't think I was. – egrunin May 23 '10 at 13:32
  • +1. For looking it up on MSDN and for putting it in a resource. – Gregor Brandt May 23 '10 at 20:09
  • @SigTerm: re: "if they don't, they'll learn something useful later" If the OP posts some crazy thing they are trying to do, and it's painfully obvious to everyone that they are *probably* trying to do the wrong thing, then how are they going to learn something useful later if nobody questions their motivation, aside from using a resource other than SO? – John Dibling May 24 '10 at 15:36
  • @John Dibling: Dude, when you find correct solution yourself, you normally learn a lot of additional stuff (which you'll never find if you simply ask someone for help). Also... 1) this question doesn't qualify as "crazy". 2) "Probably" isn't enough. Unless you are 100% SURE that they ARE (notice: not "probably") doing wrong thing, you'll be wasting time telling OP things he/she didn't ask for (which is IMO quite rude). 3) Programmer should be using his/her own brain first. Asking other people is last resort. – SigTerm May 24 '10 at 18:27
  • @SigTerm: So, to paraphrase your stand: 1) When someone asks a technical question, it is rude to offer another approach to solve a problem, and 2) Programmers should strive to be islands, seeking no outside input from other programmers except in cases of last resort. Did I get that right? Dude? – John Dibling May 24 '10 at 18:39
  • @John Dibling: I'm not in the mood for strawman arguments. IMO: 1) When someone asks a precise question, you should give a precise answer to that question. You should not offer an alternative unless the person asks for one. Answering a question that wasn't asked is impolite and wastes time. Your time, and the OP's time. 2) A programmer should know how to find answers himself; therefore asking other people normally isn't necessary, and should be used when information cannot be obtained by other means, because it is slow. Also, asking for help too frequently may be a sign of laziness. – SigTerm May 24 '10 at 18:54
  • @John Dibling, @SigTerm: you're both right & both wrong. If you **only** say "don't do it", or downvote, that's wrong, but if you **only** give the technical answer (when there's reasonable doubt about their level of experience) without suggesting they rethink the problem, that's also wrong. It's **not** rude to offer additional useless advice, if you start by answering the question. – egrunin May 24 '10 at 18:56
  • @Ira: compiler didn't like that, truncated each to 'x'. – egrunin May 25 '10 at 02:50
  • @Ira: `char s[] = {"xx","xx"};` gives `error C2078: too many initializers` – egrunin May 25 '10 at 05:00
  • What do other compilers say? GCC's `-Woverlength-strings` compiler warning gives some interesting figures: Warn about string constants which are longer than the "minimum maximum" length specified in the C standard. … The limit applies _after_ string constant concatenation, and does not count the trailing NUL. In C90, the limit was 509 characters; in C99, it was raised to 4095. C++98 does not specify a normative minimum maximum, so we do not diagnose overlength strings in C++. – BRPocock Dec 12 '11 at 21:51
14

I suspect you are running into a design limit on the size of a character string. Most people really think that a million characters is long enough :-}

To avoid such design limits, I'd try not to put the whole thing into a single literal string. On the suspicion that #define macro bodies have similar limits, I'd try not to put the entire thing in a single #define, either.

Most C compilers will accept pretty big lists of individual characters as initializers. If you write

char c[]={ c1, c2, ...  c20598617 };

with the c_i being your individual characters, you may succeed. I've seen GCC2 applications where there were 2 million elements like this (apparently they were loading some type of ROM image). You might even be able to group the c_i into blocks of K characters for K=100, 1000, 10000 as suits your tastes, and that might actually help the compiler.
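A minimal sketch of how such an initializer list might be generated from raw bytes, rather than typed by hand. The helper name `to_char_array` and the `_len` companion variable are made up for illustration; whether a given compiler accepts 20 million elements is something you would still have to test.

```cpp
#include <cstddef>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical helper: render `data` as the text of a C++ array definition
// called `name`, with `per_line` hex bytes per row.
// Assumes `data` is non-empty (an empty initializer list would not compile).
std::string to_char_array(const std::string& data, const std::string& name,
                          std::size_t per_line = 16) {
    std::ostringstream out;
    out << "const unsigned char " << name << "[] = {";
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i % per_line == 0) out << "\n    ";  // start a new row
        char buf[8];
        std::snprintf(buf, sizeof buf, "0x%02x,",
                      static_cast<unsigned char>(data[i]));
        out << buf;
    }
    out << "\n};\n";
    // Emit the length separately, because the array is not NUL-terminated.
    out << "const unsigned long " << name << "_len = " << data.size() << ";\n";
    return out.str();
}
```

You would write the returned text to a .cpp file and compile that file alongside the rest of the program.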

You might also consider running your string through a compression algorithm, putting the compressed result into your C++ file by any of the above methods, and decompressing after the program was loaded. I suspect you can get a decompression algorithm into a few thousand bytes.
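To give a feel for the decompress-at-startup idea, here is a toy sketch that uses run-length encoding as a stand-in for a real compression algorithm. RLE is my substitution for brevity, not a recommendation, and the names `rle_decode` and `kPacked` are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Decode a buffer of (count, byte) pairs, e.g. {3,'a',1,'b'} -> "aaab".
// A trailing unpaired byte, if any, is ignored.
std::string rle_decode(const unsigned char* packed, std::size_t packed_len) {
    std::string out;
    for (std::size_t i = 0; i + 1 < packed_len; i += 2) {
        out.append(static_cast<std::size_t>(packed[i]),
                   static_cast<char>(packed[i + 1]));
    }
    return out;
}

// Hypothetical embedded payload: 5 x 'a', 2 x 'b', 1 x 'c'.
const unsigned char kPacked[] = {5, 'a', 2, 'b', 1, 'c'};
```

Calling `rle_decode(kPacked, sizeof kPacked)` once at startup rebuilds the full string in memory; only the packed array has to go through the compiler.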

Ira Baxter
  • Qt's resource system also used character arrays to store binary data last time I checked. – Georg Fritzsche May 23 '10 at 05:34
  • 1
    +1. If the string is reasonably compressible, it's probably faster to read the compressed form in and decompress it than to just read it in, considering the speed of the processor vs. the speed of the HDD. – Charles May 23 '10 at 05:54
  • @Charles: My last variant answer assumes he is building the compressed string *into* his load image. Why have a separate file to read, when the loader will do it for you, which is what I suspect the OP wanted? – Ira Baxter May 23 '10 at 17:19
  • @Ira Baxter, the data needs to be loaded whether it's done by the OS loader or the program itself. If compression means the program starts up a little bit faster, that would be a benefit. – Mark Ransom May 23 '10 at 20:28
  • Ah, yes, I was assuming that Charles was pushing reading a separate file. Now I see we're all on the same page. I run Windows with a compressed file system for precisely this reason: CPUs are really fast, and disks are not. – Ira Baxter May 23 '10 at 21:38
  • It's more of a practical and logistical limit than a design limit. – Lightness Races in Orbit May 21 '11 at 15:05
9

Um, store the string in a separate resource of some sort and load it in? Seriously, in embedded land, you would have this as a separate resource and not hold it in RAM. On Windows, I believe you can use DLLs or other external resources to handle this for you. Compilers aren't designed to hold resources of this size for you, and they will fail.

sbi
Michael Dorgan
9

Store the string to a file and just open and read it...

It's much cleaner and more organized that way (I'm assuming that right now you have a file named blargh.h which contains that one #define...).
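A rough sketch of the read-at-runtime approach, assuming the data ships as a file the program can open. The function name `read_file` is illustrative only.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read the whole file at `path` into a std::string.
// Binary mode keeps the bytes exactly as stored (no newline translation).
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream buf;
    buf << in.rdbuf();  // copy the entire stream into the buffer
    return buf.str();
}
```

Something like `std::string data = read_file("blargh.txt");` then takes the place of the giant #define, and the compiler never sees the 20 MB of text.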

Warty
  • Downvote because you're not answering the question, which is "how to store lots of data in a source file". The OP does not want to know if there are alternatives. Just for the record, one reason I can think of to do this legitimately: there is a programming contest where the input has to be a single source file. – Marc Dec 21 '17 at 08:17
  • Often Q's about complex solutions are unaware of simpler alternatives. E.g. "I want to draw text on Windows, how do I load a font in bitmap form so I can blit it to the screen?" really wants "no, don't do that; use these APIs instead". On SO we should be suggesting the simpler alternative as long as it's relevant. If the more complex approach is necessary, the question should clarify why as to not mislead future people seeking help. For 99% of users, patching build artifacts or increasing compiler heap space to support a 20MB payload is a huge janky no-no. – Warty Dec 25 '17 at 10:56
  • In that case, you should still just answer the question and maybe precede it with "normally you don't want to do this because of so-and-so, but if you really need to, this is what you do", and add alternatives after that. Even if the OP should find another solution, you're also writing for all the other people looking to answer this problem. I found this page because of the aforementioned reason, for example. – Marc Dec 26 '17 at 19:07
  • If our time were infinite, every response should be pages long to intricately cover these bases. Our time is not infinite, so the succinct responses in this thread all seem appropriate. Our target audience is "developers who want to include larger data in their projects but don't know how" (wide audience), not "the percent of a percent of a percent of a percent of developers who are doing a programming contest involving single source files" (extremely narrow audience). One is more valuable to respond to, because it is more relevant for the majority of people reaching this thread. – Warty Jan 13 '18 at 13:04
8

Increase the compiler heap space.

amphetamachine
  • 3
    Hmm, this is actually a sensible answer (to a crazy question) and yet it's been downvoted. The last time I had a C program with a data segment of twelve megabytes, increasing the memory limits with `ulimit` worked. – Snake Plissken May 23 '10 at 06:18
  • @Snake - Thank you. I'm glad someone out there still expects something more from their OS and compiler. – amphetamachine May 23 '10 at 06:41
6

If your string comes from a large text or binary file, you may have luck with either the xxd -i command (to get everything in an array, per Ira Baxter's answer) or a variant of the bin2obj command (to get everything into a .o file you can link into the program).

Note that the string may not be null terminated in this case.
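To illustrate the termination caveat: the array below is shaped like typical `xxd -i` output for a three-byte file containing "hi!" (the identifiers are hypothetical, since xxd derives them from the file name). Because there is no trailing NUL, the length has to be passed explicitly. The wrapper `blob_as_string` is my own addition.

```cpp
#include <cstddef>
#include <string>

// Shaped like `xxd -i` output; note there is no trailing 0x00 byte.
unsigned char blob_txt[] = {0x68, 0x69, 0x21};
unsigned int blob_txt_len = 3;

// Build a std::string from the array using the explicit length, since
// treating the array as a C string would read past its end.
std::string blob_as_string() {
    return std::string(reinterpret_cast<const char*>(blob_txt), blob_txt_len);
}
```

Passing `blob_txt` to anything that expects a NUL-terminated `const char*` (strlen, printf with %s, the one-argument std::string constructor) would be undefined behaviour here.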

See answers to the earlier question, "How can I get the contents of a file at build time into my C++ string?"

(Also, as an aside: note the existence of the .xbm format.)

Community
leander
1

This is a very old question, but since there's no definitive answer yet: C++11's raw string literals seem to do the job.

This compiles nicely on GCC 4.8:

#include <string>

std::string data = R"(
    ... <1.4 MB of base85-encoded string> ...
)";
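One detail worth knowing if the embedded text is not base85: a raw string ends at the first `)"`, so data that can contain that sequence needs a custom delimiter. A small sketch (the delimiter `blob` and the variable name `kData` are arbitrary choices of mine):

```cpp
#include <string>

// Inside R"blob( ... )blob" backslashes are literal, and a bare )" does not
// end the literal; only )blob" does.
const std::string kData = R"blob(line one
path\with\backslashes and a )" sequence
)blob";
```

Newlines inside the literal are kept as-is, so the string above ends with a newline character.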

As said in other posts in this thread, this is definitely not the preferred way of handling large amounts of data.

Marc