20

Clarification: Given that a string literal can be rewritten as a const char[] (see below), imposing a lower max length on literals than on char[]s is just a syntactic inconvenience. Why does the C standard encourage this?


The C89 standard has a translation limit for string literals:

509 characters in a character string literal or wide string literal (after concatenation)

There isn't a specific limit for char arrays; perhaps

32767 bytes in an object (in a hosted environment only)

applies (I'm not sure what object or hosted environment means), but at any rate it's a much higher limit.

My understanding is that a string literal is equivalent to a char array containing the same characters, i.e. it's always possible to rewrite something like this:

const char* str = "foo";

into this

static const char __THE_LITERAL[] = { 'f', 'o', 'o', '\0' };
const char* str = __THE_LITERAL;

So why such a hard limit on literals?

npostavs

3 Answers

23

The limit on string literals is a compile-time requirement; there's a similar limit on the length of a logical source line. A compiler might use a fixed-size data structure to hold source lines and string literals.

(C99 increases these particular limits from 509 to 4095 characters.)

On the other hand, an object (such as an array of char) can be built at run time. The limits are likely imposed by the target machine architecture, not by the design of the compiler.

Note that these are not upper bounds imposed on programs. A compiler is not required to impose any finite limits at all. If a compiler does impose a limit on line length, it must be at least 509 or 4095 characters. (Most actual compilers, I think, don't impose fixed limits; rather they allocate memory dynamically.)

Keith Thompson
  • Though of course there is a practical limit—if the compiler is a 32-bit executable, it certainly wouldn't be able to handle a string literal over 4G (object file format limitations notwithstanding). The actual limit of course would be much lower. – Adam Rosenfield Jul 15 '12 at 01:58
  • Isn't an initialized `const char[]` built at compile time too? – npostavs Jul 15 '12 at 14:13
  • @npostavs: It can be, but the 32767-byte limit (increased to 65535 in C99) applies to run-time objects, regardless of how they're built. – Keith Thompson Jul 15 '12 at 17:22
6

It's not that 509 characters is the limit for a string, it's the minimum required for ANSI compatibility, as explained here.

I think that the makers of the standard pulled the number 509 out of their ass, but unless we get some official documentation on this, there is no way for us to know.

As far as how many characters can actually be in a string literal, that is compiler-dependent.

Here are some examples:

  • MSVC: 2048
  • GCC: No fixed limit (tested up to 100,000 characters), but it gives a warning once a literal exceeds 509 characters:

    String literal of length 100000 exceeds maximum length 509 that C90 compilers are required to support

Richard J. Ross III
  • Interesting information, but it doesn't actually answer the question. – Keith Thompson Jul 15 '12 at 01:27
  • @KeithThompson I disagree. It answers the question in that it explains that it isn't a 'limit', but a 'minimum', so on most compilers, there will be no difference. – Richard J. Ross III Jul 15 '12 at 01:29
  • 2
    I think the key point is that the 509 referenced in the standard is a minimum, not a maximum. – Michael Mior Jul 15 '12 at 01:29
  • The question is why there are different limits on string literals vs. run-time objects. Your answer doesn't mention the latter. – Keith Thompson Jul 15 '12 at 01:32
  • I can see by this answer and comments that I worded my question poorly; the question was supposed to be about why there are different limits on string literals vs. an initialized `const char[]`. Because a string literal **is** just an initialized `const char[]`. – npostavs Jul 15 '12 at 15:12
  • 1
    @npostavs I think that is where you are incorrect. A `const char *` is much different than an initialized `const char []`, specifically where it resides in memory. – Richard J. Ross III Jul 15 '12 at 15:14
  • @RichardJ.RossIII I'm not talking about a `const char*`, that just points to the literal. – npostavs Jul 15 '12 at 19:14
  • 4
    @RichardJ.RossIII, regarding your comment about where 509 came from: I don't think it was from someone's ass. That's 509 characters, leaving room for a one byte string-ending character (`\0`) and a two-byte pointer. At least, that's my guess. – Richard Oct 15 '12 at 09:41
2

Sorry about the late answer, but I'd like to illustrate the difference between the two cases (Richard J. Ross already pointed out that they're not equivalent.)

Suppose you try this:

const char __THE_LITERAL[] = { 'f', 'o', 'o', '\0' };
const char* str = __THE_LITERAL;
char *str_writable = (char *) str;  // Not so const anymore
str_writable[0] = 'g';

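Now str points to "goo" (on my platform, at least; strictly speaking, modifying an object defined as const is undefined behavior, it just happens to work here).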

But if you do this:

const char* str = "foo";
char *str_writable = (char *) str;
str_writable[0] = 'g';

Result: segfault! (on my platform, at least.)

Here is the fundamental difference: In the first case you have an array which is initialized to "foo", but in the second case you have an actual string literal.

On a side note,

const char __THE_LITERAL[] = { 'f', 'o', 'o', '\0' };

is exactly equivalent to

const char __THE_LITERAL[] = "foo";

Here the = introduces an array initializer: the characters of "foo" are copied into the array. This is very different from

const char *str = "foo";

where str is a pointer that is initialized with the address of the string literal itself.

Yakov Shklarov
  • Oh, I meant for __THE_LITERAL to be a static variable, in which case a segfault results in both cases. – npostavs Jan 18 '13 at 15:12
  • @npostavs: Hmm, you're right. Interesting. Actually I was mistaken about the `= "foo"` being treated exactly as `= {'f','o','o','\0'}`. If I do the same thing using more than 509 characters, gcc gives a warning in the first case but not the second. I guess it's because of what Keith Thompson said above -- compilers might use fixed-size data structures for processing literals. – Yakov Shklarov Jan 19 '13 at 00:36