
There are a lot of good answers on this site regarding the static lifetime of string literals. As I understand it, they are allocated in read-only memory (others have referred specifically to a "string-literal pool"). As such, returning a pointer to a string literal would not result in a dangling pointer. They are also lvalues.

Why does the same not hold for an integer constant literal? Does it not also have static storage duration? If I attempt to assign to an integer constant, gcc complains that it is not an lvalue.

Why are string literals given this special treatment? (And compound literals in C99). I've heard the following definition for lvalues: "An lvalue (locator value) represents an object that occupies some identifiable location in memory (i.e. has an address)." If a string literal (a character array type) must be addressable in memory, why does the same not apply to an integer? Even an integer still has to be allocated in the program's address space. Is there a performance reason?

  • String literals are not something native to a processor. They *must* be stored in memory (except in cases where they are very short), while "literals" such as integer constants can naturally be passed as operands to the corresponding assembly instructions and/or stored in registers. And of course there are some historical reasons too... – Eugene Sh. May 23 '18 at 14:07
  • Probably has to do with the fact that a string literal is not a singular value. An int is just a value that the compiler can pass directly into an instruction. A string, on the other hand, is an array that is operated on by its address, which means it needs to exist somewhere in memory. The pointer to the string literal would probably be the string's equivalent of what you get with integral constants. – Christian Gibbons May 23 '18 at 14:08
  • `char *p = "Hello";` is useful to allow, `int *p = &3;` is not so useful – M.M May 23 '18 at 14:18
  • @unwind http://port70.net/~nsz/c/c11/n1570.html#6.3.2.1p1 Looks like the OP is confusing the string literal with a pointer to it. – Eugene Sh. May 23 '18 at 14:18
  • Note that you can make lvalue literals, e.g. `int *p = &(int){3};` – M.M May 23 '18 at 14:19
  • Related but not a duplicate: [String literals: Where do they go?](https://stackoverflow.com/questions/2589949/string-literals-where-do-they-go) – Lundin May 23 '18 at 14:19

3 Answers


An integer constant literal is not an lvalue (see the C standard, n1570 §6.3.2.1), and (or perhaps because) it usually is not even in addressable memory. However, a string literal is an lvalue (§6.4.5) that should not be modified (doing so is undefined behavior).
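As a rough illustration of that difference, here is a minimal sketch. The function names `greeting` and `first_char_via_address` are hypothetical, chosen only for this example. It shows that a pointer into a string literal can safely be returned from a function, and that unary `&` can be applied to a string literal directly, while the same operator on an integer constant is rejected by the compiler.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns a pointer into a string literal.
   The literal's array has static storage duration, so the pointer
   remains valid after the function returns. */
static const char *greeting(void)
{
    return "hello";
}

/* A string literal is an lvalue, so unary & applies to it directly.
   The result points to the whole array of 6 chars (5 letters + '\0'). */
static char first_char_via_address(void)
{
    char (*whole)[6] = &"hello";
    return (*whole)[0];
}

/* By contrast, the following line does not compile, because an
   integer constant is not an lvalue:

       int *q = &3;
*/
```

Calling `greeting()` from anywhere in the program yields a usable string, and `first_char_via_address()` reads through the array pointer without ever copying the literal.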

In many cases, small enough integer literals are translated into a subpart of a single machine code instruction and don't have any location.

For a silly example, `int zero(void) { return 0; }` is compiled (using GCC 8.1 on Linux/x86-64/Debian) with `gcc -O2 -S -fverbose-asm` into

    .globl  zero
    .type   zero, @function
    zero:
    .LFB0:
        .cfi_startproc
    # zero.c:1: int zero(void) {return 0;}
    xorl    %eax, %eax  #
    ret 
    .cfi_endproc

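You don't see any mention of 0 in the generated machine code! The `xorl %eax, %eax` instruction zeroes the return register by XOR-ing it with itself, so the constant never appears as data or even as an immediate operand.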

Of course, the details of how integer literals are implemented are implementation-specific. They depend upon the instruction set architecture, the optimization level, the compiler version, the phase of the moon, the mood of my cat (I'm not sure about the last two items; I leave you to check).

Basile Starynkevitch
  • True, but the fact that the compiler makes a specific optimisation where it can do so without changing the operation of the program does not imply anything about the semantics of the language. IIRC, early Fortran was pass-by-reference, so it implemented constants in memory and you could assign a new value to, say, 2 inside a procedure, which was not very useful. – Pete Kirkham May 23 '18 at 14:42
  • Just out of curiosity, are array initializer expressions (e.g. `{1, 2, 3}`) given similar treatment to string literals? If the variable being initialized is automatic, will the object `{1, 2, 3}` perhaps be addressable with static duration in read-only memory? If not, what is the compiler's rationale for differentiating between `"123"` and `{1, 2, 3}` in terms of memory allocation? –  May 23 '18 at 14:45

Why does the same not hold for an integer constant literal? Does it not also have static storage duration?

Integer constants are not "literals" in the standard's terminology. "Constants" are a separate thing. And no, constants do not have static storage duration because they are not objects at all. They are constants. C does not define them to have storage of any kind. In practice, they usually are directly represented in the generated executable code, not among the program's data.

If I attempt to assign to an integer constant, gcc complains that it is not an lvalue.

It isn't. In other words, a constant does not correspond to a memory location, it's just a constant.

Why are string literals given this special treatment? (And compound literals in C99).

The treatment of string and compound literals is not special, it's just what it is. You seem to be getting confused by incorrectly considering these kinds of object literals to be a similar kind of thing to constants, but they are not. String and compound literals represent objects that do have representations as data in memory.
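To make the "object literal" idea concrete, here is a small sketch built around the compound literal from the comments, `&(int){3}`. The function name `compound_literal_demo` is hypothetical. It assumes a C99 or later compiler.

```c
#include <assert.h>

/* Hypothetical demo: a compound literal designates an unnamed object,
   so its address can be taken and the object can be modified. */
static int compound_literal_demo(void)
{
    int *p = &(int){3};  /* unnamed int object; automatic storage inside a function */
    *p += 1;             /* allowed: the unnamed object is modifiable */
    return *p;
}
```

The plain constant `3` could not be used this way: there is no object behind it to point at or to modify.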

I've heard the following definition for lvalues: "An lvalue (locator value) represents an object that occupies some identifiable location in memory (i.e. has an address)."

Yes.

If a string literal (a character array type) must be addressable in memory, why does the same not apply to an integer?

You are again glancing off the point. Integers can and often do have addressable locations in memory. But integer constants do not (or at least, C does not require them to have one). Their representations in a program are typically in executable code.

Part of the significance of the term "literal" is that the entity so described represents an actual object of the specified type.

A "constant", on the other hand, represents a value of the specified type.

Even an integer still has to be allocated in the program's address space.

There is a somewhat subtle distinction here, but an important one, between objects and values. Objects are identified with storage locations; they have and / or contain values. Thus, an object of integer type indeed does have storage, and an expression that designates that storage is an lvalue. On the other hand, although a particular integer value may in fact be the value of any number of objects, the value itself is not inherently associated with a specific storage location.
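The object/value distinction can be sketched in a few lines. The names `a`, `b`, `same_value`, and `same_object` below are made up for illustration: two separate objects hold the same value, yet they live at different addresses, while the value 3 itself belongs to neither location.

```c
#include <assert.h>
#include <stdbool.h>

/* Two distinct objects that happen to hold the same value. */
static int a = 3;
static int b = 3;

/* Compares the values stored in the two objects. */
static bool same_value(void)
{
    return a == b;
}

/* Compares the storage locations of the two objects. */
static bool same_object(void)
{
    return &a == &b;
}
```

Here `a` and `b` are lvalues because each designates storage; the constant `3` in their initializers merely supplies a value.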

Is there a performance reason?

Fundamentally, there are language design reasons. Some of these do promote better performance -- i.e. values can be encoded directly into executable machine code, rather than the machine code having to load them from (other) memory -- but I don't take those to be the primary factor.

John Bollinger

The reason C string literals are lvalues is that they're technically arrays of type `char[]`, and arrays must always be addressable in memory.
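A quick way to see the array nature of a string literal is to apply array operations to it. This is only a sketch, with the hypothetical function names `abc_size` and `abc_second`:

```c
#include <assert.h>
#include <stddef.h>

/* sizeof sees the array type char[4] ('a', 'b', 'c', '\0'),
   not the size of a pointer. */
static size_t abc_size(void)
{
    return sizeof "abc";
}

/* Subscripting works as on any other array. */
static char abc_second(void)
{
    return "abc"[1];
}
```

Neither operation would make sense on an integer constant, which has no elements and no storage to index into.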

If you're asking why, my best guess is that it makes no sense for literals to be lvalues, but since arrays can't be assigned to anyway, there's no point in explicitly forbidding lvalue status in this one case (string literals). There's probably a more complete explanation that somebody more knowledgeable about compiler construction will be able to provide.