Storage of literal constants in C++

Question

I would like to know where literal constants are actually stored in the memory?

example:

int i = 5;
const char* data = (const char*) &("abcdefgh");

The storage sections of i and data depend on where they are declared. But does the compiler store 5 and "abcdefgh" somewhere before actually copying them to the variables?

And here I can get the address of "abcdefgh" where it is stored, but why can't I get the address of 5?

Identical *string* literals *may* be stored in the same memory location, but it is not guaranteed. Storing other literals would make no sense from an efficiency point of view. — DeiDei, Apr 17 '17 at 15:33
5 doesn't have an address. An address is a number, and would take up the same space as a "5" value. So instead of using an address-based system, the compiler takes the value 5 and hard-codes it into the instruction itself. However if you store "5" in an int variable, then the variable itself has an address, and there will be a "5" value stored at the address. But a 5 by itself in your code lacks an "address". Asking for the "address" of 5 itself is the same as asking what the address of "zero" is. The memory has lots of zeros, there's no one address which is the address of zero. — Jason Lang, Apr 17 '17 at 16:39
Taking the address of something that is not a variable (lvalue) is always a bit fishy. String literals can be a valid exception, but you should never try to write to their address. A compiler can store "abcd" and "bcd" only once to save memory, with the second string literal starting one byte later. It can also store the string literal in a read-only section of memory. — Sebastian, Apr 29 '22 at 18:01

score 13 · Accepted Answer · answered Apr 17 '17 at 15:38

Integer literals like 5 can be part of machine instructions. For example:

 LD A, 5

would load the value 5 into processor register A for some imaginary architecture, and as the 5 is actually part of the instruction, it has no address. Few (if any) architectures have the ability to create string literals inline in the machine instructions, so these have to actually be stored elsewhere in memory and accessed via pointers. Exactly where "elsewhere" is is not specified by the C++ Standard.
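Seen from the language side, the difference can be sketched as follows. This is a minimal illustration, assuming a C++11 or later compiler; the function names are hypothetical and chosen only for this example.

```cpp
#include <cassert>

// "abcdefgh" is an lvalue of type `const char[9]` (8 letters plus '\0'),
// so the address-of operator applies to it. Which memory section that
// address falls in is left to the implementation.
inline const char* first_char_of_literal() {
    const char (*whole_array)[9] = &"abcdefgh"; // pointer to the whole array
    return *whole_array;                        // decays to a pointer to 'a'
}

// 5 is a pure value: it can be copied, but it has no address of its own.
inline int copy_of_five() {
    // int* p = &5;   // does not compile: 5 is not an lvalue
    int i = 5;        // the compiler is free to encode 5 inside an instruction
    return i;
}

// Small self-check of the two helpers above.
inline void run_literal_demo() {
    assert(first_char_of_literal()[0] == 'a');
    assert(copy_of_five() == 5);
}
```

Calling `run_literal_demo()` from `main` exercises both cases; uncommenting the `&5` line should make the build fail.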

But if I remember right, both C and C++ standards declare *writing* to string literals UB... – Aconcagua Apr 17 '17 at 15:40
yes, that's why assigning a string literal to a char* is not allowed. – The Techel Apr 17 '17 at 15:42
Emphasis on _can_ be part... Some architectures have more ability to put constants in-line than others. Especially true if we are talking about constants like `1234567890` (needs 31 bits) rather than constants like `5` (only needs 3 bits). – Solomon Slow Apr 17 '17 at 16:58
@james Even when the literals are immediate values following the opcode in memory, there is usually no easy way of finding their address. – Apr 17 '17 at 17:01
I'm thinking more of the case where the compiler creates an implicit initialized variable to hold a large `int` constant, and then emits code to fetch the value from that location when the constant is used. I have used compilers that did that. I am talking about _small_ microprocessors, as in 8-bit. Also, I'm pretty sure that there have been larger-scale but very RISC-y processors that had to do the same tricks for large literal values. – Solomon Slow Apr 17 '17 at 17:15
Thank you so much @NeilButterworth for the solution. It helped me to understand better than many approaches... :-) (y) – infinite loop Apr 18 '17 at 05:12

Matteo Italia · Answer 2 · 2022-04-29T17:14:41.877

On the language level, string literals and numeric literals are different beasts.

The C and C++ standards essentially specify that string literals are treated "as if" you defined a constant array of characters with the appropriate size and content, and then used its name in place of the literal. In other words, when you write

const char *foo = "hello";

it's as if you wrote

// in global scope
const char hello_literal[6] = {'h', 'e', 'l', 'l', 'o', '\0'};

...
const char *foo = hello_literal;

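(There are some backwards-compatibility exceptions that even allow writing char *foo = "hello"; without the const. C still permits it, while C++ deprecated the conversion in C++03 and removed it in C++11, although many compilers still accept it with a warning. Either way, trying to write through such a pointer is undefined behavior.)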
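The "as if" equivalence can be checked from within the language. The sketch below assumes C++11 or later; `hello_literal` mirrors the illustrative name used above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// The literal "hello" is an lvalue of type `const char[6]`:
// five letters plus the terminating '\0'.
static_assert(std::is_same<decltype("hello"), const char (&)[6]>::value,
              "a string literal is an lvalue array of const char");
static_assert(sizeof("hello") == 6, "the terminator is part of the array");

// Hypothetical named array standing in for the literal.
const char hello_literal[6] = {'h', 'e', 'l', 'l', 'o', '\0'};

// Compares the literal with the named array, element by element.
inline bool literal_matches_named_array() {
    const char* from_literal = "hello";
    for (std::size_t i = 0; i < sizeof(hello_literal); ++i) {
        if (from_literal[i] != hello_literal[i]) return false;
    }
    return true;
}

// Small self-check of the helper above.
inline void run_equivalence_demo() {
    assert(literal_matches_named_array());
}
```

The two `static_assert` lines are evaluated at compile time, so a successful build already confirms the type and size of the literal.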

So, given this equivalence, it's normal that you can take the address of a string literal. Integral literals, on the other hand, are rvalues, for which the standard specifies that you cannot take any address. You can roughly think of them as values that the standard expects not to have a backing memory location in the conventional sense.

Now, this distinction actually descends from the fact that on the machine level they are usually implemented differently.

A string literal is generally stored as data somewhere in memory, typically in a read-only data section that gets mapped into memory straight from the executable. When the compiler needs its address, it's easy to oblige: the literal is ordinary data that is already in memory, and thus it does have an address.

Instead, when you do something like

int a = 5;

the 5 does not really have a separate memory location like the "hello" array above; instead, it's usually embedded into the machine code as an immediate value.

Having a pointer to it would be quite complicated, since it would point halfway into an instruction and, in general, to data in a different format than what would be expected for a regular int variable. Think of x86, where small numbers use more compact encodings; of PowerPC/ARM and other RISC architectures, where some values are built from an immediate manipulated by the implicit barrel shifter and other values cannot be immediates at all, so you have to compose them out of several instructions; or of Harvard architectures, where data and code live in different address spaces.

For this reason, you cannot take the address of numeric literals (nor of the results of numeric expressions and many other temporaries); if you want the address of a number, you first have to assign it to a variable (which can provide in-memory storage), and then ask for that variable's address.
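A minimal sketch of giving a number an address, assuming C++11 or later; the function names are hypothetical. The second function goes slightly beyond the text above: binding a const reference to a literal makes the compiler create a temporary object for it, which is another way the value ends up with storage.

```cpp
#include <cassert>

// Reads the value back through a pointer to a named variable.
inline int read_through_pointer() {
    int five = 5;          // the variable provides the storage
    const int* p = &five;  // so taking its address is fine
    return *p;
}

// Reads the value back through a const reference bound to the literal.
inline int read_through_reference() {
    const int& ref = 5;    // a temporary int holding 5 is created
    const int* p = &ref;   // its address is valid while `ref` is alive
    return *p;
}

// Small self-check of the two helpers above.
inline void run_address_demo() {
    assert(read_through_pointer() == 5);
    assert(read_through_reference() == 5);
}
```

In both cases the address belongs to an object that holds a copy of the value, not to the literal 5 itself.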

score 5 · Answer 3 · answered Apr 17 '17 at 16:23

Although the C and C++ standards don't dictate where literals are stored, common practice stores them in one of two places: in the code (see @NeilButterworth's answer), or in a "constants" segment.

Common executable files have a code section and a data section. The data section may be split up into read-only, uninitialized read/write, and initialized read/write parts. Often, the literals are placed into the read-only section of the executable.

Some tools may also place the literals into a separate data file. This data file may be used to program the data into read-only memory devices (ROM, PROM, Flash, etc.).

In summary, the placement of literals is implementation dependent. The C and C++ standards state that writing to the location of a string literal is undefined behavior. Preferred practice with string literals is to declare the pointer variable as const, so the compiler can generate warnings or errors when a write to a literal occurs.
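The const practice can be sketched like this, assuming a standard C++ compiler; the function names are hypothetical. When a modifiable string is needed, initializing a char array from the literal yields a writable copy, which is a common way to avoid touching the literal itself.

```cpp
#include <cassert>
#include <cstring>

// Modifies a copy of the string while the literal stays untouched.
inline bool modify_copy_not_literal() {
    const char* text = "abcdefgh";  // points at the literal: read-only
    // text[0] = 'X';               // rejected at compile time thanks to const

    char buffer[] = "abcdefgh";     // array initialized from the literal
    buffer[0] = 'X';                // changes the copy only

    return std::strcmp(buffer, "Xbcdefgh") == 0 && text[0] == 'a';
}

// Small self-check of the helper above.
inline void run_const_demo() {
    assert(modify_copy_not_literal());
}
```

Uncommenting the write through `text` should produce a compile error, which is the diagnostic the const qualifier is meant to provide.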
