7

I'd like to ask whether it is portable to rely on the address of a string literal across translation units. That is:

Suppose a file foo.c refers to the string literal "I'm a literal!". Is it correct and portable to rely on the same string literal "I'm a literal!" having the same memory address in another file, bar.c for instance? Assume that each file is translated to an individual .o file.

To illustrate, here is some example code:

# File foo.c
/* ... */
const char * x = "I'm a literal!";

# File bar.c
/* ... */
const char * y = "I'm a literal!";

# File test.c
#include <assert.h>
/* ... */
extern const char * x;
extern const char * y;

int main(void)
{
    assert (x == y); /* Is this assertion going to fail? */
    return 0;
}

And example gcc command lines:

gcc -c -o foo.o -Wall foo.c
gcc -c -o bar.o -Wall bar.c
gcc -c -o test.o -Wall test.c
gcc -o test foo.o bar.o test.o

What about within the same translation unit? Would this be reliable if the string literals are in the same translation unit?

Shafik Yaghmour
Felipe Lavratti
  • 8
    No you cannot rely on this. – Neil Kirk Oct 09 '14 at 13:27
  • @NeilKirk Thanks Neil, I've edited and added a secondary question. Any rationale behind it would be of great illumination. – Felipe Lavratti Oct 09 '14 at 13:30
  • 2
    @fanl - The Visual C++ suite of compilers has an option of "string pooling" where the same literal will have the same pointer value. The key word in that last sentence is *option*. This implies that you cannot rely that the pointer values are equal. – PaulMcKenzie Oct 09 '14 at 13:37
  • I imagine the rationale is to allow faster compiling. String pooling has a cost. – Neil Kirk Oct 09 '14 at 13:39
  • @PaulMcKenzie I would expect that to only work on `const` strings. – Alnitak Oct 09 '14 at 13:44
  • 1
    @Alnitak A string literal isn't a const string? – Neil Kirk Oct 09 '14 at 13:51
  • @NeilKirk It depends on the language. In C++, a string literal has type `char const[]`, and is a const object. In C, a string literal has type `char[]`. But there is a special rule which says you're not allowed to modify it, or undefined behavior ensues. – James Kanze Oct 09 '14 at 13:58
  • I forget whether it's permitted or not, but with a non-`const` string I believe it's possible (but may lead to undefined behaviour) for the programmer to overwrite the string's contents, which if it's pooled wouldn't work. – Alnitak Oct 09 '14 at 13:58
  • As to why pooling isn't required: early C compilers didn't pool, and _did_ allow modifying a string literal, and the C standard didn't want to force the implementers of these compilers to break existing code. – James Kanze Oct 09 '14 at 13:59
  • @Alnitak Aha my comments only apply to C++ and not C. It's totally not permitted to overwrite a string literal in C++. – Neil Kirk Oct 09 '14 at 13:59
  • Note that you may use a `static char buf[]` to force a single address. So with C++, the following may help: [Aligning static string literals](http://stackoverflow.com/a/22067775/2684539) (and you may forget the alignment stuff). – Jarod42 Oct 09 '14 at 14:06

2 Answers

13

You cannot rely on identical string literals having the same memory location; that is an implementation decision. The C99 draft standard says it is unspecified whether identical string literals are distinct objects. From section 6.4.5 String literals:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
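In practice this means that comparing two identical literals by pointer may give either result, while comparing their contents is always reliable. A minimal sketch follows; the helper names `same_text` and `same_address` are hypothetical and only serve the illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Portable: compares the characters, regardless of where the
   strings are stored. */
static bool same_text(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Not portable when a and b come from two separate but identical
   literals: whether those literals share storage is unspecified. */
static bool same_address(const char *a, const char *b)
{
    return a == b;
}
```

Calling `same_text("I'm a literal!", "I'm a literal!")` is true on every conforming implementation, whereas `same_address` applied to the same two literals may return either value depending on the compiler and its options.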

For C++ this is covered in the draft standard section 2.14.5 String literals, which says:

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.

The compiler is allowed to pool string literals, but how it does so varies from compiler to compiler, so relying on it would not be portable and the behavior could change. Visual Studio includes an option for string literal pooling:

In some cases, identical string literals may be pooled to save space in the executable file. In string-literal pooling, the compiler causes all references to a particular string literal to point to the same location in memory, instead of having each reference point to a separate instance of the string literal. To enable string pooling, use the /GF compiler option.

Note that it qualifies this with "In some cases".

gcc does support pooling, including across compilation units, and you can turn it on via -fmerge-constants:

Attempt to merge identical constants (string constants and floating-point constants) across compilation units.

This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.

Note the use of "attempt" and "if ... support it".

As for a rationale, at least for C, for not requiring string literals to be pooled: this archived comp.std.c discussion on string literals indicates that it was due to the wide variety of implementations at the time:

GCC might have served as an example but not as motivation. Partly the desire to have string literals in ROMmable data was to support, er, ROMming. I vaguely recall having used a couple of C implementations (before the X3J11 decision was made) where string literals were either automatically pooled or stored in a constant data program section. Given the existing variety of practice and the availability of an easy work-around when the original UNIX properties were wanted, it seemed best to not try to guarantee uniqueness and writability of string literals.
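When unique, writable strings are wanted, one common approach is to initialize a named array from the literal instead of pointing at the literal itself; this may be the kind of work-around the quote has in mind, though that is an assumption. A minimal sketch, where the names `first`, `second` and `demo_distinct_arrays` are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Each array is its own object, initialized with a copy of the
   literal's characters. */
static char first[]  = "I'm a literal!";
static char second[] = "I'm a literal!";

/* Hypothetical helper: shows that the arrays are distinct and writable. */
static void demo_distinct_arrays(void)
{
    char *p = first;
    char *q = second;

    assert(p != q);   /* distinct objects have distinct addresses */
    p[0] = 'i';       /* well-defined: this modifies the array, not a literal */
    assert(strcmp(q, "I'm a literal!") == 0);   /* the other copy is untouched */
}
```

Unlike two pointers to identical literals, two such arrays are guaranteed to have different addresses, and writing to them is well-defined.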

Shafik Yaghmour
  • 1
    You mean that both, expecting the address is the same across and within translation units, are unreliable? – Felipe Lavratti Oct 09 '14 at 13:31
  • 2
    @fanl That's what he said. In the original K&R C, each string literal was guaranteed to be unique, with a different address (and you were allowed to modify a string literal); the original C standard changed this, and _permitted_ identical string literals to have the same address. – James Kanze Oct 09 '14 at 13:56
  • @fanl it is unreliable in all cases even though compilers support pooling, both `gcc` and `Visual Studio` indicate in their documents that it only applies in *some cases*. – Shafik Yaghmour Oct 15 '14 at 15:12
4

No, you can't expect the same address. If it happens, it happens. But there's nothing enforcing it.

§ 2.14.5/p12

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation defined. The effect of attempting to modify a string literal is undefined.

The compiler can do as it pleases. The literals can be stored at different addresses whether they are in different translation units or in the same one, regardless of the fact that they reside in read-only memory.

On MSVC, for instance, the addresses are totally different in both cases, but again: nothing prevents the compiler from merging the pointers' values, nor does anything dictate where the literals are placed, as long as the read-only constraint is honored.
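If the same address is genuinely required, a portable option is to define the string once as a named object and refer to that object everywhere. The sketch below folds everything into one file so it compiles on its own; in the question's layout the definition would sit in a single .c file with an `extern` declaration in a shared header. The names `shared_literal` and `check_same_address` are hypothetical:

```c
#include <assert.h>

/* Declaration that a shared header would carry. */
extern const char shared_literal[];

/* The single definition; it would live in exactly one .c file. */
const char shared_literal[] = "I'm a literal!";

/* These would live in foo.c and bar.c respectively. */
const char *x = shared_literal;
const char *y = shared_literal;

/* This would live in test.c: both pointers name the same object,
   so the comparison holds on every conforming implementation. */
static void check_same_address(void)
{
    assert(x == y);
}
```

The guarantee here comes from there being only one object, not from any pooling done by the compiler or linker.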

Marco A.