
I've just been inspecting the following in gdb:

char *a[] = {"one","two","three","four"};
char *b[] = {"one","two","three","four"};
char *c[] = {"two","three","four","five"};
char *d[] = {"one","three","four","six"};

...and I get the following:

(gdb) p a
$17 = {0x80961a4 "one", 0x80961a8 "two", 0x80961ac "three", 0x80961b2 "four"}
(gdb) p b
$18 = {0x80961a4 "one", 0x80961a8 "two", 0x80961ac "three", 0x80961b2 "four"}
(gdb) p c
$19 = {0x80961a8 "two", 0x80961ac "three", 0x80961b2 "four", 0x80961b7 "five"}
(gdb) p d
$20 = {0x80961a4 "one", 0x80961ac "three", 0x80961b2 "four", 0x80961bc "six"}

I'm really surprised that the string pointers are the same for equivalent words. I would have thought each string would have been allocated its own memory on the stack regardless of whether it was the same as a string in another array.

Is this an example of some sort of compiler optimisation or is it standard behaviour for string declaration of this kind?

anastaciu
bph
  • Where did the "stack" even come from in this question? If you declared `a`, `b`, `c` and `d` as local variables, you have to say so in your question. – AnT stands with Russia Jul 09 '12 at 17:07
  • Yes, they're local variables of automatic storage duration declared within a function, and therefore on the stack. – bph Jul 09 '12 at 17:10
  • Yes. It's an example of compiler optimisation. – Jack Jul 09 '12 at 17:27
  • Related: [Where are string constants stored by GCC and from where these pointers are mapped?](http://stackoverflow.com/q/12393888/183120) – legends2k Dec 03 '14 at 08:14
  • "I would have thought each string would have been allocated its own memory on the stack" - "on the stack"? With `static` storage duration? How? – The Paramagnetic Croissant Dec 03 '14 at 08:14
  • How? That's just me misunderstanding how C stores string literals; the above link from @legends2k is very useful in explaining what is actually going on. – bph Dec 03 '14 at 11:34

2 Answers


It's called "string pooling". It's optional in Microsoft compilers (controlled by the /GF switch), whereas GCC pools identical literals by default. If you switch off string pooling in MSVC, the "same" strings in the different arrays are duplicated and have different memory addresses, and so take up an extra (unnecessary) 50 or so bytes of your static data.

EDIT: GCC prior to version 4.0 had an option, -fwritable-strings, whose effect was twofold: it allowed string literals to be overwritten, and it disabled string pooling. So, in your code, setting this flag would allow the somewhat dangerous code

/* Overwrite the first string in a, so that it reads 'xne'.  Does not */ 
/* affect the instances of the string "one" in b or d */
*a[0] = 'x';
Josh Greifer
  • In GCC (4.7 at least) a switch to disable pooling is -fno-merge-constants. – dbrank0 Jan 20 '14 at 13:25
  • @dbrank0 note that [GCC no longer supports -fwritable-strings](https://gcc.gnu.org/gcc-4.0/changes.html); it would be ideal to add both of these notes to your answer. – Shafik Yaghmour Oct 09 '14 at 14:20

(I assume that your a, b, c and d are declared as local variables, which is the reason for your stack-related expectations.)

String literals in C have static storage duration. They are never allocated "on the stack". They are always allocated in global/static memory and live "forever", i.e. as long as the program runs.

Your a, b, c and d arrays were allocated on the stack. The pointers stored in these arrays point to static memory. Under these circumstances, there's nothing unusual about pointers for identical words being identical.
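To make the distinction concrete, here is a minimal sketch. The names `first_word` and `show_storage` are made up for illustration and are not from the question. It assumes nothing beyond standard C: the array of pointers is a local (automatic) object, but the pointer copied out of it still refers to a valid string after the function returns, because the literal itself lives in static storage.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns one of the literals held in a local array. */
const char *first_word(void)
{
    /* The array of four pointers has automatic storage (typically the stack)... */
    const char *a[] = {"one", "two", "three", "four"};

    /* ...but a[0] points at a literal with static storage duration,
       so the returned pointer remains valid after this function returns. */
    return a[0];
}

/* Hypothetical demo routine; call it from main() to see the address. */
void show_storage(void)
{
    const char *w = first_word();

    /* The string is still readable even though the array 'a' is gone. */
    assert(strcmp(w, "one") == 0);
    printf("\"%s\" lives at %p\n", w, (void *)w);
}
```

Returning a pointer to the array `a` itself would be a bug, since the array disappears when the function returns; returning one of the pointer values stored in it is fine.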

Whether identical literals are merged into one depends on the compiler; the C standard leaves it unspecified whether identical string literals occupy distinct storage. Some compilers even have an option that controls this behavior. String literals must be treated as read-only, because modifying one is undefined behavior (which is why it is a better idea to use the const char * type for your arrays), so it doesn't make much difference whether they are merged or not until you begin to rely on actual pointer values.
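As a rough sketch of what "not relying on pointer values" can look like in practice (the names `same_word` and `check_words` are hypothetical, chosen only for this example): compare words by content with `strcmp`, and if you need a string you can modify, initialise a `char` array from the literal so that you get your own copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: compares two words by content rather than by address. */
bool same_word(const char *x, const char *y)
{
    return strcmp(x, y) == 0;
}

/* Hypothetical demo routine; call it from main(). */
void check_words(void)
{
    const char *a[] = {"one", "two", "three", "four"};
    const char *d[] = {"one", "three", "four", "six"};

    /* Whether a[0] == d[0] holds depends on the compiler and its options,
       so compare the contents instead of the addresses. */
    assert(same_word(a[0], d[0]));
    assert(!same_word(a[1], d[1]));

    /* An array initialised from a literal is a private, writable copy. */
    char copy[] = "one";
    copy[0] = 'x';
    assert(strcmp(copy, "xne") == 0);

    /* The literal that a[0] points at is unaffected by the change above. */
    assert(same_word(a[0], "one"));
}
```

With this approach the code behaves the same whether or not the compiler pools the literals.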

P.S. Just out of curiosity: even if these string literals were allocated on the stack, why would you expect identical literals to be "instantiated" more than once?

AnT stands with Russia
  • Great stuff, that's helped my understanding a lot. I hadn't fully understood string literals and their associated storage duration; I was incorrectly thinking of the strings as just being local (automatic) variables on the stack. – bph Jul 09 '12 at 17:07
  • Nothing that I'm aware of says that two (or more) references to the same string literal *must* resolve to the same memory location. The compiler could (and some do) allocate storage for every string literal, even if some are "duplicates". See "string pooling" mentioned by @Josh. – Dan Moulding Jul 09 '12 at 17:12