
Why do same strings in a char* array have the same address?

Is this because of compiler optimization?

Example:

#include <stdio.h>

#define ARR_SIZE 7

int main(void) {
  size_t i = 0, j = 0;

  /* String literals must not be modified, so point at them through const char *. */
  const char *myArr[ARR_SIZE] = {
    "This is the first string",
    "This is the second string",
    "This is Engie",
    "This is the third string",
    "This is Engie",
    "This is the fifth string",
    "This is Engie"
  };

  for (i = 0; i < ARR_SIZE; ++i) {
    for (j = i + 1; j < ARR_SIZE; ++j) {
      /* Compare the pointer values (addresses), not the string contents. */
      if (myArr[i] == myArr[j]) {
        /* %p expects a void *, and size_t is printed with %zu. */
        printf("%p, %p\n", (void *)myArr[i], (void *)myArr[j]);
        printf("found it start index: %zu, search index: %zu\n", i, j);
      }
    }
  }
  return 0;
}

GDB:

(gdb) x/7w myArr
0x7fffffffdd10: U"\x4007a8"
0x7fffffffdd18: U"\x4007c1"
0x7fffffffdd20: U"\x4007db"
0x7fffffffdd28: U"\x4007e9"
0x7fffffffdd30: U"\x4007db"
0x7fffffffdd38: U"\x400802"
0x7fffffffdd40: U"\x4007db"


(gdb) x/7s *myArr
0x4007a8:   "This is the first string"
0x4007c1:   "This is the second string"
0x4007db:   "This is Engie"
0x4007e9:   "This is the third string"
0x400802:   "This is the fifth string"
0x40081b:   "%p, %p\n"
0x400823:   ""
Shafik Yaghmour
Quaxton Hale
  • My guess is that it is indeed because of optimization; this most likely doesn't violate the `as-if` rule, so the compiler optimizes the useless copies out. – Creris Oct 17 '14 at 21:19
  • The language standard explicitly allows this. – n. m. could be an AI Oct 17 '14 at 21:24
  • Understand that quoted strings are literals that are essentially stored adjacent to the instruction stream for the method. They take physical space in the compiled module. They're also (ostensibly) "constant" and should not be changed, even if the system doesn't physically prevent modification. So there's no point in duplicating them. – Hot Licks Oct 17 '14 at 21:26
  • Of note is that this even occurs in Java, where string literals are "interned" so that only one copy of each unique value exists in the heap. (But note that I said "literals".) – Hot Licks Oct 17 '14 at 21:27
  • Related to [String Literal address across translation units](http://stackoverflow.com/q/26279628/1708801) – Shafik Yaghmour Oct 18 '14 at 01:04
  • @HotLicks as is mentioned in the question I link to, back in the day string literals used to be modifiable, and apparently until relatively recently gcc allowed this as an extension. – Shafik Yaghmour Oct 18 '14 at 01:09

1 Answer


It is called constant merging, and compilers typically enable it at higher optimization levels. The compiler keeps a single copy of each distinct constant value and points every use at that copy. This is good for memory usage and cache efficiency.

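GCC controls this with -fmerge-constants, which merges identical string and floating-point constants across compilation units. It is enabled at -O1 and above (including -O2, -O3 and -Os) and can be turned off with -fno-merge-constants.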

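Other compilers may or may not do it; the behavior is implementation-specific. The C standard explicitly leaves it unspecified whether identical string literals are stored in distinct arrays (C11 6.4.5p7).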

Since it is one of the easiest optimizations to implement, I would expect most C and C++ compilers to do it; MSVC, for example, offers it as string pooling (/GF).

This is a perfect example of why:

  1. You can't make assumptions about where a constant value will live (whether two identical literals share an address is unspecified)
  2. You shouldn't make changes to constant values (modifying a string literal is undefined behavior)

Yet we see many questions from people (not you) who observed that they got away with modifying a constant string after casting away const.

codenheim
  • I always knew something like this was going on, didn't know what it was called. Thank you. – Quaxton Hale Oct 17 '14 at 21:24
  • Casting away constness from a const char* and then writing through it invokes UB, so they may get away with it, but when they try to run their code somewhere else it may just as well crash their program – Creris Oct 17 '14 at 21:30
  • @Creris I wonder if you try whole program optimization does it enable it? (I was replying to your other comment, oops) – codenheim Oct 17 '14 at 21:31
  • Well, I actually observed the code badly: it seems the array only gets initialized when it is used, and I had set the breakpoint before its first use, so no variables were shown. But when it comes to printing, printing the addresses indeed keeps it from being optimized, even with /Ox – Creris Oct 17 '14 at 21:36