I have learned that

char szA[] = "abc";

initializes the array szA with a copy of "abc", so when szA is a local variable it is stored on the stack and destroyed when the function ends.

On the other hand, consider:

const char* szB = "abc";

Here, "abc" is stored in the data section of memory, like static variables, and szB just holds its address.

At this point, I wondered:

If I try

int i = 0;
while (i++ < 1000000)
{
    const char* szC = "hello";
}

will this make 1000000 of "hello" in data section?

To figure this out, I have written test code:

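#include <cstdint>
#include <iostream>

using namespace std;

// Both functions return the pointer they receive, so each printed value
// is the address of the string literal that was passed in.
const char* testA(const char* arr)
{
    return arr;
}

const char* testB(const char* arr)
{
    return arr;
}

int main()
{
    // uintptr_t is wide enough to hold a pointer and prints in decimal.
    cout << "testA---------------\n";
    cout << reinterpret_cast<uintptr_t>(testA("abc")) << endl;
    cout << reinterpret_cast<uintptr_t>(testA("cba")) << endl;

    cout << "testB---------------\n";
    cout << reinterpret_cast<uintptr_t>(testB("abc")) << endl;
    cout << reinterpret_cast<uintptr_t>(testB("cba")) << endl;

    cout << "local---------------\n";
    const char* pChA = "abc";
    cout << reinterpret_cast<uintptr_t>(pChA) << endl;
    const char* pChB = "cba";
    cout << reinterpret_cast<uintptr_t>(pChB) << endl;
}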

And the result is:

testA---------------
9542604
9542608
testB---------------
9542604
9542608
local---------------
9542604
9542608

So, apparently there is only one space for each string literal in data memory.

But how does the compiler know that a string literal (const char*) already exists in data memory?

Peter Mortensen
KID

3 Answers

The compiler scans the source file, finds every string literal, and records it, for example in a lookup table. It can then go through that table and assign the same address to all identical string literals.

What is more interesting is how this happens across different compilation units in the same project, since the compiler processes only one file at a time and knows nothing about the other files.

In this case, the linker needs to step in and help: the compiler stores some information about the string literals inside a section of the .obj generated for each file, and the linker can use it to merge identical literals, as mentioned in these answers:

How Do C++ Compilers Merge Identical String Literals

String Literal address across translation units

The last one also raises the important point that identical string literals having the same address is not guaranteed by the C++ spec, but is rather an implementation detail.

Gonen I

In some situations, string literals need to be translated to static arrays of characters. This happens at compile time. Your loop cannot allocate the static memory a million times; it's just not possible. A static variable can only be allocated once.

The compiler can allocate static memory for each string literal that it sees in the source code. The compiler may use the same static memory for identical string literals, so after char* p = "Hello"; char* q = "Hello"; p and q may be equal or not equal. The compiler may use the same static memory for the same sequence of bytes, so after char* p = "Hello"; char* q = "ello"; &p[1] and &q[0] may be equal or not equal.

How well the compiler reuses the same static memory depends on the quality of the compiler. It can keep track of all string literals, delay code generation until it knows every string literal in a compilation unit, then give equal strings the same address, merge suffixes like "Hello" and "ello", and emit only the string literals that are actually needed.

Also, for something like sizeof("Hello") or "Hello"[2], no static memory needs to be created at all. For a pointer comparison like p == "Hello" or "Hello" == "Hello", the result is unspecified, so the compiler can simply say the result is false without allocating memory.

Peter Mortensen
gnasher729

A compiler typically uses hash tables for various strings like identifiers and string literals. It's easy to know whether an identifier or a string literal has appeared at least once. So whenever the compiler sees a string literal, it checks if the same string has appeared before, and if yes it will use the same string for the next pointer it sees.

iBug