I was on a forum just now and came across a basic question that led me to a peculiar result. The question had to do with using c_str() in C++ and an array of const char* to hold the returned pointers. Consider the following code:

#include <iostream>
#include <string>

struct appendable_array{
    int newest_item_index = 0; 
    const char* aarray[10]; 
};

void append_array(appendable_array& t, const std::string& s){
    std::cout << "Assigned \"" << s << "\"" << " index " << t.newest_item_index << std::endl; 
    t.aarray[t.newest_item_index++] = s.c_str(); 
}

int main(void) {
    struct appendable_array arr; 
    append_array(arr, std::string("Hello")); 
    append_array(arr, std::string("There")); 
    append_array(arr, std::string("World!")); 

    for(int i = 0; i < 3; i++)
        std::cout << arr.aarray[i] << std::endl;
    return 0;
}

The output is:

  • Assigned "Hello" index 0
  • Assigned "There" index 1
  • Assigned "World!" index 2
  • World!
  • World!
  • World!

However, if we use unique string objects, shown below, then we get the following output.

int main(void) {
    struct appendable_array arr; 
    std::string str1("Hello"); 
    std::string str2("There"); 
    std::string str3("World!"); 

    append_array(arr, str1); 
    append_array(arr, str2); 
    append_array(arr, str3); 

    for(int i = 0; i < 3; i++)
        std::cout << arr.aarray[i] << std::endl;
    return 0;
}

  • Assigned "Hello" index 0
  • Assigned "There" index 1
  • Assigned "World!" index 2
  • Hello
  • There
  • World!

So, I was naturally curious to know why each of the original objects pointed to the same memory location. I came to the conclusion that it must have something to do with anonymous run-time objects being created that are shared (I imagine with global scope, but I did not test this). The logic here makes sense, since it is not necessary to create many anonymous objects that do not have explicit references to their location within the code.

tl;dr - How are anonymous objects shared at run-time, and how are they implemented? If I am completely wrong about this being some sort of shared object, how else might the apparently shared memory references be explained?

sherrellbc
  • If you read the question I linked, it will be obvious: UB. – Deduplicator Nov 05 '14 at 02:09
  • You are trying to find sense and logic in undefined behavior. It's pointless. – R Sahu Nov 05 '14 at 02:18
  • With the argument type set to `const string&`, it makes sense why the second method would result in predictable behavior. The values returned by `s.c_str()` in `append_array` are valid after the function returns. When you use temporary objects, the program is expected to exhibit undefined behavior. – R Sahu Nov 05 '14 at 02:31

1 Answer

The lifetime of the character array returned from s.c_str() is bound to the lifetime of s, and the pointer can also be invalidated earlier by operations that modify s. When s gets destroyed, the pointer obtained from s.c_str() becomes invalid and any access through it is undefined behavior.

The argument to append_array() is destroyed when the function exits. Since it is a value argument this is true in all cases, i.e., using separate strings doesn't change the situation that the behavior is undefined. The reason the second code seems to do what you think it should do is probably due to a CoW (copy on write) string being used.
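
The point at which a temporary dies can be observed without touching a dangling pointer. The following is only a sketch under stated assumptions: `Tracker`, `use`, and `temporary_lifetime_events` are hypothetical names that do not appear in the question, and `use` merely mirrors the `const std::string&` signature of `append_array` shown above. It records when a temporary passed to such a function is constructed and destroyed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper type (not from the question): it records its own
// construction and destruction in a log owned by the caller.
struct Tracker {
    std::vector<std::string>& log;
    explicit Tracker(std::vector<std::string>& l) : log(l) {
        log.push_back("constructed");
    }
    ~Tracker() { log.push_back("destroyed"); }
};

// Takes a const reference, like append_array() in the question,
// so it can bind to a temporary.
void use(const Tracker& t) { t.log.push_back("used"); }

// Returns the sequence of events for a call that passes a temporary.
std::vector<std::string> temporary_lifetime_events() {
    std::vector<std::string> log;
    use(Tracker(log));  // the temporary lives until the end of this full-expression
    // By the time the next statement runs, the temporary is already gone.
    assert(log.back() == "destroyed");
    log.push_back("next statement");
    return log;
}
```

In the same way, a `std::string` temporary passed to `append_array` is destroyed before the next statement runs, so a pointer saved from its `c_str()` is already dangling when the printing loop reads it.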

Dietmar Kühl
  • That is entirely the point of the question. With the first code, the string object is anonymous, yet each of the pointers in the `const char*` array of the structure point to the same memory location containing the last string. What is wrong with the second code? Each of the string objects is still valid while printing, and thus unique pointers to their associated c_str() buffers. It is the first code that I had questions with. – sherrellbc Nov 05 '14 at 02:11
  • @sherrellbc: because in your function signature of `append_array(appendable_array& t, std::string s)` the `std::string` is accepted by value, which means a copy is made and the `.c_str()` isn't being applied to the same `string` object from the calling code. If you'd accepted the string by reference, then your logic would be correct. As a beginner, you'd be much safer using `std::string` everywhere you're storing/recording values and avoid calling `.c_str()`. (Dietmar +1) – Tony Delroy Nov 05 '14 at 02:16
  • @sherrellbc: What happens in case of undefined behavior is, well, undefined behavior: anything goes! Worst case scenario the code does what you think it should be doing while testing but fails miserably when it is critical to work (e.g., when lives depend on it or a lot of money depends on it). Since `append_array()` takes its argument by value the strings only live while `append_array()` is running. When `append_array()` exits the strings die and the pointers obtained from `s.c_str()` become stale. – Dietmar Kühl Nov 05 '14 at 02:16
  • @DietmarKühl, Yes, of course! I changed the second code to use references and it still works as one would expect. Changing the second code yields the same results. I updated the code above. – sherrellbc Nov 05 '14 at 02:21
  • @sherrellbc: When using references to strings which live while the pointers obtained from `s.c_str()` are still in use, all is OK. When using temporaries, the strings referenced by `s` when `s.c_str()` is called die at the end of the expression where they are created, i.e., the problem of accessing invalid strings and having undefined behavior persists. – Dietmar Kühl Nov 05 '14 at 02:26
  • @DietmarKühl, in the first code listed above, yes. I suppose the fact that "World!" is the content pointed to by all three indices in the array is a product of the undefined behavior? I just found it curious that such a thing happened. – sherrellbc Nov 05 '14 at 02:28
  • @sherrellbc: yes, that's undefined behavior. Anything can happen. If you compile with a different compiler or different compiler flags, run the program again, etc., you can get entirely different behavior. – Dietmar Kühl Nov 05 '14 at 02:34
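
Following the suggestion in the comments to store `std::string` values instead of `c_str()` pointers, one possible fix might look like the sketch below. The names `owning_array`, `append_owned`, and `build_example` are hypothetical and not part of the question's code; the idea is only that the container copies the characters, so it no longer matters whether the caller passes a temporary or a named string.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical variant of the question's appendable_array that owns its
// strings instead of holding borrowed const char* pointers.
struct owning_array {
    std::vector<std::string> items;
};

// Copies s into the array; the copy outlives the argument, even when the
// argument is a temporary.
void append_owned(owning_array& t, const std::string& s) {
    t.items.push_back(s);
}

// Builds the same three entries as the question, using temporaries.
owning_array build_example() {
    owning_array arr;
    append_owned(arr, std::string("Hello"));
    append_owned(arr, std::string("There"));
    append_owned(arr, std::string("World!"));
    assert(arr.items.size() == 3);  // each temporary was copied before it died
    return arr;
}
```

If a `const char*` is needed for a C API, calling `c_str()` on the stored element right at the point of use is safer than caching the pointer, since later changes to the vector or the string can invalidate it.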