6

What actually is done when string::c_str() is invoked?

  1. Does string::c_str() allocate memory, copy the internal data of the string object, and append a null terminator to the newly allocated memory?

or

  2. Since string::c_str() must be O(1), allocating memory and copying the string over is not allowed. In practice, having the null terminator there all the time is the only sane implementation.

Somebody in the comments on an answer to a related question says that C++11 requires std::string to allocate an extra char for a trailing '\0'. So it seems the second option is possible.

And another person says that std::string operations - e.g. iteration, concatenation and element mutation - don't need the zero terminator. Unless you pass the string to a function expecting a zero terminated string, it can be omitted.

And one more voice, from an expert:

Why is it common for implementers to make .data() and .c_str() do the same thing?

Because it is more efficient to do so. The only way to make .data() return something that is not null terminated, would be to have .c_str() or .data() copy their internal buffer, or to just use 2 buffers. Having a single null terminated buffer always means that you can always use just one internal buffer when implementing std::string.

So I am really confused now: what is actually done when string::c_str() is invoked?

Update:

Suppose c_str() is implemented as simply returning the pointer to the buffer that the string has already allocated and manages. Then:

A. Since the result of c_str() must be null-terminated, the internal buffer needs to always be null-terminated, even for an empty std::string. E.g., for std::string demo_str;, there should be a \0 in the internal memory of demo_str. Am I right?

B. What happens when std::string::substr() is invoked? Does it automatically append a \0 to the substring?

Remy Lebeau
John
  • 1
    @John - You're overthinking it. It's simply returning the pointer it's already allocated and managing. Of course, every C++ implementation is free to do what it wants here - but it's really hard to see it working any other way. My advice is to set a breakpoint in the debugger and inspect the internals of a `std::string` . Go deep enough and you'll see that somewhere within std::basic_string there's a `char*` member as well as a `size` member. – selbie Sep 25 '21 at 04:42
  • What is done? Something that works. Anything more specific is necessarily implementation-dependent. – n. m. could be an AI Sep 25 '21 at 05:17
  • Nothing much, std::string has its own internally allocated buffer, and c_str just returns a constant char* to that buffer (so you can read from it but not modify it) – Pepijn Kramer Sep 25 '21 at 05:17
  • @selbie Suppose `c_str()` is implemented as simply returning the pointer it has already allocated and manages. ***1.*** Since `c_str()` must be null-terminated, the internal buffer needs to always be null-terminated, even for an empty `std::string`, e.g: `std::string demo_str;`, there should be a `\0` in the internal memory of `demo_str`. Am I right? ***2.*** What would happen when `std::string::sub_str()` is invoked? Automatically append a `\0` to the sub-string? – John Sep 25 '21 at 05:51
  • @John Yes and yes. This also applies to any other string operation, like concatenation or removing elements. A non-broken implementation will make sure that the internal buffer is always null-terminated. As stated by others, that's the best, if not the only, way to satisfy the requirements of `std::string::c_str()`. – Lukas-T Sep 25 '21 at 06:57
  • @John - there's no such method as `sub_str`. I assume you mean `substr`. In any case, I've left you with a sample implementation of how it could work. See answer below. – selbie Sep 25 '21 at 08:28

4 Answers

11

Since C++11, std::string::c_str() and std::string::data() are both required to return a pointer to the string's internal buffer. And since the array returned by c_str() must be null-terminated (before C++11 this was not required of data(); since C++11 both return the same null-terminated array), that effectively requires the internal buffer to always be null-terminated, though the null terminator is not counted by size()/length(), or returned by std::string iterators, etc.

Prior to C++11, the behavior of c_str() was technically implementation-specific, but most implementations I've ever seen worked this way, as it is the simplest and sanest way to implement it. C++11 just standardized the behavior that was already in wide use.

UPDATE

Since C++11, the buffer is always null-terminated, even for an empty string. However, that does not mean the buffer is required to be dynamically allocated when the string is empty. It could point to an SSO buffer, or even to a single static nul character. There is no guarantee that the pointer returned by c_str()/data() remains pointing at the same memory address as the content of the string changes.
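As a rough illustration of those two points, here is a minimal sketch. The helper names `is_terminated` and `address_changed_after_growth` are made up for this example and are not part of any library. The second helper only reports what it observes: the standard does not promise that the address changes, only that it may.

```cpp
#include <cstdint>
#include <string>

// True if the character one past the last element is '\0'.
// Since C++11, reading c_str()[size()] is well-defined, even for an empty string.
bool is_terminated(const std::string& s)
{
    return s.c_str()[s.size()] == '\0';
}

// Appends more characters than the current capacity can hold and reports
// whether the buffer address differs afterwards. The address is stored as an
// integer so that no invalidated pointer is used after the append.
bool address_changed_after_growth(std::string& s)
{
    const auto before = reinterpret_cast<std::uintptr_t>(s.c_str());
    s.append(s.capacity() + 1000, 'x');   // exceeds capacity(), so storage must grow
    const auto after = reinterpret_cast<std::uintptr_t>(s.c_str());
    return before != after;
}
```

Calling `is_terminated(std::string{})` covers the empty-string case, and calling `address_changed_after_growth` on a short string is a simple way to watch a string move from an SSO buffer (if the implementation has one) to a dynamic buffer.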

std::string::substr() returns a new std::string with its own null-terminated buffer. The string being copied from is unaffected.
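A small sketch of what that means in practice. The function name `substr_owns_terminated_copy` is hypothetical, chosen only for this example; it assumes `pos <= src.size()`, because otherwise `substr()` throws `std::out_of_range`.

```cpp
#include <cstddef>
#include <string>

// Checks three things about src.substr(pos, count):
//  - the result is null-terminated at its own size(),
//  - its buffer is not simply a pointer into src's buffer,
//  - src itself is left unmodified.
bool substr_owns_terminated_copy(const std::string& src, std::size_t pos, std::size_t count)
{
    const std::string before = src;                  // snapshot for the "unaffected" check
    const std::string sub = src.substr(pos, count);  // new string, new buffer

    const bool terminated = sub.c_str()[sub.size()] == '\0';
    const bool separate   = sub.c_str() != src.c_str() + pos;
    const bool unchanged  = src == before;
    return terminated && separate && unchanged;
}
```

For example, `substr_owns_terminated_copy("hello world", 6, 5)` inspects the substring "world", which ends in the middle of nothing special in the source but still carries its own terminator.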

Remy Lebeau
  • Thank you for the clarification. Since c_str() must be null-terminated, two more questions arise, please see the update of the question. – John Sep 25 '21 at 06:06
  • 1
    Although they didn't allocate and copy the buffer, libstdc++ did write the null terminator only when calling c_str, so it didn't work the same way c++11 would require. Back then, there was basically two major standard library implementations, so I'm not sure your "most implementations" is quite precise. – eerorika Sep 25 '21 at 09:37
  • 1
    Minor nit: it wasn't implementation defined; it was implementation specific. In standardese, "implementation defined" means that the implementation must document its behavior. – Pete Becker Sep 25 '21 at 13:45
  • @PeteBecker noted, thanks. – Remy Lebeau Sep 25 '21 at 17:06
  • @RemyLebeau "There is no guarantee that the pointer returned by c_str()/data() remains pointing at the same memory address ***as the content of the string changes.***". Why `c_str()/data()` could point to different memory address when the content of the string changes? Could you please explain that in more detail for me? – John Sep 26 '21 at 01:35
  • @John for instance, when adding/removing characters, the string may switch between an SSO (short string optimization) buffer and a dynamic buffer, or have to grow a dynamic buffer to a new capacity, etc. So that changes the memory address that the `c_str()`/`data()` pointer is pointing at. Prior to C++11, implementations were also allowed to implement copy-on-write semantics (that is no longer allowed in C++11 onward), which could have also allocated a new buffer, too. – Remy Lebeau Sep 26 '21 at 01:43
  • @RemyLebeau If I understand you correctly, for C++11 onward, the pointer returned by `c_str()` should always be equivalent to `data()` (if you call them at the same time or the `string` has not been modified when you called them one by one). But the pointer returned by `c_str()\data()` could be different from the last time when they were called since the `string` may be modified(i.e. switch between an SSO (short string optimization) buffer and a dynamic buffer, or have to grow a dynamic buffer to a new capacity). Am I right? – John Sep 26 '21 at 02:01
  • @John "*for C++11 onward, the pointer returned by `c_str()` should always be equivalent to `data()`*" - not equivalent, but identical. They are required to return the **same pointer** if the string has not been modified in between the calls. "*But the pointer returned by c_str()\data() could be different from the last time when they were called since the string may be modified*" - if the string is modified, yes. – Remy Lebeau Sep 26 '21 at 02:21
1

Here is an empirical "proof" that the complexity of .c_str() is O(1):

#include <stdio.h>
#include <string>
using namespace std;
int main(int argc, char **argv)
{
    std::string x(5000000, 'b'); // <--- single time allocation
    // std::string x(5, 'b'); // <--- compare to a much shorter string
    for (unsigned int i=0;i<1000000;i++)
    {
        const char *y = x.c_str(); // <--- copy entire content ?
    }
}
  • compiled with -O0 to avoid optimizing out anything
  • timing 2 versions: I get identical performance
  • this is an empirical "proof" that (at least my machine's implementation)
    • extracts the internal representation of a null terminated string
    • doesn't copy content every time .c_str() is called.
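If you would rather measure inside the program than time the whole executable, a sketch along these lines could be used. `time_c_str_calls` is a made-up helper, and the `volatile` sink is just one way to keep the loop from being removed when optimizations are enabled. The numbers it returns are machine-dependent, so it can only suggest, not prove, that the cost does not grow with the string length.

```cpp
#include <chrono>
#include <string>

// Calls s.c_str() `iterations` times and returns the elapsed wall-clock time.
std::chrono::nanoseconds time_c_str_calls(const std::string& s, unsigned iterations)
{
    const char* volatile sink = nullptr;   // volatile store: each call's result is "used"
    const auto start = std::chrono::steady_clock::now();
    for (unsigned i = 0; i < iterations; ++i)
    {
        sink = s.c_str();
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

Comparing `time_c_str_calls(std::string(5000000, 'b'), 1000000)` with `time_c_str_calls(std::string(5, 'b'), 1000000)` mirrors the two versions timed above.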
OrenIshShalom
  • "extracts the internal representation of a null terminated string"? What is that? You mean there is a `\0` when the `string` is constructed. Am I right? – John Sep 25 '21 at 06:10
  • @John yes, the experiment supports the claim that a null terminated (ends with `\0`) string exists internally, and `.c_str()` does not copy any memory but just points to it. – OrenIshShalom Sep 25 '21 at 06:41
  • The experiment supports the claim that a null terminated (ends with \0) string exists internally? How? Could you explain that in more detail? – John Sep 25 '21 at 12:47
1

There are a lot of great answers and comments already provided. But to demonstrate that std::string is typically backed by a null-terminated buffer, I've provided a simple, naive implementation. It's not complete, doesn't do error checking, and is certainly not optimized. But it's complete enough to show you how a string class is typically implemented with a null-terminated buffer as a member variable.

#include <string.h> // strlen, memcpy, size_t

class string
{
public:

    string()
    {
        assign("", 0);
    }

    string(const char* s)
    {
        assign(s, strlen(s));
    }

    string(const char* s, size_t len)
    {
        assign(s, len);
    }

    string(const string& s)
    {
        assign(s._ptr, s._len);
    }

    ~string()
    {
       delete [] _ptr;
    }

    string& operator=(const string& s)
    {
        const char* oldptr = _ptr;
        assign(s._ptr, s._len); // copies before freeing, so self-assignment is safe
        delete [] oldptr;
        return *this;
    }

    const char* data() const
    {
        return _ptr;
    }

    const char* c_str() const
    {
       return _ptr;
    }

    size_t length() const
    {
        return _len;
    }

    // substr always returns a new string with its own null-terminated buffer
    string substr(size_t pos, size_t count) const
    {
        return string(_ptr + pos, count);
    }

private:
    char* _ptr;
    size_t _len;

    void assign(const char* ptr, size_t len)
    {
        _len = len;        
        _ptr = new char[_len+1]; // +1 for null termination
        memcpy(_ptr, ptr, len); 
        _ptr[_len] = '\0';       // always null terminate
    }
};
selbie
0

What would happen when std::string::substr() is invoked? Automatically append a \0 to the sub-string?

Yes, because a std::string owns its (modifiable) contents, so substr() has to return a new string with its own null-terminated copy. If you want something like std::string::substr() that does not allocate and copy the respective content, C++17 has std::string_view for this purpose:

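std::string returns_str();

std::string s = returns_str(); // keep the string alive: a view does not own its data
auto v = std::string_view{s}.substr(3, 10); // no allocation, no copy; v is only valid while s is unchanged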
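One caveat worth spelling out: a std::string_view has no c_str(), and the characters it covers are not necessarily followed by a \0. The sketch below (the helper name `view_of` is invented for this example) shows that the view points straight into the original buffer, so whatever follows the view's last character is simply the next character of the source string.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns a view of at most `count` characters of `s`, starting at `pos`.
// Nothing is allocated or copied; `s` must outlive the returned view,
// so do not call this with a temporary string.
std::string_view view_of(const std::string& s, std::size_t pos, std::size_t count)
{
    return std::string_view{s}.substr(pos, count);
}
```

With `std::string s = "hello world";`, the view `view_of(s, 0, 5)` compares equal to "hello", its data() is the same address as s.data(), and the character right after it is the space from s rather than a terminator. Passing such a view's data() to a function that expects a C string is therefore a bug unless the view happens to run to the end of the string.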
Quirin F. Schroll