53

In a 2008 post on his site, Herb Sutter states the following:

There is an active proposal to tighten this up further in C++0x and require null-termination and possibly ban copy-on-write implementations, for concurrency-related reasons. Here’s the paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2534.html . I think that one or both of the proposals in this paper is likely to be adopted, but we’ll see at the next meeting or two.

I know that C++11 now guarantees that the std::string contents get stored contiguously, but did they adopt the above in the final draft?

Will it now be safe to use something like &str[0]?

Xeo
links77

3 Answers

58

Yes. Per the C++0x FDIS 21.4.7.1/1, std::basic_string::c_str() must return

a pointer p such that p + i == &operator[](i) for each i in [0,size()].

This means that given a string s, the pointer returned by s.c_str() must be the same as the address of the initial character in the string (&s[0]).

James McNellis
  • 6
    Note that the same requirement holds true for `data`, which I believe wasn't true for C++98/03. – Xeo May 20 '11 at 20:32
  • 10
    Yes, it's illuminating that `basic_string<>::c_str()` and `basic_string<>::data()` now have exactly identical semantics. – ildjarn May 20 '11 at 20:35
  • 19
    This doesn't appear to answer the question with which the post is titled - ie "Will `std::string` always be null-terminated in C++11?", in which case the answer is no. `operator[str.length()]` will return `'\0'`, but that doesn't mean that the `string` actually contains it in memory. – Andrew Marshall Jun 24 '13 at 16:21
  • I read this as well in the final C++11 spec (21.4.7.1/1), but I don't see how any requirements are placed on the element at `operator[str.length()]`, other than that it must be valid and referenceable. – John Dibling Jul 18 '13 at 13:16
  • 25
    @AndrewMarshall: `operator[]` is required to return a reference to the actual stored element, so (21.4.7.1/1) also applies the requirement that the element at `operator[str.length()]` must be part of the storage. – John Dibling Jul 18 '13 at 13:18
  • Shouldn't it be `[0,size())`? – S.S. Anne Jan 09 '20 at 22:20
  • 1
    @S.S.Anne No, in *this* case, the terminator is part of the sequence. Not that it is always part of the sequence; see e.g. `.at()`. – Deduplicator Apr 22 '21 at 23:48
-1

&str[0] is safe to use -- so long as you do not assume it points to a null-terminated string.

Since C++11 the requirements include (section [string.accessors]):

  • str.data() and str.c_str() point to a null-terminated string.
  • &str[i] == str.data() + i , for 0 <= i <= str.size()
    • note that this implies the storage is contiguous.

However, there is no requirement that &str[0] + str.size() points to a null terminator.

A conforming implementation must place the null terminator contiguously in storage when data(), c_str() or operator[](str.size()) are called; but there is no requirement to place it in any other situation, such as calls to operator[] with other arguments.


To save you reading the long chat discussion below: the objection was raised that if c_str() were to write a null terminator, it would cause a data race under res.on.data.races#3. I disagreed that it would be a data race.

M.M
  • 1
    The `constexpr const CharT* data() const noexcept;` overload can't modify anything, so it has to be there from the start – Caleth Jun 11 '21 at 13:21
  • @Caleth The text you quote was added in C++20 – M.M Jun 11 '21 at 13:24
  • @M.M it's been a `const` member function and had an O(1) requirement since at least C++11 if not longer. De-facto it had to be zero terminated internally. Edit: [yes it was `const` prior](https://en.cppreference.com/w/cpp/string/basic_string/c_str) – Mgetz Jun 11 '21 at 13:24
  • @Mgetz placing a null terminator is O(1) since the length is known. A `const` member function is allowed to modify mutable internal storage of an object; and any dynamically allocated storage that the object holds an internal pointer to – M.M Jun 11 '21 at 13:27
  • It wasn't `constexpr` prior to C++20, but the requirement still stands – Caleth Jun 11 '21 at 13:28
  • @M.M modifying the internal buffer would linguistically invalidate iterators, something that method and in fact all `const` accessors are explicitly prohibited by the standard from doing. – Mgetz Jun 11 '21 at 13:30
  • @Mgetz No, it wouldn't. Iterator invalidation applies to calls the user makes to member functions of the string, not to any internal operation the implementation makes. The implementation only has to provide the guarantees that the standard places on the observable behaviour – M.M Jun 11 '21 at 13:32
  • 1
    *If* it were allowed to modify the buffer, it would have to do it in a way where there was no possibility of a data race, which I don't think is possible without a per-string mutex or similar – Caleth Jun 11 '21 at 13:32
  • @M.M [21.3.3.8.1.3](https://eel.is/c++draft/string.accessors#3) "Remarks: The program shall not modify any of the values stored in the character array; otherwise, the behavior is undefined." that includes the accessor method. Library calls are part of the program ;) – Mgetz Jun 11 '21 at 13:34
  • @Caleth [look, no errors](https://gcc.godbolt.org/z/j6cG86q6f) . (Reiterate my point that a `const` member function may modify dynamically allocated storage) – M.M Jun 11 '21 at 13:38
  • @Mgetz The implementation is not part of the program – M.M Jun 11 '21 at 13:41
  • "Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container member function or passing a container as an argument to a library function shall not invalidate iterators to, or change the values of, objects within that container." [container.requirements.general#11](https://timsong-cpp.github.io/cppwp/n3337/container.requirements#general-11) – Caleth Jun 11 '21 at 13:45
  • @Caleth I'm not aware of the standard providing any guarantee about thread-safety of any std::string operation (or any other standard container unless explicitly mentioned), in general any standard library object might contain or point to storage that can be modified by a `const` member function . If you can point to something in the standard that talks about thread safety of `std::string` then go ahead – M.M Jun 11 '21 at 13:46
  • "A C++ standard library function shall not directly or indirectly modify objects ([intro.multithread]) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's non-const arguments, including this." [res.on.data.races#3](https://timsong-cpp.github.io/cppwp/n3337/res.on.data.races#3) – Caleth Jun 11 '21 at 13:48
  • @Caleth But it does explicitly specify that `data()` shall provide a null terminator. Also in your quote "object in the container" refers to the object being contained; not to any internal detail of the container – M.M Jun 11 '21 at 13:49
  • The `char[]` that `std::string` owns includes the nul terminator, that is the `char` that is 0 is an object within that container – Caleth Jun 11 '21 at 13:49
  • @Caleth I don't agree, if a container's `size()` is 4 then it contains 4 elements . There can be various amounts of bookkeeping and other storage used by the container, but this clause is referring to the conceptual contents of the container, not the bookkeeping – M.M Jun 11 '21 at 13:52
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/233658/discussion-between-caleth-and-m-m). – Caleth Jun 11 '21 at 13:52
-3

Although c_str() returns a null terminated version of the std::string, surprises may await when mixing C++ std::string with C char* strings.

Null characters may end up within a C++ std::string, which can lead to subtle bugs as C functions will see a shorter string.

Buggy code may overwrite the null terminator. This results in undefined behaviour. C functions would then read beyond the string buffer, potentially causing a crash.

#include <string>
#include <iostream>
#include <cstdio>
#include <cstring>

int main()
{
    std::string embedded_null = "hello\n";
    embedded_null += '\0';
    embedded_null += "world\n";

    // C string functions finish early at embedded \0
    std::cout << "C++ size: " << embedded_null.size() 
              << " value: " << embedded_null;
    printf("C strlen: %zu value: %s\n",
           strlen(embedded_null.c_str()), 
           embedded_null.c_str());

    std::string missing_terminator(3, 'n');
    missing_terminator[3] = 'a'; // BUG: Undefined behaviour

    // C string functions read beyond buffer and may crash
    std::cout << "C++ size: " << missing_terminator.size() 
              << " value: " << missing_terminator << '\n';
    printf("C strlen: %zu value: %s\n",
           strlen(missing_terminator.c_str()), 
           missing_terminator.c_str());
}

Output:

$ c++ example.cpp
$ ./a.out
C++ size: 13 value: hello
world
C strlen: 6 value: hello

C++ size: 3 value: nnn
C strlen: 6 value: nnna�
PFee
  • 5
    "*`missing_terminator[3] = 'a';`*" That's explicitly UB. You can read from the NUL terminator, but [you *cannot* write to it](https://timsong-cpp.github.io/cppwp/n4659/string.access#2). Well, you can't write any value other than NUL to it. – Nicol Bolas Apr 22 '21 at 22:21
  • 2
    I wouldn't say "c_str() *generally* returns", since C++11 [it](https://en.cppreference.com/w/cpp/string/basic_string/c_str) "*returns* a pointer to a null-terminated character array with data equivalent to those stored in the string.". – Bob__ Apr 22 '21 at 22:36
  • Replacing the null-terminator with another character is UB. However is an embedded null _allowed_? Both lead to problems, neither is caught by GCC or Clang. – PFee Apr 22 '21 at 23:28
  • 2
    Yes they are [allowed](https://stackoverflow.com/questions/2845769/can-a-stdstring-contain-embedded-nulls). – Bob__ Apr 23 '21 at 00:09