24

I know that in C++98, neither std::basic_string<> nor std::vector<> were required to use contiguous storage. This was seen as an oversight for std::vector<> as soon as it was pointed out, and, if I remember correctly, got fixed with C++03.

I seem to remember having read about discussions requiring std::basic_string<> to use contiguous storage back when C++11 was still called C++0x, but I haven't followed the discussion closely back then, and am still restricted to C++03 at work, so I am not sure what became of it.

So is std::basic_string<> required to use contiguous storage? (If so, then which version of the standard required it first?)

In case you wonder: This is important if you have code passing the result of &str[0] to a function expecting a contiguous piece of memory to write to. (I know about str.data(), but for obvious reasons old code doesn't use it.)

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
sbi
  • 219,715
  • 46
  • 258
  • 445
  • Well, since `str.data()` needs to be constant time and is required to return a continuous memory block, I would conclude that yes. – Šimon Tóth Oct 14 '15 at 11:52
  • 1
    See similar questions http://stackoverflow.com/questions/1986966/does-s0-point-to-contiguous-characters-in-a-stdstring and http://stackoverflow.com/questions/2256160/is-it-reasonable-to-use-stdbasic-stringt-as-a-contiguous-buffer-when-targeti, which answers the question but in a way that is not trivial to read. – slaphappy Oct 14 '15 at 11:54
  • What are those obvious reasons for not using `data()`? That function was present in C++03 as well. – Angew is no longer proud of SO Oct 14 '15 at 11:57
  • 1
    @Mr.kbok: I had seen them both, read the answer, and thought I'd rather have an up-to-date definitive answer. – sbi Oct 14 '15 at 12:00
  • I am quite sure that in C++03 the answer is no, although virtually all the implementations used contiguous memory. I'll try to find a reference for this. – sbabbi Oct 14 '15 at 12:00
  • 1
    @Angew: In C++03, `str.data()` returned a `const char*`. – sbi Oct 14 '15 at 12:01
  • @sbi Oh, right. I was confused by the fact that C++03 defined `[]` in terms of `data()`, even for the non-const overload! – Angew is no longer proud of SO Oct 14 '15 at 12:13
  • 1
    FWIW, I've never seen or heard of a discontiguous `std::basic_string` implementation; anyone interested in that would likely have added it as a distinct [STL `rope`](http://www.sgi.com/tech/stl/Rope.html) lookalike. So, in practice if I were you I'd just peek at your work C++03's Standard Libraries (if paranoid) then code on the assumption it'll always be contiguous. – Tony Delroy Oct 14 '15 at 12:16
  • @sbi That doesn't really change the fact that even in C++03 it needs to be constant time and continuous. – Šimon Tóth Oct 14 '15 at 12:20
  • @sbi "In C++03, `str.data()` returned a `const char*`" Has that changed in C++11? Feel like I'm missing something... – atkins Oct 14 '15 at 13:16
  • @Let_Me_Be C++03 allowed for implementation to use two separate storages, one for `operator[]`and one for `c_str` (no idea why would any implementor do that). This still meets the constant time requirement of `c_str`. If you take a look at the page on [cppreference](http://en.cppreference.com/w/cpp/string/basic_string/c_str) you can see that `data() + i` was not required to be equal to `&operator[](i)` in 03. – sbabbi Oct 14 '15 at 13:20

3 Answers3

29

The C++11 standard, basic_string 21.4.1.5,

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

ivan_pozdeev
  • 33,874
  • 19
  • 107
  • 152
Rahul Tripathi
  • 168,305
  • 31
  • 280
  • 331
15

In c++03 there was no guarantee that that the elements of the string are stored continiously. [basic.string] was

  1. For a char-like type charT, the class template basic_string describes objects that can store a sequence consisting of a varying number of arbitrary char-like objects (clause 21). The first element of the sequence is at position zero. Such a sequence is also called a “string” if the given char-like type is clear from context. In the rest of this clause, charT denotes such a given char-like type. Storage for the string is allocated and freed as necessary by the member functions of class basic_string, via the Allocator class passed as template parameter. Allocator::value_type shall be the same as charT.
  2. The class template basic_string conforms to the requirements of a Sequence, as specified in (23.1.1). Additionally, because the iterators supported by basic_string are random access iterators (24.1.5), basic_string conforms to the the requirements of a Reversible Container, as specified in (23.1). 389 ISO/IEC 14882:2003(E)  ISO/IEC 21.3 Class template basic_string 21 Strings library
  3. In all cases, size() <= capacity().

And then in C++17 they changed it too

  1. The class template basic_string describes objects that can store a sequence consisting of a varying number of arbitrary char-like objects with the first element of the sequence at position zero. Such a sequence is also called a “string” if the type of the char-like objects that it holds is clear from context. In the rest of this Clause, the type of the char-like objects held in a basic_string object is designated by charT.
  2. The member functions of basic_string use an object of the Allocator class passed as a template parameter to allocate and free storage for the contained char-like objects.233
  3. A basic_string is a contiguous container (23.2.1).
  4. In all cases, size() <= capacity().

emphasis mine

So pre C++17 it was not guaranteed but now it is.

With the constraints that std::string::data imposes this non guarantee is almost moot as calling std::string::data gives you a continuous array of the characters in the string. So unless the implementation is doing this on demand and in constant time the string will be continuous.


In case you wonder: This is important if you have code passing the result of &str[0] to a function expecting a contiguous piece of memory to write to. (I know about str.data(), but for obvious reasons old code doesn't use it.)

The behavior of operator[] has changed as well. In C++03 we had

Returns: If pos < size(), returns data()[pos]. Otherwise, if pos == size(), the const version returns charT(). Otherwise, the behavior is undefined.

So only the const version was guaranteed to have defined behavior if you tried &s[0] when s is empty. In C++11 they changed it to:

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.

So now both the const and non const versions have defined behavior if you tried &s[0] when s is empty.

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
  • 3
    There is no way to have `string::data()` to be constant time function that returns a continuous sequence without the actual storage being continuous, even in C++03. – Šimon Tóth Oct 14 '15 at 12:29
  • @Let_Me_Be Yes but if the standard doesn't impose the requirement then you can't be certain this will happen 100% of the time. I am envisioning a deque like structure. – NathanOliver Oct 14 '15 at 12:39
  • The standard imposes the requirement of `string::data()` containing the same content as the content of the string and it imposes the constant time requirement on `string::data()`. There is no way to have a deque like structure and maintain the constant time restriction. – Šimon Tóth Oct 14 '15 at 12:47
  • @Let_Me_Be I added the caveat about `string::data()` into the answer. – NathanOliver Oct 14 '15 at 13:26
  • Thanks for expanding on Rahul's answer. As an occasional visitor to C++, I have to work around popular generalisations: one of them is that pointer-based languages offer no guarantees about the physical layout of any nontrivial data structure: all that you know is that the data in the structure will transparently accessible from its pointer. *Transparently*, not 'sequentially by performing arithmetic on addresses'. Hence the earlier specification of 'conforms to... a Reversible Container' rather than 'contiguous' - And you can be sure that assuming a contiguous block has bitten someone once. – Nigel Heffernan Oct 14 '15 at 13:58
  • @Nathan: In the last decade I kept running into code where people passed `&s[0]` (or `&*str.begin()`) into functions requiring a pointer to the first element of a writable C char array. Usually I have left such code alone (assuming, of course, that they made sure the string isn't empty), since there never was any known implementation where string didn't have non-contiguous memory, and, as I said, I seemed to recall the intent to make this a requirement anyway. Nevertheless, I now wanted to know whether this change has finally been done. Thanks! – sbi Oct 14 '15 at 14:31
  • "Contiguous container" is not C++11. That's C++17, added by [N4284](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4284). (The other C++11 requirements do require contiguous underlying storage.) – T.C. Dec 04 '15 at 23:15
2

According to the draft standard N4527 21.4/3 Class template basic_string [basic.string] :

A basic_string is a contiguous container (23.2.1).

101010
  • 41,839
  • 11
  • 94
  • 168