40

I'm doing some maintenance work and ran across something like the following:

std::string s;
s.resize( strLength );  
// strLength is a size_t with the length of a C string in it. 

memcpy( &s[0], str, strLength );

I know using &s[0] would be safe if it was a std::vector, but is this a safe use of std::string?

John Dibling
oz10
  • 3
    The use of &s[0] is OK, memcpy() arguably less so. Why not simply do an assignment, or use the string's assign() member function? –  Dec 31 '09 at 20:26
  • 1
    @Neil Butterworth, that is what I'm asking myself while looking at this code... ;) – oz10 Dec 31 '09 at 20:29
  • As you gain experience programming in C++, you will refrain more and more from using `memset` and `memcpy`, and learn the reasoning. This is one to add to your experience. – Thomas Matthews Dec 31 '09 at 21:30

6 Answers

49

A std::string's storage is not guaranteed to be contiguous under the C++98/03 standard, but C++11 requires it to be. In practice, neither I nor Herb Sutter knows of an implementation that does not use contiguous storage.

Notice that the &s[0] thing is always guaranteed to work by the C++11 standard, even in the 0-length string case. It would not be guaranteed if you did str.begin() or &*str.begin(), but for &s[0] the standard defines operator[] as:

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified

Continuing on, data() is defined as:

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

(notice the square brackets at both ends of the range)
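A minimal sketch of the two guarantees quoted above, assuming a C++11-conforming standard library. The helper names `copy_via_pointer` and `data_matches_subscript` are made up for illustration and are not part of the question's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies `len` bytes from `src` into a std::string by
// writing through &s[0], as in the question. Assumes C++11 or later.
std::string copy_via_pointer(const char* src, std::size_t len) {
    assert(src != nullptr || len == 0);
    std::string s;
    s.resize(len);                      // fills the string with len '\0' characters
    if (len != 0) {
        std::memcpy(&s[0], src, len);   // overwrite them through the raw pointer
    }
    return s;
}

// Hypothetical helper: checks that data() + i == &s[i] for every i in
// [0, size()], including i == size() (the closing square bracket).
bool data_matches_subscript(const std::string& s) {
    for (std::size_t i = 0; i <= s.size(); ++i) {
        if (s.data() + i != &s[i]) {
            return false;
        }
    }
    return true;
}
```

Calling `data_matches_subscript(std::string())` exercises the zero-length case, where only `i == 0` is checked.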


Notice: pre-standardization C++0x did not guarantee &s[0] to work with zero-length strings (actually, it was explicitly undefined behavior), and an older revision of this answer explained this; this has been fixed in later standard drafts, so the answer has been updated accordingly.

Matteo Italia
Todd Gardner
  • I have not been following the standard for the last few months, but my impression was that this was still in the 0x draft, and therefore not actually required yet (nor will it be if a library chooses to implement only '03). – Todd Gardner Dec 31 '09 at 20:37
  • 3
    Sutter says in a comment to that post, "current ISO C++ does require &str[0] to cough up a pointer to contiguous string data (but not necessarily null-terminated!)," which would in fact make the OP's usage correct. However, I can't find anything that says that in the standard (at least it's not in 21.3.4 lib.string.access). – James McNellis Dec 31 '09 at 20:37
  • I think that might be right; the std defect 530 says operator[] is contiguous but the iterator interface is not guaranteed to be, and quotes 23.4.4. I am digging out my standard to check. – Todd Gardner Dec 31 '09 at 20:47
  • I skipped right over the defect link in Sutter's post, that's why I missed it. In any case, the defect says "we almost require contiguity already," (key word: almost) and I don't see how its reference to multiset is relevant (basic_string is a sequence with random access iterators). However, I think the important thing to take away is that "given the existence of data(), and the definition of operator[] and at in terms of data, I don't believe it's possible to write a useful and standard-conforming basic_string that isn't contiguous." – James McNellis Dec 31 '09 at 21:11
  • Todd, did you mean to say that you (and Sutter) *do not* know of an implementation? If you *do* know of an implementation, could you name it for us? – Rob Kennedy Dec 31 '09 at 21:13
  • @Rob - fixed, thanks. @James - Yeah, I think I understand where they are going with it (though I do not understand the 23.4.4 ref). (strike a part about interesting edge cases, I was wrong about that) – Todd Gardner Dec 31 '09 at 21:30
  • 5
    James: the almost is because the null for `s[s.length()]` does not have to be contiguous. `&s[n] + 1 == &s[n + 1]` must be true for all n where `0 <= n < s.length() - 1`. The requirement is buried in 21.3.4/1 that `s[n]` must return the same object as `s.data()[n]` (for n < length()), and data() must be contiguous. –  Dec 31 '09 at 21:39
  • The information about zero-length strings is incorrect; the last C++11 draft actually says "Returns: `*(begin() + pos)` if `pos < size()`, otherwise a reference to an object of type `T` with value `charT()`; the referenced value shall not be modified.". So, it's always safe to do `&str[0]`. – Matteo Italia Apr 29 '16 at 11:35
  • Since OP doesn't seem to be active on SO anymore, I updated the answer myself; it would be bad to have a top-voted, accepted answer to a common question spreading obsolete information. – Matteo Italia Apr 30 '16 at 12:41
9

It is safe to use. I think most answers were correct once, but the standard changed. The C++11 standard, in the basic_string general requirements [string.require], 21.4.1/5, says:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

A bit before that, it says that all iterators are random access iterators. Both bits support the usage of your question. (Additionally, Stroustrup apparently uses it in his newest book ;) )
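The identity from [string.require] can be checked directly. This is only a sketch, assuming a C++11 standard library; `is_contiguous` and `demo_contiguity` are hypothetical names, and a passing check on one library demonstrates the behaviour of that library rather than proving the guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: evaluates &*(s.begin() + n) == &*s.begin() + n
// for all n with 0 <= n < s.size(), as in the quoted paragraph.
bool is_contiguous(const std::string& s) {
    typedef std::string::difference_type diff_t;
    for (std::size_t n = 0; n < s.size(); ++n) {
        const diff_t offset = static_cast<diff_t>(n);
        if (&*(s.begin() + offset) != &*s.begin() + offset) {
            return false;
        }
    }
    return true;   // an empty string never dereferences begin()
}

// Example use on an empty and a non-empty string.
void demo_contiguity() {
    assert(is_contiguous(std::string()));
    assert(is_contiguous(std::string("hello, world")));
}
```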

This change was indeed made in C++11. As far as I recall, vector already had the same contiguity guarantee since C++03, and it gained the very useful data() member function with C++11.

Hope that helps.

sebkraemer
  • 2
    The question was pre-c++11 (it is tagged as such). You are correct, c++11 made it officially safe to do this. – oz10 Jan 21 '15 at 20:08
7

Technically, no, since std::string is not required to store its contents contiguously in memory.

However, in almost all implementations (every implementation of which I am aware), the contents are stored contiguously and this would "work."

James McNellis
  • Can you identify some implementations where it wouldn't work? – Rob Kennedy Dec 31 '09 at 20:25
  • 2
    Nope. But you could make such an implementation if you wanted. – James McNellis Dec 31 '09 at 20:25
  • @Neil: Do you have a link/reference to that TC? – James McNellis Dec 31 '09 at 20:35
  • Aargh - sorry, brain going - I'm thinking of vector, not string. Apologies all round. –  Dec 31 '09 at 20:39
  • No problem. I'm still curious as to what Sutter is talking about regarding `&str[0]`, though (cf. my comment to Todd's answer). – James McNellis Dec 31 '09 at 20:42
  • @JamesMcNellis: I don't understand how `string` is not required to store its contents contiguously in memory (i.e. before C++11, where it was required). Wouldn't `data` and `c_str` be *impossible* to implement in constant time if strings were discontiguous? – user541686 Jan 06 '13 at 03:42
  • @Mehrdad: There is no requirement in C++03 that `data` or `c_str` have constant time complexity. Further, C++03 §21.3/5 states that both `data` and `c_str` may invalidate iterators, references, and pointers to elements in the sequence. – James McNellis Jan 06 '13 at 07:31
3

Readers should note that this question was asked in 2009, when the C++03 Standard was the current publication. This answer is based on that version of the Standard, in which std::strings are not guaranteed to utilize contiguous storage. Since this question was not asked in the context of a particular platform (like gcc), I make no assumptions about OP's platform -- in particular, whether or not it utilized contiguous storage for the string.

Legal? Maybe, maybe not. Safe? Probably, but maybe not. Good code? Well, let's not go there...

Why not just do:

std::string s = str;

...or:

std::string s(str);

...or:

std::string s;
std::copy( &str[0], &str[strLen], std::back_inserter(s));

...or:

std::string s;
s.assign( str, strLen );

?
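For comparison, the length-based alternatives can be wrapped in small functions. This is a sketch only: the `via_*` function names are invented here, and `str` is assumed to point at at least `strLen` readable characters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// std::copy with a back_inserter appends one character at a time.
std::string via_copy(const char* str, std::size_t strLen) {
    assert(str != nullptr);
    std::string s;
    s.reserve(strLen);   // optional: avoids repeated reallocation
    std::copy(str, str + strLen, std::back_inserter(s));
    return s;
}

// assign() copies exactly strLen characters in one call.
std::string via_assign(const char* str, std::size_t strLen) {
    assert(str != nullptr);
    std::string s;
    s.assign(str, strLen);
    return s;
}

// The (pointer, length) constructor does the same at construction time.
std::string via_constructor(const char* str, std::size_t strLen) {
    assert(str != nullptr);
    return std::string(str, strLen);
}
```

All three take an explicit length, so they do not depend on `str` being null-terminated, unlike `std::string s = str;` and `std::string s(str);`.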

John Dibling
  • 1
    `std::string s (str, strLen);` (Shortest form identical, in case of embedded nulls or lacking null termination, to the original behavior from the question.) –  Dec 31 '09 at 21:44
  • @Downvoter: Note that this question was asked in 2009, and pertains to the C++03 standard. If you're downvoting either because you challenge the technical accuracy of my answer or for some other reason, I'd appreciate feedback. – John Dibling Feb 18 '14 at 19:09
2

This is generally not safe, regardless of whether the internal string sequence is stored in memory contiguously or not. There might be many other implementation details related to how the controlled sequence is stored by a std::string object, besides contiguity.

A real practical problem with that might be the following. The controlled sequence of std::string is not required to be stored as a zero-terminated string. However, in practice, many (most?) implementations choose to oversize the internal buffer by 1 and store the sequence as a zero-terminated string anyway because it simplifies the implementation of c_str() method: just return a pointer to the internal buffer and you are done.

The code you quoted in your question does not make any effort to zero-terminate the data that is copied into the internal buffer. Quite possibly it simply doesn't know whether zero-termination is necessary for this implementation of std::string. Quite possibly it relies on the internal buffer being filled with zeros after the call to resize, so the extra character allocated for the zero-terminator by the implementation is conveniently pre-set to zero. All this is an implementation detail, meaning that this technique depends on some rather fragile assumptions.

In other words, in some implementations you'd probably have to use strcpy, not memcpy, to force the data into the controlled sequence like that, while in other implementations you'd have to use memcpy and not strcpy.
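The termination concern can be probed with a small check. This sketch assumes a C++11 or later library, where `c_str()` must return a null-terminated buffer; under C++03 the outcome described above remains implementation-dependent. The function name `terminated_after_raw_copy` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: repeats the resize + memcpy pattern from the question,
// then reports whether c_str() is terminated right after the copied bytes.
bool terminated_after_raw_copy(const char* src, std::size_t len) {
    assert(src != nullptr || len == 0);
    std::string s;
    s.resize(len);
    if (len != 0) {
        std::memcpy(&s[0], src, len);   // writes only the first len characters
    }
    return s.c_str()[s.size()] == '\0';
}
```

Note that the memcpy never touches the position at index `size()`, so whatever terminator the library keeps there is left alone.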

Yoon5oo
AnT stands with Russia
  • 1
    After the call to `resize` you can be quite sure that the internal string is or isn't null-terminated as the implementation requires. After a call to `resize` after all you must have a valid string of n characters (padded with zero characters as needed). - However, it shows a lack of understanding for the `std::string` class: memcpy is used either out of ignorance or as a misguided attempt for performance (because of the `resize` call the code ends up assigning values to the buffer twice). – UncleBens Dec 31 '09 at 23:23
  • @UncleBens: I don't understand your first sentence. In any case, yes, the language standard guarantees that the size-increasing `resize` call pads the string with zeros. However, the standard guarantees the padding only up to the requested size (`strLength` in this case), but there's no guarantee in the standard for that extra character, if the implementation allocates one. – AnT stands with Russia Dec 31 '09 at 23:28
  • From C++11 onward, when the string is not empty, the internal buffer is *required* to be null-terminated, because both `data()` and `c_str()` are *required* to return the same buffer, and `c_str()` is *required* to always return a pointer to a null-terminated buffer (`data()` is allowed to return `nullptr` when empty). Prior to C++11, the internal buffer was not *required* to be null-terminated (or even contiguous), but most implementations were because it simplified the implementation of `c_str()` – Remy Lebeau Aug 30 '19 at 22:56
0

The code might work, but more by luck than judgement: it makes assumptions about the implementation that are not guaranteed. I suggest that determining the validity of the code is irrelevant, since it is a pointless overcomplication that is easily reduced to just:

std::string s( str ) ;

or if assigning to an existing std::string object, just:

s = str ;

and then let std::string itself determine how to achieve the result. If you are going to resort to this sort of nonsense, then you may as well not be using std::string and stick to since you are reintroducing all the dangers associated with C strings.
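One caveat is worth a sketch: plain assignment from a `const char*` stops at the first null character, whereas `assign` with an explicit length does not. The two wrapper functions below are hypothetical names used only to contrast the calls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replaces the contents of s with a null-terminated C string.
// Copying stops at the first '\0' found in str.
void replace_from_c_string(std::string& s, const char* str) {
    assert(str != nullptr);
    s = str;
}

// Replaces the contents of s with exactly ptrLength characters,
// whether or not the buffer is null-terminated or contains '\0'.
void replace_from_buffer(std::string& s, const char* ptr, std::size_t ptrLength) {
    assert(ptr != nullptr);
    s.assign(ptr, ptrLength);
}
```

Which one is appropriate depends on whether the source is known to be a null-terminated string.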

Clifford
  • I actually can't be sure the string being assigned is null terminated. So the best I could do will probably be s.assign( ptr, ptrLength); which is still an improvement I think. – oz10 Dec 31 '09 at 21:02
  • Use the constructor form: `std::string s (str, strLen);` – GManNickG Dec 31 '09 at 22:04