26

I recently found the need to replace a std::string's contents with a substring of itself. The most logical function to call here I think is the following, from http://www.cplusplus.com/reference/string/string/assign/:

substring (2)      string& assign (const string& str, size_t subpos, size_t sublen);
    
     Copies the portion of str that begins at the character position subpos and spans sublen characters (or until the end of str, if either str is too short or if sublen is string::npos).

     str
         Another string object, whose value is either copied or moved.

     subpos
         Position of the first character in str that is copied to the object as a substring. If this is greater than str's length, it throws out_of_range. Note: The first character in str is denoted by a value of 0 (not 1).

     sublen
         Length of the substring to be copied (if the string is shorter, as many characters as possible are copied). A value of string::npos indicates all characters until the end of str.

However, I'm not certain if this is permissible, or if it can corrupt the string data. I know that memcpy(), for example, does not allow (or at least does not guarantee non-corruption in the case of) overwriting an area of memory with (a portion of) itself (see memcpy() vs memmove()). But I don't know if the above method has the same limitation.

More generally, can you please comment if I should have been able to figure out the answer to this question myself? There's nothing in the documentation I linked to that makes it clear to me what the answer to this question is, except perhaps the qualifier "Another" in the description of the str parameter ("Another string object"), which seems to imply it cannot be the this object, although I don't find that to be unequivocal. Is that a weakness in the documentation?

Community
  • 1
  • 1
bgoldst
  • 34,190
  • 6
  • 38
  • 64
  • 2
    Assignable C++ classes often implement the copy assignment operator in a safe way (i. e. they check for assigning `*this = *this`). Standard container classes are no exception. **However,** even this is not required – a substring of a string is **not** the string itself, anymore. The "Cplusplus.com" site, again, seems to have poor wording – [cppreference.com](http://en.cppreference.com/w/cpp/string/basic_string/assign) uses the word "replaces" instead, from which it's quite clear that what you're doing should be safe. – The Paramagnetic Croissant Jan 25 '15 at 22:01
  • 4
    @TheParamagneticCroissant, why does the word "replaces" make it clear that it should be safe? By the same token, you could say that `memcpy()` replaces the contents of the destination buffer, but that doesn't mean it's safe. – bgoldst Jan 25 '15 at 22:03
  • Also, what do you mean the check is not required because a substring of a string is not the string itself anymore? The way this method is called, you pass a (const) reference to the source `std::string` object. So isn't the method working with the source object itself, not a copy/temporary? It _is_ the string itself, so in my mind that introduces a danger that it could modify itself before fully reading itself, causing corruption, or possibly even a crash. – bgoldst Jan 25 '15 at 22:10
  • 5
    @bgoldst I am fairly sure that this scenario is not fully covered by the standard, making it dogdy to work with. If you want to stay safe, use `substr` and the assignment operator (i.e. work with a copy). – Columbo Jan 25 '15 at 22:11
  • @TheParamagneticCroissant Interesting that you're blaming one site for poor documentation, when your "evidence" is that the other site makes clear that this operation is safe. Yet the C++ standard itself says no such thing. I don't think this is "safe". – Lightness Races in Orbit Jan 25 '15 at 22:27
  • Huh? There's a missing `,` in [string::append]/7 Anyone cares to report? – dyp Jan 25 '15 at 22:27
  • @dyp I was just looking at that. – The Paramagnetic Croissant Jan 25 '15 at 22:28
  • @dyp: Hah just spotted that. CBA though. – Lightness Races in Orbit Jan 25 '15 at 22:28
  • @dyp I probably saw it first but cannot be bothered. It's obvious where it belongs, so.. – Columbo Jan 25 '15 at 22:30
  • @LightnessRacesinOrbit I am not using it as "evidence". (If I did, I would have posted an answer). The only thing I am using as evidence is the C++11 standard, of which the latest working draft I am reading right now. The (abstract) implementation of this particular overload of `assign()` says that it should call another overload, in which a replacement of "the string controlled by `*this`" shall occur – which may or may not refer to the replacement of individual characters or the allocated storage itself, indeed. – The Paramagnetic Croissant Jan 25 '15 at 22:36
  • @TheParamagneticCroissant Well that was a lot of words but, regardless, there's no safety guarantee set by C++. – Lightness Races in Orbit Jan 25 '15 at 22:38
  • libc++ uses the character trait's `move` function, and `libstdc++` even has an explicit overlap check. I guess the implementers think it should be supported. – dyp Jan 25 '15 at 22:50
  • 1
    @dyp Unfortunately that doesn't suffice. Maybe this should be subject of an EWG issue. – Columbo Jan 25 '15 at 22:51
  • 1
    @Columbo I suspect it might end up just like [LWG 526](http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#526). – dyp Jan 25 '15 at 22:53
  • @Columbo: Where standards and specifications are concerned, "It's obvious where it belongs" is insufficient. – Lightness Races in Orbit Jan 25 '15 at 23:00
  • https://github.com/cplusplus/draft/issues/434 – Lightness Races in Orbit Jan 25 '15 at 23:34
  • Interestingly, it's strongly implied that `char_traits::copy` can't handle overlaps, but I see nothing that even implies that `std::basic_string` should use `char_traits::copy` or `char_traits::move` for anything at all. Only `traits::length`, `traits::eq`, and `traits::compare`. – Mooing Duck Feb 02 '15 at 21:17
  • @dyp or the implementors couldn't decipher the standard either and went with the safe option. – M.M Feb 02 '15 at 23:41

3 Answers3

4

No.

This operation is defined by [string::assign]/4:

basic_string& assign(const basic_string& str, size_type pos,
    size_type n = npos);

Effects: Determines the effective length rlen of the string to assign as the smaller of n and str.size() - pos and calls assign(str.data() + pos rlen).

(dat typo)

Then:

basic_string& assign(const charT* s, size_type n);

Effects: Replaces the string controlled by *this with a string of length n whose elements are a copy of those pointed to by s.

Nothing about this says anything about whether str.assign(str, 0) is at all safe (in particular, we have no way of knowing when the copy of each character will occur!).

Therefore I strongly suggest you avoid doing it.

animuson
  • 53,861
  • 28
  • 137
  • 147
Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • 4
    For `vector::push_back` etc, it has been argued that the omission of any requirement guarantees it is well-behaved (it is not allowed to fail). libstdc++ and libc++ seem to support assigning aliasing strings explicitly. – dyp Jan 25 '15 at 22:54
  • 1
    @dyp: I don't see how the omission of a requirement implicitly introduces a requirement! Well, I sort of do in that if no pre-requisite is noted then one might assume there are none. But this is all guesswork, and I'd much rather conclude that unless this is specifically addressed then it's unsafe by definition. – Lightness Races in Orbit Jan 25 '15 at 22:55
  • 1
    @dyp Was just about to say that after I read your link. I believe that the call should be fine as long as the arguments are reasonable. – Columbo Jan 25 '15 at 22:55
  • @Columbo: Whether the arguments are reasonable is the entire crux of this question. – Lightness Races in Orbit Jan 25 '15 at 22:56
  • @LightnessRacesinOrbit I rather meant that no overlappins occur. – Columbo Jan 25 '15 at 22:57
  • 1
    @Columbo, when assigning part of a string to itself it always overlaps, because you're assigning a part of a string to itself. – OmnipotentEntity Jan 25 '15 at 23:00
  • @OmnipotentEntity Assigning the last three characters of a string to that same string is overlapping when the strings size is >=6? – Columbo Jan 25 '15 at 23:01
  • 10
    I think the spec is clear. "Replaces the string ... _whose elements are a copy of those pointed to by s_" - this means that the implementation is ***required*** to make sure it has this effect. No preconditions mention overlapping inputs. Therefore it must be safe to call it with overlapping inputs, because the implementation ***must*** have the described effect – sehe Jan 26 '15 at 07:17
  • Is this still an issue today? More specifically, how safe it is to assign a `std::string_view` that contains a substring of a `std::string` to that same `std::string`? – user1593842 Mar 23 '19 at 13:38
  • 1
    @sehe Coming back to this, I think your literal interpretation is correct, but because the scenario is not specifically called out I don't trust the standard and its implementations to properly guarantee that it abides by that interpretation, so I'd probably still avoid it. In particular I can _easily_ imagine an implementation deallocating/reallocating *first* (if a reallocation is needed), missing this case. It's worth some investigation into common implementations at some point, after which I'll update the answer – Lightness Races in Orbit Mar 24 '19 at 11:45
  • @user1593842 FWIW I consider that to be the same case – Lightness Races in Orbit Mar 24 '19 at 11:45
  • 1
    @Lightness Races in Orbit thanks. And I share your fears. I would also feel safer if it was explicitly stated somewhere. Now, specially with the advent of `std::string_view`, passing substrings around has become quite common for the "OMG I must avoid unecessary allocations at all costs crowd " (which I am (proudly?) part of). So, if this is not safe, a simple string "dequoting" operation, supposing the existence of a `std::string_view Dequote(std::string_view s);` function, may have catastrophic results. – user1593842 Mar 26 '19 at 12:17
  • 1
    By the way, just checked it and I can confirm the Visual Studio 2015 implementation does check if the new string is a substring of the original and does the right (well, at least the expected) thing. – user1593842 Mar 26 '19 at 12:28
  • I'm even more concerned about this since P1148r0, which changed the quoted clause to be in terms of `string_view` instead. – Lightness Races in Orbit Dec 09 '19 at 12:00
  • For libstdc++, I got as far as digging out some code (https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/basic_string.h#L1402 https://github.com/gcc-mirror/gcc/blob/3f6823abf8d0ce23804dfbfe32c6250824501ef6/libstdc%2B%2B-v3/include/bits/basic_string.tcc#L422) but got bored. Offering a bounty instead. – Lightness Races in Orbit Dec 09 '19 at 12:05
1

Don't try that.

It may work, but as it is suggested in the selected answer, it is not ensured to be safe. In the best case, depending on your implementation, a temporary object would be created and then destroyed.

One way to emulate that, which does not create temporary objects and is indeed faster than calling to assign and substr is the following:

void trimTo(string & s, size_t pos = 0, size_t len = string::npos)
{ 
    s.erase(pos + len); 
    s.erase(pos, len); 
}

Then,

trimTo(myString, fromPos, numChars);

works as

myString.assign(myString.substr(fromPos, numChars);

but it's at least twice faster.

J C Gonzalez
  • 861
  • 10
  • 23
  • 1
    The second statement `s.erase(pos, len);` is erasing the part you want to keep. It should be `s.erase(0, pos)`, but only when `pos > 0`. The first statement `s.erase(pos + len)` may not be valid when `len` is unspecified, i.e. when `len == std::string::npos`. – MikeOnline Dec 03 '22 at 08:16
  • Yep, @MikeOnline, you're completely right! There are many conditions that should be included. – J C Gonzalez Dec 04 '22 at 13:00
0

Apparently it was the same decision as with operator= (protect against self assignment).

_Myt& assign(const _Myt& _Right,
    size_type _Roff, size_type _Count = npos)
    {   // assign _Right [_Roff, _Roff + _Count)
    _Right._Check_offset(_Roff);
    _Count = _Right._Clamp_suffix_size(_Roff, _Count);

    if (this == &_Right)
        erase((size_type)(_Roff + _Count)), erase(0, _Roff);    // substring
    else if (_Grow(_Count))
        {   // make room and assign new stuff
        _Traits::copy(this->_Myptr(),
            _Right._Myptr() + _Roff, _Count);
        _Eos(_Count);
        }
    return (*this);
    }