21

It just occurred to me I noticed that std::string's substr operation could be much more efficient for rvalues when it could steal the allocated memory from *this.

The Standard library of N3225 contains the following member function declaration of std::string

basic_string substr(size_type pos = 0, size_type n = npos) const;

Can an implementation that could implement an optimized substr for rvalues overload that and provide two versions, one of which could reuse the buffer for rvalue strings?

basic_string substr(size_type pos = 0) &&;
basic_string substr(size_type pos, size_type n) const;

I imagine the rvalue version could be implemented as follows, reusing the memory of *this an setting *this to a moved-from state.

basic_string substr(size_type pos = 0) && {
  basic_string __r;
  __r.__internal_share_buf(pos, __start + pos, __size - pos);
  __start = 0; // or whatever the 'empty' state is
  return __r;
}

Does this work in an efficient fashion on common string implementations or would this take too much housekeeping?

Johannes Schaub - litb
  • 496,577
  • 130
  • 894
  • 1,212
  • Wouldn’t that leak memory? After all, we are only stealing a part of the memory, not the whole block … – Konrad Rudolph Feb 16 '11 at 21:43
  • @Konrad we need to remember that we were stealing memory and when freeing it we need to back-offset it to the origin. – Johannes Schaub - litb Feb 16 '11 at 21:44
  • 1
    Note that strings in some other languages like Java do this - some problems include that if you allocate a large string, extract a small substring from it, and then release the large one, you've still got the data for the large string hanging around in memory as long as you're using the substring. – Anon. Feb 16 '11 at 21:48
  • 1
    That's basically copy-on-write for substrings, no? – etarion Feb 16 '11 at 21:49
  • 1
    I think it would probably take too much housekeeping to be worthwhile. Along the other concerns, it doesn't seem like it would play very well with the short string optimization; you'd probably have to disable it when taking a substring of an already-"short" string. – Jerry Coffin Feb 16 '11 at 22:00
  • c++ standard lib needs immutable strings and vectors. – Cheers and hth. - Alf Feb 16 '11 at 22:20
  • The `pos = 0` (left substring) special case wouldn't need housekeeping. – aaz Feb 16 '11 at 22:42
  • 5
    I have a faint memory of this being discussed in committee, but I can't find any documentation of it. It may have been just an informal discussion. I believe a vendor could add this optimization at his discretion. – Howard Hinnant Feb 17 '11 at 01:22

3 Answers3

4

Firstly, an implementation cannot add an overload that steals the source, since that would be detectable:

std::string s="some random string";
std::string s2=std::move(s).substr(5,5);
assert(s=="some random string"); 
assert(s2=="rando");

The first assert would fail if the implementation stole the data from s, and the C++0x wording essentially outlaws copy on write.

Secondly, this wouldn't necessarily be an optimization anyway: you'd have to add additional housekeeping in std::string to handle the case that it's a substring of a larger string, and it would mean keeping large blocks around when there was no longer any strings referencing the large string, just some substring of it.

Anthony Williams
  • 66,628
  • 14
  • 133
  • 155
  • 2
    This isn't true, since `::std::move` is basically saying "Please, steal data from the source, it's just fine!". – Omnifarious Feb 17 '11 at 17:05
  • `std::move` casts it to an rvalue, but the standard still specifies what is permitted on that rvalue. Since the only specified overload of `substr` is `const`, it can't modify the value. If the implementation provides an overload that does, this is detectable. – Anthony Williams Feb 17 '11 at 17:13
  • Note that this is quite different from `s2=std::move(s)`, since there is an overload of `operator=` which takes an rvalue reference, so you cannot rely on the state of `s` afterwards --- you can detect differences between implementations, but you don't know what to expect. – Anthony Williams Feb 17 '11 at 17:15
  • The first assert will probably fail, since the value of s is undefined after passing it to std::move... – Chris Dodd Feb 17 '11 at 18:43
  • @Chris: No, it isn't. That's my point: `s` has a defined value unless something explicit is done that modifies that value. `std::move(s)` is just a wrapper for `static_cast(s)`, and doesn't itself change the value of `s`. `substr` is a `const` member function, so doesn't change the value either. This is therefore a valid snippet with defined behaviour. – Anthony Williams Feb 17 '11 at 21:16
  • 4
    @Anthony, n3225 says "If a function argument binds to an rvalue reference parameter, the implementation may assume that this parameter is a unique reference to this argument.". Does this not apply to the implicit object parameter of member functions added by overload resolution? When we say `std::move(s).substr(...)`, and afterwards still use `s` then it seems to me we would violate that assumption. – Johannes Schaub - litb Feb 17 '11 at 21:44
  • 2
    Yes, but `std::move(s).substr(..)` doesn't bind anything to an rvalue reference **parameter**. `std::move` takes an lvalue reference and creates an rvalue reference. `substr` is a `const` member function, so the implicit object parameter is a `const` lvalue reference. As I said already, this is quite different from `s2=std::move(s)`, where `s` **is** bound to the rvalue reference parameter of `operator=`. – Anthony Williams Feb 18 '11 at 09:11
  • @Anthony ah I see now. I was in the impression that if the impl would overload `substr` with an ref-qualifier overload, that then it would bind to such a parameter. But since in the spec there is no such overload, the impl cannot simply add that overload and then argue with "If a function argument ...". I see it clear now, thanks! – Johannes Schaub - litb Feb 19 '11 at 04:26
0

Yes, and maybe it should be proposed to the standards committee, or maybe implemented in a library. I don't really know how valuable the optimization would be. And that would be an interesting study all on its own.

When gcc grows support for r-value this, someone ought to try it and report how useful it is.

Omnifarious
  • 54,333
  • 19
  • 131
  • 194
-2

There are a few string classes out there implementing copy-on-write. But I wouldn't recommend adding yet another string type to your project unless really justified.

Check out the discussion in Memory-efficient C++ strings (interning, ropes, copy-on-write, etc)

Community
  • 1
  • 1
Stefan Dragnev
  • 14,143
  • 6
  • 48
  • 52