
Here is a small test program:

#include <iostream>
#include <string>
#include <utility>  // std::move
#include <vector>

class Test
{
public:
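  // Take the arguments by value so that std::move below really moves;
  // std::move on a const reference silently falls back to a copy.
  Test(std::vector<int> a_, std::string b_)
    : a(std::move(a_)),
      b(std::move(b_)),
      vBufAddr(reinterpret_cast<long long>(a.data())),
      sBufAddr(reinterpret_cast<long long>(b.data()))
  {}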

  Test(Test&& mv)
    : a(std::move(mv.a)),
      b(std::move(mv.b)),
      vBufAddr(reinterpret_cast<long long>(a.data())),
      sBufAddr(reinterpret_cast<long long>(b.data()))
  {}

  // Compares the buffer addresses recorded at construction time.
  bool operator==(const Test& cmp) const
  {
    if (vBufAddr != cmp.vBufAddr) {
      std::cout << "Vector buffers differ: " << std::endl
        << "Ours: " << std::hex << vBufAddr << std::endl
        << "Theirs: " << cmp.vBufAddr << std::endl;
      return false;
    }

    if (sBufAddr != cmp.sBufAddr) {
      std::cout << "String buffers differ: " << std::endl
        << "Ours: " << std::hex << sBufAddr << std::endl
        << "Theirs: " << cmp.sBufAddr << std::endl;
      return false;
    }

    return true;  // both recorded addresses match
  }

private:
  
  std::vector<int> a;
  std::string b;
  long long vBufAddr;
  long long sBufAddr;
};

int main()
{
  Test obj1 { {0x01, 0x02, 0x03, 0x04}, {0x01, 0x02, 0x03, 0x04}};
  Test obj2(std::move(obj1));

  obj1 == obj2;  // prints a message if the recorded buffer addresses differ

  return 0;
}

Software I used for the test:

Compiler: gcc 7.3.0

Compiler flags: -std=c++11

OS: Linux Mint 19 (tara) with upstream release Ubuntu 18.04 LTS (bionic)

The results show that after the move the vector buffer still has the same address, but the string buffer does not. So it looks to me as if a fresh buffer was allocated instead of the buffer pointers just being swapped. What causes this behavior?

toozyfuzzy

1 Answer


You're likely seeing the effects of the small (short) string optimization (SSO). To avoid a heap allocation for every tiny string, many implementations of std::string reserve a small fixed-size buffer inside the string object itself and store short strings there without calling new. This buffer usually overlaps members that are not needed while no dynamic allocation is in use, so it costs little or no additional memory, for small or large strings. A string stored this way gains nothing from std::move: its characters live inside the source object, so the move has to copy them into the destination's own buffer (they are small, so that is cheap). Larger strings are dynamically allocated, and moving them transfers the pointer as you expect.

Just for demonstration, this code on g++:

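#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <utility>

// Move-construct a new string from s and print where its buffer lives.
void move_test(std::string&& s) {
    std::string s2 = std::move(s);
    std::cout << "; After move: " << std::hex
              << reinterpret_cast<std::uintptr_t>(s2.data()) << std::endl;
}

int main()
{
    std::string sbase;

    for (std::size_t len = 0; len < 32; ++len) {
        std::string s1 = sbase;
        // std::hex is sticky, so switch back to decimal for the length.
        std::cout << "Length " << std::dec << len << " - Before move: " << std::hex
                  << reinterpret_cast<std::uintptr_t>(s1.data());
        move_test(std::move(s1));
        sbase += 'a';
    }
}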


produces high (stack) addresses that change on move construction for lengths of 15 or less, but switches to low (heap) addresses that remain unchanged after move construction once the length reaches 16. The switch is at 16, not 17, because the string is NUL-terminated (C++11 and later require it), so a 16-byte buffer holds at most 15 characters. The exact threshold is implementation-specific; in libstdc++ it does not depend on pointer size, as the comments below note.

To be 100% clear: This is an implementation detail. No part of the C++ spec requires this behavior, so you should not rely on it occurring at all, and when it occurs, you should not rely on it occurring for specific string lengths.

ShadowRanger
  • *"include a small fixed size array to hold small strings"* - You don't usually include an array for SSO but you reuse the available storage (size / pointer / ...) and add a flag to indicate if you have a short string. – Holt Jan 29 '19 at 12:05
  • @Holt: Sure, but the effect is the same for the OP's purposes. For that matter, a dedicated flag isn't necessary if you make the cutoff a strict length/capacity limit (which, if you're just shoving data into pointers members, is going to be limited anyway). – ShadowRanger Jan 29 '19 at 12:23
  • @Holt: Which is to say, you use a discriminated `union` that contains a `char[N]` for short strings. So the array is there in the type, even though it might not be present in all objects. – MSalters Jan 29 '19 at 12:26
  • @MSalters I know that. My point was more on how this answer is phrased since, at least for me, it reads as if SSO needs extra memory within a `std::string`, which it does not (save from maybe a bit-flag, if any). – Holt Jan 29 '19 at 12:28
  • @Holt: I did qualify the statement, just to be clear. – ShadowRanger Jan 29 '19 at 12:40
  • Doesn't look like pointer size defines the limit for SSO: adding `-m32` to compiler flags in your "try it online" link still results in 15 being the limit. – Ruslan Jan 29 '19 at 14:42
  • @Ruslan: Looks like it. I just checked GCC 8's header, and it just defines a `enum` constant `_S_local_capacity = 15 / sizeof(_CharT)`, then defines `union { _CharT _M_local_buf[_S_local_capacity + 1]; size_type _M_allocated_capacity; };` So it's actually reserving a fixed 16 bytes unioned with the capacity as a `size_type`, which means the SSO array is 8-12 bytes larger than the members it's sharing. The 32 bit `string` is smaller (`sizeof` reports 24 bytes, vs. 32 for the 64 bit `string`), but I'm guessing it could have been 12 for 32 bit and 24 for 64 bit without the SSO. – ShadowRanger Jan 29 '19 at 15:16
  • @ShadowRanger … at which point it becomes mandatory to link to [The strange details of `std::string` at Facebook](https://channel9.msdn.com/events/CPP/CppCon-2016/CppCon-2016-Nicholas-Ormrod-The-strange-details-of-stdstring-at-Facebook) … (nice research BTW) – Arne Vogel Jan 29 '19 at 18:49