
Most std::string implementations (GCC's included) use the small string optimization (SSO). For example, there is an answer discussing this.

Today I decided to check at what point a string in code I compile gets moved to the heap. To my surprise, my test code seems to show that no small string optimization happens at all!

Code:

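#include <iostream>
#include <string>

using std::cout;
using std::endl;

int main(int argc, char* argv[]) {
  std::string s;

  // Capacity of a default-constructed (empty) string.
  cout << "capacity: " << s.capacity() << endl;

  // Print the address of the character buffer after each append;
  // a change of address means the buffer was reallocated.
  cout << (void*)s.c_str() << " | " << s << endl;
  for (int i=0; i<33; ++i) {
    s += 'a';
    cout << (void*)s.c_str() << " | " << s << endl;
  }

}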

The output of g++ test.cc && ./a.out is

capacity: 0
0x7fe405f6afb8 | 
0x7b0c38 | a
0x7b0c68 | aa
0x7b0c38 | aaa
0x7b0c38 | aaaa
0x7b0c68 | aaaaa
0x7b0c68 | aaaaaa
0x7b0c68 | aaaaaaa
0x7b0c68 | aaaaaaaa
0x7b0c98 | aaaaaaaaa
0x7b0c98 | aaaaaaaaaa
0x7b0c98 | aaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0d28 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

I'm guessing that the larger first pointer, i.e. 0x7fe405f6afb8, is a stack pointer and the others point to the heap. Running this many times gives consistent results, in the sense that the first address is always large and the others are smaller; the exact values usually differ. The smaller addresses always follow the usual power-of-2 growth scheme: 0x7b0c38 is listed once, then 0x7b0c68 once, then 0x7b0c38 twice, then 0x7b0c68 4 times, then 0x7b0c98 8 times, etc.

After reading Howard's answer, and since I'm on a 64-bit machine, I expected to see the same address printed for the first 22 characters, and only then to see it change.

Am I missing something?

Also, interestingly, if I compile with -O (at any level), the first address is a small constant, 0x6021f8, instead of the large value, and this 0x6021f8 doesn't change no matter how many times I run the program.

Output of g++ -v:

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/foo/bar/gcc-6.2.0/gcc/libexec/gcc/x86_64-redhat-linux/6.2.0/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../gcc-6.2.0/configure --prefix=/foo/bar/gcc-6.2.0/gcc --build=x86_64-redhat-linux --disable-multilib --enable-languages=c,c++,fortran --with-default-libstdcxx-abi=gcc4-compatible --enable-bootstrap --enable-threads=posix --with-long-double-128 --enable-long-long --enable-lto --enable-__cxa_atexit --enable-gnu-unique-object --with-system-zlib --enable-gold
Thread model: posix
gcc version 6.2.0 (GCC)
gsamaras
SU3

2 Answers


One of your flags is:

--with-default-libstdcxx-abi=gcc4-compatible

and the GCC 4 std::string implementation (which is what the gcc4-compatible ABI selects) does not support the small string optimization.

GCC 5 started supporting it. isocpp states:

A new implementation of std::string is enabled by default, using the small string optimization instead of copy-on-write reference counting.

which supports my claim.

Moreover, Exploring std::string mentions:

As we see, older libstdc++ implements copy-on-write, and so it makes sense for them to not utilize small objects optimization.

and then the article moves on to GCC 5, where this changes.

gsamaras

You can check whether the C++11 ABI is used by default by running

gcc -v 2>&1 | sed -n 's/.*\(--with-default-libstdcxx-abi=new\).*/\1/p'

If this prints nothing, the old ABI is the default. (Taken from the Conan documentation.)

Besides the reason given by gsamaras, the old ABI is also used in older Red Hat versions, which are incompatible with the C++11 ABI: https://bugzilla.redhat.com/show_bug.cgi?id=1546704

mrks
  • This will only work if the `--with-default-libstdcxx-abi=` was explicitly passed to the GCC `configure` script. – SU3 Sep 27 '21 at 13:30
  • Interesting. Would you consider checking for `_GLIBCXX_USE_CXX11_ABI` to be safer? If yes, I'd change my answer. – mrks Sep 27 '21 at 15:29
  • The right term would be more reliable. This discussion is only tangentially relevant to the original question. I assume you are posting this for your own record. – SU3 Sep 27 '21 at 16:38