I look at the internal structure of library types, especially container objects, in order to understand more about how a compiler performs its magic. My standard approach is to copy the object into a std::array of the same size, then print the array in hex. This is useful for exploring exactly what happens when a container object is "moved", as well as for learning how the different library authors implemented the container.
Here's the basic code, adapted to std::string. It examines how the object changes between an empty string, a string holding the maximum SSO (small string optimization) contents, and a string whose contents must be stored on the heap.
With SSO, the chars live inside the string object itself, so a pointer to any character of an SSO string must lie between the start and end of the string object. The code uses this to detect the point at which a string stops being SSO.
#include <array>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <string>

void print_string_object(const std::string& s)
{
    // Check that the size of a string object is a multiple of the pointer size
    static_assert(sizeof(std::string) % sizeof(std::uintptr_t) == 0, "unexpected std::string size");
    // Create an array of uintptr_t that is the same size as a string
    using s_obj = std::array<std::uintptr_t, sizeof(std::string) / sizeof(std::uintptr_t)>;
    // Copy the raw bytes of the object; memcpy avoids the aliasing
    // problems of casting &s to s_obj* and dereferencing it
    s_obj s_ptrs;
    std::memcpy(s_ptrs.data(), &s, sizeof(s));
    // Print details of string object in hex
    std::cout << "Address of Object\n " << std::setfill('0') << std::setw(2 * sizeof(std::uintptr_t)) << std::hex << &s << "\nObject\n";
    for (auto x : s_ptrs)
        std::cout << " " << std::setfill('0') << std::setw(2 * sizeof(std::uintptr_t)) << std::hex << x << '\n';
}

int max_SSO(std::string& s)
{
    // Return the maximum string length stored in a string object (SSO)
    // and fill s with bytes 0 1 2 3 ... until SSO is maxed out
    std::string s0;
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&s0);
    const std::uintptr_t top = base + sizeof(s0);
    for (int i = 0;; i++)
    {
        s0 += static_cast<char>(i);
        // Once the chars are no longer inside the object, s0 has moved to the heap
        const std::uintptr_t data = reinterpret_cast<std::uintptr_t>(&s0[0]);
        if (data < base || data >= top)
            return i;
        s += static_cast<char>(i);
    }
}

int main()
{
    std::string s;
    std::cout << "Capacity of empty string=" << s.capacity() << '\n';
    std::cout << "Empty string\n";
    print_string_object(s); // print details of empty string
    std::cout << "\nFull SSO string length=" << std::dec << max_SSO(s) << "\n";
    print_string_object(s); // print details of max SSO string
    s += "0";
    std::cout << "\nDynamic memory string\n";
    print_string_object(s); // print details of dynamically allocated string
}
And here's a link to Compiler Explorer for Clang and GCC.
The MSVC x64 output is:
Capacity of empty string=15
Empty string
Address of Object
 000000AF535AF840
Object
 0000000000000000
 0000000000000000
 0000000000000000
 000000000000000f

Full SSO string length=15
Address of Object
 000000AF535AF840
Object
 0706050403020100
 000e0d0c0b0a0908
 000000000000000f
 000000000000000f

Dynamic memory string
Address of Object
 000000AF535AF840
Object
 00000235a757e8a0
 000e0d0c0b0a0908
 0000000000000010
 000000000000001f
For MSVC, the first 16 bytes are used to store the SSO chars. This allows a string length of 15 plus the required terminating null char. When dynamic memory is required for longer strings, the first 8 bytes hold a pointer to the chars stored on the heap; the second 8 bytes still show the old SSO chars, which are simply left over. The last two entries are the current string size and the capacity, that is, the largest size the string can reach before another memory allocation is needed (0x10 = 16 and 0x1f = 31 in the last dump). GCC and Clang have somewhat different layouts. Clang (with libc++), in particular, allows SSO strings of up to 22 chars, and its object size is 8 bytes smaller! Very efficient.
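To check the size and SSO figures on whichever compiler and standard library are at hand, a minimal sketch along these lines may help. The helper names `uses_sso`, `sso_capacity` and `print_string_summary` are made up for this illustration and are not part of any library. The pointer-range test is the same heuristic as in `max_SSO`, not something the standard guarantees.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>

// Hypothetical helper: true if the character buffer lies inside the
// string object itself, i.e. the string is (most likely) using SSO.
bool uses_sso(const std::string& s)
{
    const auto base = reinterpret_cast<std::uintptr_t>(&s);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= base && data < base + sizeof(s);
}

// Hypothetical helper: capacity reported by a default-constructed string.
// On implementations with SSO this is the in-object capacity.
std::size_t sso_capacity()
{
    return std::string().capacity();
}

// Print the two numbers that differ between library implementations.
void print_string_summary()
{
    std::cout << "sizeof(std::string) = " << sizeof(std::string) << '\n'
              << "SSO capacity        = " << sso_capacity() << '\n';
}
```

Calling `print_string_summary()` from `main` under each compiler gives a quick side-by-side of object size versus in-object capacity, without having to read the hex dump.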
I've found the approach very useful for quickly understanding what is actually going on in library container code.
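The same trick carries over to other containers and to looking at a "move". The following is only a sketch under some assumptions: `raw_words`, `print_words` and `show_vector_move` are hypothetical names chosen here, and it assumes the container's size is a multiple of the pointer size, which is checked at compile time.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical helper: copy the raw bytes of any object into pointer-sized words.
template <typename T>
std::array<std::uintptr_t, sizeof(T) / sizeof(std::uintptr_t)> raw_words(const T& obj)
{
    static_assert(sizeof(T) % sizeof(std::uintptr_t) == 0,
                  "object size must be a multiple of the pointer size");
    std::array<std::uintptr_t, sizeof(T) / sizeof(std::uintptr_t)> words{};
    std::memcpy(words.data(), &obj, sizeof(T));
    return words;
}

// Hypothetical helper: print the words of an object in hex, one per line.
template <typename T>
void print_words(const char* label, const T& obj)
{
    std::cout << label << '\n';
    for (auto w : raw_words(obj))
        std::cout << ' ' << std::hex << std::setfill('0')
                  << std::setw(2 * sizeof(std::uintptr_t)) << w << '\n';
    std::cout << std::dec << std::setfill(' '); // restore stream state
}

// Dump a vector before and after it is moved from.
void show_vector_move()
{
    std::vector<int> a{1, 2, 3, 4};
    print_words("a before move", a);
    std::vector<int> b = std::move(a);
    print_words("a after move", a);
    print_words("b after move", b);
}
```

On the mainstream implementations one would expect `b` to end up with the words `a` held before the move, and `a` to be left in an empty state, but that is library behavior to be observed in the dump, not something the sketch relies on.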