What is the overhead in the string structure that causes sizeof() to be 32 ?
-
If you open your platform's `<string>` header, you can see exactly why `std::string` is that size. @Queso: `sizeof` yields the size of an object _in bytes_. – James McNellis Sep 22 '10 at 15:22
-
If sizeof returns the number of bits in the pointer then your compiler is broken – Anthony Williams Sep 22 '10 at 15:23
-
@Queso: sizeof() returns bytes, not bits. A 32-byte pointer is a 256-bit address – Matt K Sep 22 '10 at 15:23
-
@Martin: because "wetness" is pretty much defined as being a property of water (or anyway of liquids). I'm not aware that "32" is defined as being the size of a string. – Steve Jessop Sep 22 '10 at 15:51
-
@Steve Jessop: Water is wet because of the current implementation (Earth has an STP that allows water to be liquid). In other implementations it is not wet (like Jupiter, where it is a gas). So this string implementation is 32 because that's the way it was built in this implementation, and it will be 16 in other implementations and 64 in yet another. The size of the string will (like water) depend on the environment it is used in. – Martin York Sep 22 '10 at 16:09
-
OK, so water is liquid because of the Earth's STP, and we could drill further by looking to the factors which affect that (atmospheric pressure is affected by mass and gas emission, for example). So what does asking *why* one set of implementers chose 32, and another chose 64, have to do with asking *why* the earth has a certain surface pressure and temperature? One is a choice made by a sentient being. The other is IMO not, but even if IYO it is, I don't think C++ implementers have quite the claim to ineffability that God does. – Steve Jessop Sep 22 '10 at 16:28
-
@Steve Jessop: Note: The whole point we all decided to be programmers is so that we can feel like GOD (the architects of our own little Universe). – Martin York Sep 22 '10 at 18:35
-
@Steve Jessop: Asking why means nothing; it is so because the implementers made it so. Now asking `why is string 32 bytes for compiler X version y on platform z running OS a revision b` does make sense. Why is water wet under STP on earth 1.0? Because in this environment it is in a liquid form. So I call it an analogy. Asking why is a string 32 is as meaningless as asking why water is wet. Both are **only** true under specific conditions. Without understanding the conditions it is impossible to answer. – Martin York Sep 22 '10 at 18:45
-
Yup makes sense ... serves me right for treating StackOverflow like Twitter. – agam Sep 25 '10 at 07:38
6 Answers
Most modern `std::string` implementations [1] save very small strings directly on the stack in a statically sized `char` array instead of using dynamic heap storage. This is known as Small (or Short) String Optimisation (SSO). It allows implementations to avoid heap allocations for small string objects and improves locality of reference.
Furthermore, there will be a `std::size_t` member to save the string's size and a pointer to the actual `char` storage.
How this is specifically implemented differs, but something along the following lines works:
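
    // Illustrative layout only; real implementations differ in detail.
    template <typename T>
    struct basic_string {
        char* begin_;             // pointer to the actual char storage
        size_t size_;             // current size of the string
        union {
            size_t capacity_;     // used when the text is in heap storage
            char sso_buffer[16];  // used when the text is small enough
        };
    };
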
On typical architectures where `sizeof(void*)` = 8, this gives us a total size of 32 bytes.
1 The “big three” (GCC’s libstdc++ since version 5, Clang’s libc++ and MSVC’s implementation) all do it. Others may too.

-
@KonradRudolph very small strings are saved directly into the object, and this can be the stack or also the heap depending on where the string itself is allocated, no ? – Manuel Selva Jun 02 '16 at 06:58
-
-
@KonradRudolph How to force strings to be always heap allocated? (For the intention of having string objects smaller than 32 bytes, e.g. 8 bytes.) – Luke Oct 02 '18 at 01:17
-
@LukeFisk-Lennon You can’t. Small string optimisation is an implementation detail of certain (well, all modern) standard library implementations, it’s not specified by the language. As such you cannot change it within C++. You also can’t change it outside of C++ (e.g. via compiler options) because such a change would be [ABI breaking](https://stackoverflow.com/q/2171177/1968). That said, GCC4 didn’t perform small string optimisation so in principle you could configure your GCC with `--with-default-libstdcxx-abi=gcc4-compatible` but that would be a terrible idea (= very old implementation). – Konrad Rudolph Oct 02 '18 at 09:02
-
`std::string` typically contains a buffer for the "small string optimization": if the string is shorter than the buffer size then no heap allocation is required.

-
-
Windows compilers aren't the only ones that do the small-string optimization – Anthony Williams Sep 22 '10 at 16:06
-
Sure, but if you're not willing to name them then it's hard to judge whether this is "typical" behaviour, or just called that on the grounds that it's the behaviour of a common implementation (and presumably others). – Steve Jessop Sep 22 '10 at 16:13
-
From what I understand, Dinkumware and STLPort both do, but gcc's implementation doesn't. – Dennis Zickefoose Sep 22 '10 at 16:30
-
Btw, I mention it because "typically" spans a range from "I'm reasonably confident you'll never see anything else", to "50% or more of the implementations I've used do this". It's very easily misunderstood, I think. Neither this optimization, nor the absence of it, should be considered unusual. – Steve Jessop Sep 22 '10 at 16:32
-
Note that IBM-AIX C++ implementation contains a small string implementation with a 32 characters buffer (see here: http://www-01.ibm.com/support/docview.wss?uid=swg21453760) – Theo Jul 21 '16 at 16:40
In g++ 5.2 (it is different in, e.g., g++ 4.9), a string is basically defined as:

    class string {
        char* bufferp;
        size_t length;
        union {
            char local_buffer[16];
            size_t capacity;
        };
    };

On a typical 64-bit machine this adds up to 32 bytes (8+8+16).
The actual definition is of course `typedef basic_string<char> string;` but the idea is the same.

My guess is:

    class vector
    {
        char type;                  // which member of the union is in use
        struct Heap
        {
            char* start;
            char* end;
            char* allocatedEnd;
        };
        struct Stack
        {
            char size;
            char data[27];
        };                          // semicolon was missing here
        union
        {
            Stack stackVersion;
            Heap heapVersion;
        } version;
    };

But I bet there are hundreds of ways of doing it.

-
-
@ErikAronesty There was a phase where reference counting was attempted with `std::string` but it became obvious that this was not very efficient (there were several papers on it over the years) and instead the short string optimization became popular. – Martin York May 16 '17 at 21:48
It is library dependent. You shouldn't rely on the size of `std::string` objects because it is likely to change in different environments (obviously between different standard library vendors, but also between different versions of the same library).
Keep in mind that `std::string` implementations are written by people who have optimized for a variety of use cases, typically leading to two internal representations: one for short strings (small internal buffer) and one for long strings (heap-allocated external buffer). The overhead comes from holding both of these inside each `std::string` object.

Q: Why is a dog yellow? A: It's not necessarily.
The size of a (an?) std::string object is implementation-dependent. I just checked MS VC++ 2010. It does indeed use 32 bytes for std::string. There is a 16 byte union that contains either the text of the string, if it will fit, or a pointer to heap storage for longer strings. If the implementers had chosen to keep 18 byte strings in the string object rather than on the heap, the size would be 34 bytes. The other 16 bytes comprise overhead, containing such things as the length of the string and the amount of memory currently allocated for the string.
A different implementation might always allocate memory from the heap. Such an implementation would undoubtedly require less memory for the string object.
