41

Why does sizeof(std::string) yield 8?
I thought it should be more than 8, since the class needs an integer data member (sizeof(int) == 8 on my machine) so that std::string::length() and std::string::size() can run in O(1), and probably a char* for the characters.

cadaniluk
shanker861
  • Your question isn't clear, please try again. – Ryan Jan 01 '16 at 22:06
  • 16
    @self: it is totally clear. He asks why sizeof(string) < sizeof(char *) + sizeof(int) – Pierre Emmanuel Lallemant Jan 01 '16 at 22:08
  • The answer probably depends on the compiler you are using. – freakish Jan 01 '16 at 22:10
  • Thank you @Pierre, that's precisely my confusion. I thought string should have an int and a char* as data members, so sizeof(string) > sizeof(char*) – shanker861 Jan 01 '16 at 22:10
  • 2
    @shanker861: it may depend on the implementation. If you build a std::string with only a char *, you can use the first 8 bytes of the buffer as an int, so mystring[0] is in fact mystring.ptr[8], and str.size() is *((int *) str.ptr). – Pierre Emmanuel Lallemant Jan 01 '16 at 22:11
  • @freakish, I understand that the number (sizeof(string) == 8) will depend on the machine, but considering that string gives its length in O(1) time, I thought it should have an int data member to do so. Correct me if I am wrong. – shanker861 Jan 01 '16 at 22:12
  • @shanker861 No, what I meant is that the actual implementation of `std::string` depends on the compiler. I.e. perhaps in your compiler `length()` is not `O(1)` at all. – freakish Jan 01 '16 at 22:13
  • 1
    @freakish you mean the STL, compiler doesn't implement the C code (actual implementation) of `std::string` – Ryan Jan 01 '16 at 22:13
  • 2
    @self: If we're going to get technical, then let's say [C++ Standard Library implementation](http://stackoverflow.com/questions/5205491/whats-this-stl-vs-c-standard-library-fight-all-about). – Cornstalks Jan 01 '16 at 22:15
  • @Pierre, yes, it seems that would work. Do you know whether this is how it's done in GCC or other popular compilers? – shanker861 Jan 01 '16 at 22:19
  • @shanker861: no, I don't know their exact implementation, but 8 is the size of a pointer on a 64-bit processor, so they just use one data member in their std::string class. – Pierre Emmanuel Lallemant Jan 01 '16 at 22:27
  • Ancient versions of GCC contain a non-conforming implementation of `std::string`. – Kerrek SB Jan 01 '16 at 22:32
  • 4
    If a `std::string` stores the string's length internally, it's going to be a `size_t`, not an `int`. FWIW, on the implementation I use `sizeof (std::string) == 32`. (It depends on the runtime library implementation, not on the compiler. Two different implementations might use the same compiler but different runtime libraries.) – Keith Thompson Jan 01 '16 at 22:32
  • @KerrekSB: GCC doesn't contain an implementation of `std::string`; it's provided by the runtime library. – Keith Thompson Jan 01 '16 at 22:33
  • 9
    @KeithThompson: "[**The GNU Compiler Collection includes** front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (**libstdc++**, libgcj,...).](https://gcc.gnu.org/)" GCC (the GNU Compiler Collection) includes an implementation of `std::string` in `libstdc++`. `gcc` (the compiler) does not. I don't think Kerrek SB's comment is incorrect. – Cornstalks Jan 01 '16 at 22:36
  • 1
    @KeithThompson: GCC ships with libstdc++ as its standard library implementation of choice. – Kerrek SB Jan 01 '16 at 22:45
  • @KerrekSB: Interesting. I know that the gcc C compiler commonly uses different C library implementations on different platforms (often glibc, but also newlib for Cygwin, the Microsoft library for MinGW, or the native library for other platforms). I'm less familiar with the C++ side. So does the MinGW implementation on Windows use the GNU libstdc++? – Keith Thompson Jan 02 '16 at 06:12
  • 2
    @KeithThompson: Yes. In Linux, the C library is usually very intimately tied to the kernel. For C++, GCC ships its own libstdc++ standard library implementation which is intimately tied to the compiler (i.e. you can't get it on its own). Clang goes out of its way to work with libstdc++, but there's also libc++ that's designed to work with Clang (and only Clang as far as I'm aware). There do exist third-party standard library implementations, but they're quite niche. The standard library requires a fair amount of compiler magic, so it's not so easy to reimplement. – Kerrek SB Jan 02 '16 at 13:04

2 Answers

42

The implementation of std::string is not specified by the C++ standard, which only describes the class's behaviour. However, I would expect there to be more than one pointer's worth of information in the class. In particular:

  • A pointer to the actual string.
  • The size available.
  • The actual size used.

It MAY, of course, store all of these in a dynamically allocated block, so that the object itself takes up exactly the same amount of space as a char* [on most architectures].
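To make the two possibilities concrete, here is a minimal sketch of both layouts. The struct names `InlineFields` and `SinglePointer` are hypothetical and are not taken from any real standard library; they only illustrate why the object size differs.

```cpp
#include <cstddef>

// Hypothetical layout A: pointer, used size, and available size are all
// stored inside the string object itself.
struct InlineFields {
    char*       data;      // pointer to the actual characters
    std::size_t size;      // the actual size used
    std::size_t capacity;  // the size available
};

// Hypothetical layout B: the object holds one pointer only; the size and
// capacity are assumed to live in the dynamically allocated block.
struct SinglePointer {
    char* data;
};

// Layout B is exactly one pointer wide, whatever the pointer width is.
static_assert(sizeof(SinglePointer) == sizeof(char*),
              "single-pointer layout is pointer-sized");

// Layout A needs room for the pointer plus two size fields.
static_assert(sizeof(InlineFields) >= sizeof(char*) + 2 * sizeof(std::size_t),
              "inline layout holds three fields");
```

On a typical 64-bit platform layout B comes out at 8 bytes and layout A at 24, which matches the spread of values people report for sizeof(std::string).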

In fact, the C++ header that comes with my Linux machine makes the implementation quite clear (as per the comments, this is the "pre-C++11" implementation, but I think it is roughly representative either way). Start with:

  size_type
  length() const _GLIBCXX_NOEXCEPT
  { return _M_rep()->_M_length; }

and then follow that to:

  _Rep*
  _M_rep() const _GLIBCXX_NOEXCEPT
  { return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }

which in turn leads to:

  _CharT*
  _M_data() const _GLIBCXX_NOEXCEPT
  { return  _M_dataplus._M_p; }

which leads to:

  // Data Members (private):
  mutable _Alloc_hider  _M_dataplus;

and then we get to:

  struct _Alloc_hider : _Alloc
  {
    _Alloc_hider(_CharT* __dat, const _Alloc& __a) _GLIBCXX_NOEXCEPT
    : _Alloc(__a), _M_p(__dat) { }

    _CharT* _M_p; // The actual data.
  };

The actual data about the string is:

  struct _Rep_base
  {
    size_type       _M_length;
    size_type       _M_capacity;
    _Atomic_word        _M_refcount;
  };

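So the object itself is a single pointer called _M_p, hidden behind several layers of getters and a bit of casting. The length, capacity, and reference count sit in the _Rep_base header located just before the characters that _M_p points to, which is what the [-1] in _M_rep() reaches back to.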
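The same trick can be sketched in a few lines. This is only an illustration under simplifying assumptions (no reference counting, no allocator, no copying); `Rep` and `TinyString` are hypothetical names, not libstdc++ types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical header placed immediately before the characters.
struct Rep {
    std::size_t length;
    std::size_t capacity;
};

// Hypothetical string class whose only data member is one pointer.
class TinyString {
public:
    explicit TinyString(const char* text) {
        assert(text != nullptr);
        const std::size_t n = std::strlen(text);
        // One allocation holds the header, the characters, and the terminator.
        void* block = std::malloc(sizeof(Rep) + n + 1);
        if (block == nullptr) throw std::bad_alloc();
        Rep* rep = new (block) Rep{n, n};
        chars_ = reinterpret_cast<char*>(rep + 1);  // characters start after the header
        std::memcpy(chars_, text, n + 1);
    }
    ~TinyString() { std::free(rep()); }
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    // O(1): step back one header from the character pointer and read the length.
    std::size_t length() const { return rep()->length; }
    const char* c_str() const { return chars_; }

private:
    Rep* rep() const { return reinterpret_cast<Rep*>(chars_) - 1; }
    char* chars_;  // the only member stored in the object
};

static_assert(sizeof(TinyString) == sizeof(char*),
              "only one pointer is stored in the object");
```

Constructing `TinyString s("hello")` and calling `s.length()` reads the length from the header in constant time, even though the object itself is no bigger than a char*.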

Mats Petersson
  • To be fair, dynamic allocation can usually be ruled out by noexcept methods. I do not think `std::string` has enough noexcept requirements for this to be the case. – Yakk - Adam Nevraumont Jan 01 '16 at 22:17
  • 1
    I mean, "together with the string itself", rather than each part being dynamically allocated. – Mats Petersson Jan 01 '16 at 22:27
  • @MarcGlisse: I think it is C++11, since it's from the 4.9.2 version of gcc, which is after C++11 compliance in gcc, and it mentions C++11 in the file? – Mats Petersson Jan 01 '16 at 22:34
  • @MatsPetersson Thank you, very well explained. – shanker861 Jan 02 '16 at 00:48
  • @MatsPetersson: It's pre-C++11. The `_Atomic_word _M_refcount;` implies sharing and thus copy-on-write behaviour, which breaks the constraints of `operator[]`, see http://stackoverflow.com/q/12199710/1139697. – Zeta Jan 02 '16 at 09:46
35

Because all that your implementation of std::string stores is a pointer to the heap, where all of its data is kept.
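If you want to check what your own standard library does, a small sketch like the following can help. The helper names are hypothetical; the only thing measured is sizeof(std::string) relative to the pointer size.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: how many whole pointer-widths the std::string object spans.
constexpr std::size_t string_size_in_pointers() {
    return sizeof(std::string) / sizeof(void*);
}

// Hypothetical helper: true when the object is exactly one pointer wide,
// which suggests that the length and capacity are kept on the heap.
constexpr bool string_is_single_pointer() {
    return sizeof(std::string) == sizeof(void*);
}
```

With libstdc++ from GCC 5 onwards, compiling with -D_GLIBCXX_USE_CXX11_ABI=0 should select the older copy-on-write string, so `string_is_single_pointer()` is expected to return true there and false with the default ABI; other libraries may behave differently.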

user2357112
Paul Evans
  • 3
    Simple and best answer. – baash05 Jan 02 '16 at 09:04
  • 1
    This kind of implementation is straightforward for `std::string` implementations that use copy-on-write and reference counting (libstdc++ did this in the C++98 ABI mode, and Linux distributions are in the process of getting rid of it as the default right now). C++11 made this kind of implementation illegal, so you will likely find implementations with `sizeof(std::string) == sizeof(void*)` much less often in the future. – Michael Karcher Jan 02 '16 at 10:47
  • 2
    This is true as of 2019 - in Visual C++ 2019 `sizeof(std::string) == 28` – Sebazzz Dec 02 '19 at 20:18
  • Also, sizeof(std::string) == 32 in clang version 7.0.0-3~ubuntu0.18.04.1 (tags/RELEASE_700/final), and sizeof(std::string) == 24 in Apple LLVM version 8.1.0 (clang-802.0.42). But sometimes OnlineGDB shows sizeof(std::string) == 8! – Vittore Marcas Dec 22 '19 at 03:26