3
std::cout << sizeof(std::string) << std::endl;

The result is 8 on my 64-bit machine, which is the same as sizeof(char*), so I am assuming the string class stores only the char*. How, then, is the size function implemented? Is it using strlen (since it is not storing the actual size or the pointer to the ending byte)?

On this page, it shows the size function has a constant time-complexity, so I am confused. And on another page someone has a larger string size.

I am using GCC 4.7.1 on Fedora 64 bit.
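A quick way to test the strlen hypothesis is a string with an embedded null byte: if size() were computed with strlen, it would stop at the first '\0'. The following is only a minimal sketch; the helper names make_embedded_null_string and strlen_of are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a 5-character string with a '\0' in the middle.
// The length is passed explicitly, so the constructor does not stop at the null.
std::string make_embedded_null_string() {
    return std::string("ab\0cd", 5);
}

// Hypothetical helper: the length a strlen-based size() would report.
std::size_t strlen_of(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Comparing make_embedded_null_string().size() with strlen_of(...) on the same string shows that the two disagree, so the length has to be stored somewhere rather than recomputed from the characters.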

SwiftMango
  • 5
    `<string>` should be header-only, so you could just take a look... – Oliver Charlesworth Sep 11 '13 at 21:56
  • You didn't specify what `string` is. Do you mean `std::string` from `<string>`? – T Percival Sep 11 '13 at 21:57
  • 4
    Conceivably, `std::string` could store a pointer to a single block of memory that holds its length followed by actual characters. – Igor Tandetnik Sep 11 '13 at 21:59
  • The only non-static member I found in the header is: `mutable _Alloc_hider _M_dataplus;`, and `_Alloc_hider` seems to have only a `char*`member. – SwiftMango Sep 11 '13 at 22:04
  • @IgorTandetnik: Or (as Microsoft has defined `BSTR`) a pointer to the actual characters, prefixed by their length. I.e. `*((size_t*)ptr - 1)`. That means `.size()` needs a fixed offset and `operator[]` a variable one, but with your proposal `operator[]` needs two offsets. That's more expensive on x86 IIRC. – MSalters Sep 12 '13 at 07:56

4 Answers

10

There could be many explanations for that. Just because std::string happens to store a pointer and nothing else does not mean that it is necessarily a char * pointer to the controlled sequence. Why did you jump to that conclusion?

It could easily turn out that your std::string is a PImpl-style wrapper for a pointer to some internal object that stores all the internal bookkeeping data, including the char * pointer, the length and whatever else is necessary. That way the internal object can be arbitrarily large without having any effect on the size of std::string itself. For example, in order to facilitate fast reference-counted copying, in some implementations std::string might be implemented similarly to std::shared_ptr. I.e. std::string in that case would essentially become something like std::shared_ptr<std::string_impl> with added copy-on-write semantics.

The target "string implementation" object might even use a "struct hack"-style approach to store the actual string, meaning that instead of storing a char * pointer it might embed the entire string into itself at the end.
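To make the idea concrete, here is a minimal sketch of such a wrapper. The names tiny_string and string_impl are hypothetical, and reference counting and copy-on-write are left out; it only shows how a one-pointer class can still answer size() in constant time when the characters live in the same allocation as the bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical implementation object: the bookkeeping fields, with the
// characters stored directly behind them in the same allocation.
struct string_impl {
    std::size_t length;
    std::size_t capacity;
    char* chars() { return reinterpret_cast<char*>(this + 1); }
};

class tiny_string {
public:
    explicit tiny_string(const char* s) {
        const std::size_t n = std::strlen(s);
        // One block: header followed by n characters and the terminator.
        void* block = ::operator new(sizeof(string_impl) + n + 1);
        impl_ = new (block) string_impl{n, n};
        std::memcpy(impl_->chars(), s, n + 1);
    }
    ~tiny_string() { ::operator delete(impl_); }

    // Copying is disabled to keep the sketch short.
    tiny_string(const tiny_string&) = delete;
    tiny_string& operator=(const tiny_string&) = delete;

    std::size_t size() const { return impl_->length; }  // O(1), no strlen
    const char* c_str() const { return impl_->chars(); }

private:
    string_impl* impl_;  // the only data member
};

static_assert(sizeof(tiny_string) == sizeof(char*),
              "the wrapper is exactly one pointer wide");
```

Constructing tiny_string t("hello") and calling t.size() reads the stored length from the header, which is why sizeof of the wrapper says nothing about how much bookkeeping the string actually keeps.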

AnT stands with Russia
7

Looking at the doxygen docs for libstdc++:

_CharT* _M_p; // The actual data

Assuming std::basic_string<char>, _M_p is a char* pointer to the actual data, so that is why you are getting 8.

It even says:

Where the _M_p points to the first character in the string, and you cast it to a pointer-to-_Rep and subtract 1 to get a pointer to the header.

So, it hides the actual representation (length, capacity, etc.) in a block of memory immediately before the place where the string data is stored.

Then, there is the following member function to get to the representation:

_Rep* _M_rep() const
{ return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }

They then call it like this, _M_rep()->_M_length, to get the size, for example.
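The layout described above can be imitated in a few lines. This is only a simplified sketch and not the libstdc++ code: the names rep, prefixed_string and get_rep are made up, there is no reference count, and copying is disabled. The single data member points at the first character, and the header is found by stepping back one header-sized slot.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical header, loosely modelled on the _Rep fields quoted above.
struct rep {
    std::size_t length;
    std::size_t capacity;
};

class prefixed_string {
public:
    explicit prefixed_string(const char* s) {
        const std::size_t n = std::strlen(s);
        // One block: header first, then n characters and the terminator.
        void* block = ::operator new(sizeof(rep) + n + 1);
        rep* header = new (block) rep{n, n};
        p_ = reinterpret_cast<char*>(header + 1);  // points at the first character
        std::memcpy(p_, s, n + 1);
    }
    ~prefixed_string() { ::operator delete(get_rep()); }

    // Copying is disabled to keep the sketch short.
    prefixed_string(const prefixed_string&) = delete;
    prefixed_string& operator=(const prefixed_string&) = delete;

    std::size_t size() const { return get_rep()->length; }  // O(1)
    const char* c_str() const { return p_; }                // no offset needed

private:
    // Same trick as _M_rep(): cast the character pointer and step back one header.
    rep* get_rep() const { return reinterpret_cast<rep*>(p_) - 1; }

    char* p_;  // the only data member
};

static_assert(sizeof(prefixed_string) == sizeof(char*),
              "only the character pointer is stored");
```

With this arrangement element access uses the stored pointer directly, and only size() and similar queries pay for the fixed step back to the header.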

Jesse Good
1

Your assumption that std::string is char* is wrong. Here is one of a few possible implementations with sizeof(std::string)==sizeof(char*):

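#include <cstddef>
#include <string> // for std::char_traits

// Illustrative stand-in for std::string, not real library code.
struct string
{
    struct string_implementation
    {
        std::size_t size;
        std::size_t buffer_size;
        std::char_traits<char> whatever;
        char *buffer; // Here is your actual string!
    };

    string_implementation *ptr; // the only member, so sizeof(string) == sizeof(char*)
};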
Michael
1

std::string is a typedef for std::basic_string<char>, and basic_string is defined (on my machine) in the file /usr/include/c++/4.4/bits/basic_string.h. There's a lot of indirection in that file, but roughly speaking std::string stores a pointer to the actual data:

// Use empty-base optimization: http://www.cantrip.org/emptyopt.html
struct _Alloc_hider : _Alloc
{
  _Alloc_hider(_CharT* __dat, const _Alloc& __a)
  : _Alloc(__a), _M_p(__dat) { }

  _CharT* _M_p; // The actual data.
};

and this is why you observed such behavior. This pointer can be cast to obtain a pointer to the structure that describes the well-known string properties (located just in front of the actual data):

struct _Rep_base
{
  size_type    _M_length;
  size_type    _M_capacity;
  _Atomic_word _M_refcount;
};

_Rep* _M_rep() const
{ return &((reinterpret_cast<_Rep*> (_M_data()))[-1]); }
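The empty-base optimization mentioned in the _Alloc_hider comment is the reason the allocator adds nothing to the size. A small sketch of the effect follows; alloc_hider and alloc_member are hypothetical stand-ins, and the checks assume std::allocator<char> is an empty class, which holds on the common implementations but is an assumption here.

```cpp
#include <memory>

// Hypothetical stand-in for _Alloc_hider: the allocator is a base class,
// so an empty allocator can occupy no storage at all.
template <class Alloc>
struct alloc_hider : Alloc {
    char* p;
};

// Naive alternative: the allocator is a member, which must take at least
// one byte and is then padded up to pointer alignment.
template <class Alloc>
struct alloc_member {
    Alloc a;
    char* p;
};

static_assert(sizeof(alloc_hider<std::allocator<char>>) == sizeof(char*),
              "empty base adds no size");
static_assert(sizeof(alloc_member<std::allocator<char>>) > sizeof(char*),
              "a member allocator makes the object larger");
```

Storing the allocator as a base of the struct that holds the pointer is what lets the whole string object stay at exactly one pointer in this layout.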
4pie0