24

I would like to get the number of bytes that a std::string's contents occupy in memory, not the number of characters. The string holds multibyte-encoded text. Would std::string::size() do this for me?

EDIT: Also, does size() include the terminating null character?

小太郎

6 Answers

28

std::string operates on bytes, not on Unicode characters, so std::string::size() will indeed return the size of the data in bytes (without the overhead that std::string needs to store the data, of course).

No, std::string stores only the data you tell it to store (it does not need the trailing null character). So the terminator is not included in the size unless you explicitly create a string with a trailing null character.
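
As a minimal sketch of both points (the helper name `data_bytes` and the constant `kName` are made up for illustration, and the sample text is assumed to be UTF-8 encoded), size() reports bytes of data and leaves out the terminator:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: bytes occupied by the character data, excluding any
// terminator the implementation may keep after the data.
std::size_t data_bytes(const std::string& s) {
    return s.size();  // std::string counts char elements, and sizeof(char) == 1
}

// Three CJK characters written out as UTF-8 bytes, so the result does not
// depend on the encoding of the source file (assumption: UTF-8 text).
const std::string kName = "\xE5\xB0\x8F\xE5\xA4\xAA\xE9\x83\x8E";

// A string that explicitly stores a trailing null character as its 4th element.
const std::string kWithNull("abc\0", 4);
```

Under these assumptions, `data_bytes(kName)` is 9 for three characters, and `data_bytes(kWithNull)` is 4 because the null character was stored explicitly.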

Lukáš Lalinský
  • Is it valid to say that std::string is the same as an char array? Or are there any major differences? – rzetterberg Jun 04 '11 at 08:11
  • 2
    Yes, char array is exactly what `std::string` is. There are some implementation differences between `std::string` and `std::vector`, but the data they are storing is the same. – Lukáš Lalinský Jun 04 '11 at 08:15
  • 7
    just want to point out that the reason `std::string::size()` doesn't include the `NULL` character is to follow the convention set by `strlen` which also doesn't include it. Actual implementations of `std::string` do require the *storage* for the terminating `NULL`, in order to implement the `string::c_str()` method with minimal overhead. Maybe [this question](http://stackoverflow.com/q/4653745) explains better than I do. – rwong Jun 04 '11 at 08:23
  • 3
    While `size()` does not count the trailing `0`, the fact is that most implementations will keep a trailing NUL. The standard requires that `static_cast<const std::string&>(str)[str.size()]` yields `0` (cast to the appropriate `charT` type), and in many implementations that is achieved by always keeping an extra `0` at the end (arguably, it could be implemented with a condition in `operator[]`). The upcoming standard extends that guarantee to the non-const `operator[]`. Also, there is no guarantee that the implementation does not allocate extra space, i.e. `capacity() >= size()`. – David Rodríguez - dribeas Jun 04 '11 at 08:29
  • 1
    Thanks for the useful insights, Lukáš, rwong and David. – rzetterberg Jun 04 '11 at 08:37
11

You could be pedantic about it:

std::string x("X");

std::cout << x.size() * sizeof(std::string::value_type);

But std::string::value_type is char and sizeof(char) is defined as 1.

This only becomes important if you typedef the string type (because it may change in the future or because of compiler options).

// Some header file:
typedef   std::basic_string<T_CHAR>  T_string;

// Source a million miles away
T_string   x("X");

std::cout << x.size() * sizeof(T_string::value_type);  
Olivia Stork
Martin York
6

std::string::size() is indeed the size in bytes.

Will A
4

To get the amount of memory in use by the string, you would have to add capacity() to the overhead used for management. Note that it is capacity() and not size(). The capacity determines the number of characters (charT) allocated, while size() tells you how many of them are actually in use.

In particular, std::string implementations don't usually *shrink_to_fit* the contents, so if you create a string and then remove elements from the end, the size() will be decremented, but in most cases (this is implementation defined) capacity() will not.

Some implementations might not allocate the exact amount of memory required, but rather obtain blocks of given sizes to reduce memory fragmentation. In an implementation that used power-of-two sized blocks for the strings, a string with size 17 could allocate as many as 32 characters.
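
The difference between the two numbers can be inspected with a small sketch like the one below. `StringUsage`, `usage` and `rough_footprint` are hypothetical names, and the footprint figure is only a rough estimate: it ignores allocator rounding and over-counts on implementations that keep short strings inside the string object itself.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type: element counts reported by the string itself.
struct StringUsage {
    std::size_t used;      // size(): elements currently in use
    std::size_t reserved;  // capacity(): elements it can hold without reallocating
};

StringUsage usage(const std::string& s) {
    return StringUsage{s.size(), s.capacity()};
}

// Rough estimate only: object size plus reserved characters plus one terminator.
// Ignores allocator rounding; over-counts when a small-string buffer lives
// inside the object.
std::size_t rough_footprint(const std::string& s) {
    return sizeof(std::string) + (s.capacity() + 1) * sizeof(char);
}
```

For a string built with 100 characters and then truncated with `erase(10)`, `used` drops to 10, while `reserved` is only guaranteed to be at least 10 and will typically stay near 100.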

David Rodríguez - dribeas
2

Yes, size() will give you the number of char elements in the string. A single character in a multibyte encoding can take up several char elements.
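
To make the distinction concrete, here is a sketch that counts characters separately from bytes. It assumes the string holds valid UTF-8 (the question does not say which multibyte encoding is used), and `utf8_code_points` is a made-up helper name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by skipping continuation
// bytes, which always have the bit pattern 10xxxxxx.
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

With the UTF-8 bytes `"\xE5\xB0\x8F\xE5\xA4\xAA\xE9\x83\x8E"` (three CJK characters), size() gives 9 and this helper gives 3.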

AProgrammer
0

There is an inherent conflict in the question as written: std::string is defined as std::basic_string<char,...> -- that is, its element type is char (1-byte), but later you stated "the string contains a multibyte string" ("multibyte" == wchar_t?).

The size() member function does not count a trailing null. Its value represents the number of characters (not bytes).

Assuming you intended to say your multibyte string is std::wstring (alias for std::basic_string<wchar_t,...>), the memory footprint for the std::wstring's characters, including the null terminator, is:

std::wstring myString;
 ...
size_t bytesCount = (myString.size() + 1) * sizeof(wchar_t);

It's instructive to consider how one would write a reusable template function that would work for ANY potential instantiation of std::basic_string<> like this**:

// Return number of bytes occupied by null-terminated inString.c_str().
template <typename Elem>
inline size_t stringBytes(const std::basic_string<Elem>& inString, bool bCountNull)
{
   // size() counts elements, so scale by the element size to get bytes.
   return (inString.size() + (bCountNull ? 1 : 0)) * sizeof(Elem);
}

** For simplicity, ignores the traits and allocator types rarely specified explicitly for std::basic_string<> (they have defaults).
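
For completeness, a variant that also accepts non-default traits and allocator types could look like the sketch below. The name `string_bytes` and its parameter names are illustrative, not part of the answer above.

```cpp
#include <cstddef>
#include <string>

// Sketch: deduce all three template parameters so that any std::basic_string
// instantiation is accepted, including ones with custom traits or allocators.
template <typename CharT, typename Traits, typename Alloc>
std::size_t string_bytes(const std::basic_string<CharT, Traits, Alloc>& s,
                         bool count_null) {
    // size() counts elements; multiply by the element size to get bytes.
    return (s.size() + (count_null ? 1 : 0)) * sizeof(CharT);
}
```

For example, `string_bytes(std::wstring(L"abc"), true)` evaluates to `4 * sizeof(wchar_t)`, while `string_bytes(std::string("abc"), false)` is 3.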

JayRock