I want to calculate how much memory is allocated when I create and assign values to a string.

string s = "";
cout << sizeof(s) << endl;
cout << sizeof(s.at(0)) * s.capacity() << endl;
s = "1234567890qwertz";
cout << sizeof(s) << endl;
cout << sizeof(s.at(0)) * s.capacity() << endl;

Is this all the memory that my string s consumes? That is, the initial/static part I get from sizeof(s) (40 bytes on my machine) plus the dynamic part: the size of one character times the number of character slots the string has allocated so that it can be resized efficiently. On my machine the string first reported a capacity of 15 and, after the second assignment made the text too long for that, a capacity of 31. Why not 16 and 32 bytes, by the way?

Is this way of thinking about it (static + dynamic for each string is all the memory it occupies) correct?

Meaning that if I have a std::vector of strings and want to calculate all the memory for that vector as well, I would need to do the same kind of thing: take the initial/static size of my vector and add, for each string inside the vector, the total memory occupied by that string (static plus dynamic part), computed the way I do it above?

vector<string> cache;
// fill cache with strings of dynamic length
size_t size = sizeof(cache);  // size_t avoids signed/unsigned mixing with size()
for (size_t i = 0; i < cache.size(); i++)
{
    size += sizeof(cache[i]);
    size += sizeof(cache[i].at(0)) * cache[i].capacity();
}

So to sum it all up, is that the correct amount of memory occupied by my "cache"?

Edit: Or do I also need to take into account that a std::vector itself also has a .capacity() >= .size() which could mean that I would actually need to do this:

for each of the cache.capacity() slots, add sizeof(cache[i]), and additionally, for each of the cache.size() elements, add sizeof(cache[i].at(0)) * cache[i].capacity()?

huzzm
  • *I want to calculate how much memory is allocated when I create and assign values to a string* -- Ok, I'll ask. Why do you need this information? – PaulMcKenzie Jan 17 '19 at 13:55
  • `sizeof(s)` is the size of the *object* `s`, not the string itself. Unless there's some short-string optimization (which stores the contents inside the actual string object), then a `std::string` is really nothing more than a couple of sizes and a pointer to the actual string data. – Some programmer dude Jan 17 '19 at 13:56
  • @Someprogrammerdude Do those 40 or those couples of sizes and a pointer to the actual data add up as I put more and more string objects into a std::vector? – huzzm Jan 17 '19 at 13:57
  • To your side question: https://stackoverflow.com/a/11752722/3552770 – LogicStuff Jan 17 '19 at 13:57
  • Of course, each `std::string` object will need space. – Some programmer dude Jan 17 '19 at 13:58
  • @PaulMcKenzie I create a software data cache (simply a vector of strings) and I want to know how my memory grows as I put more elements (strings) into it. So that I can say - when it would become bigger than 40KB I want to clear the cache and so on... – huzzm Jan 17 '19 at 13:59
  • @Someprogrammerdude So it is correct to do something like for each string in vector: sizeof(s) (which is 40 on my machine) + [dynamically allocated size of that string]? – huzzm Jan 17 '19 at 14:00
  • @LogicStuff Ahh yes makes sense. Thank you. I forgot about that one. – huzzm Jan 17 '19 at 14:02
  • If you care about 40KB you may avoid using `std::vector<std::string>` in this case. – Slava Jan 17 '19 at 14:03
  • Get free memory before and after string instantiation. – Michael Chourdakis Jan 17 '19 at 14:06
  • @huzzm yes, although some strings might not allocate memory at all. – Bartek Banachewicz Jan 17 '19 at 14:07
  • @Slava That might be true, but I still do care about an answer to my question. – huzzm Jan 17 '19 at 14:08
  • @BartekBanachewicz Oh really, how come? – huzzm Jan 17 '19 at 14:08
  • @huzzm global / static strings with small string optimization. – Goswin von Brederlow Jan 17 '19 at 14:23
  • The `std::string` class has a `size()` method which will give you its size in bytes – Nadir Jan 17 '19 at 14:25
  • @Nadir more accurately, `size()` returns the number of valid `char`s in the string. It does not care whether the chars are stored in dynamic memory or in an internal SSO buffer. The `capacity()` returns the actual number of chars allocated, but it also does not differentiate between dynamic and SSO memory. You can't get the true byte size of a `std::string` without knowing the internal details of its implementation. – Remy Lebeau Jan 17 '19 at 16:47

3 Answers

This question is going to be hard to answer. Naively you would think the total amount of memory consumed would be

vector_capacity * sizeof(std::string) + sum_of_capacity_of_each_string_in_the_vector

But this is more of an upper limit than what is actually consumed. For instance, short string optimization (SSO) allows std::string to store the string data inside the storage the string object itself occupies (what you call the static size). If that is the case for every string, then the actual space consumed would be

vector_capacity * sizeof(std::string)

and the capacity of each string in the vector would just be how much you can store without allocating any extra space. You will need to check your implementation to see whether it uses SSO and how long a string it will store inside the string object, in order to know whether the capacity value refers to the string's internal space or to additional allocated memory. That makes the actual space consumed

vector_capacity * sizeof(std::string) + sum_of_capacity_of_each_string_in_the_vector_where_
                                        the_capacity_is_more_than_the_sso_amount

In your calculation, sizeof(cache[i].at(0)) is not needed. std::string uses char, and sizeof(char) is guaranteed to be 1.
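The last formula could be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive measurement: `estimate_bytes` is a hypothetical name, the capacity of a default-constructed string is used as a stand-in for the SSO threshold, and one extra byte per heap block is assumed for the terminating zero. It also adds `sizeof` of the vector object itself, which the formula above leaves out, and it ignores any overhead of the memory allocator.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: rough estimate of the bytes held by a vector of strings.
// Assumption: a string whose capacity() exceeds that of a default-constructed
// string no longer fits the in-object SSO buffer and owns a heap block.
std::size_t estimate_bytes(const std::vector<std::string>& v) {
    // Capacity of an empty string, used here as the SSO threshold.
    const std::size_t sso_capacity = std::string().capacity();

    // The vector object itself plus its element array (capacity, not size).
    std::size_t total = sizeof(v) + v.capacity() * sizeof(std::string);

    for (const std::string& s : v) {
        if (s.capacity() > sso_capacity) {
            // Assumption: the heap block holds capacity() chars plus a '\0'.
            total += s.capacity() + 1;
        }
    }
    return total;
}
```

Calling `estimate_bytes(cache)` after each insertion would give a number to compare against a budget such as 40KB, with the caveat that the real footprint depends on the standard library and the allocator.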

NathanOliver
  • Thank you very much. This was the kind of answer I was looking for. – huzzm Jan 17 '19 at 14:18
  • You don't need the implementation details per se; you just need to know when a string allocates new memory. This can be done by e.g. [using a custom allocator](https://stackoverflow.com/a/26132207/752976). – Bartek Banachewicz Jan 17 '19 at 14:18
  • I edited the question again a little bit because a vector has a capacity >= size as well. Does that change anything or does a capacity of 15 while there are just 3 strings inside the vector mean that it is actually 15 * sizeof(string) + 3 * _dynamic memory of each single string (capacity * size of char)_ – huzzm Jan 17 '19 at 14:21
  • @BartekBanachewicz Ok. Thanks! – huzzm Jan 17 '19 at 14:22
  • @huzzm It does. I've updated the answer to use `vector_capacity * sizeof(std::string)` – NathanOliver Jan 17 '19 at 14:22

There is a simple reason why the capacity of the string is one less than you expect and that is

s.c_str()

A C++ string is stored in a block of memory, with capacity giving the total size and size giving the used space. But a C string is 0-terminated. The C++ string reserves one extra byte at the end of the block of memory to store a 0. That way s.c_str() is always 0-terminated.

So the memory used by the dynamic part of the string is capacity + 1.
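One way to check this on a given implementation is to count what the string actually requests from its allocator, as suggested in the comments under the other answer. The following is a minimal sketch, not a definitive tool: `CountingAllocator`, `CountedString`, `Measurement`, `measure` and `g_bytes_allocated` are all hypothetical names introduced for illustration, and the counter is a plain global, so it is not thread-safe.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Running total of bytes requested through CountingAllocator.
std::size_t g_bytes_allocated = 0;

// Minimal allocator that forwards to operator new and tallies the request size.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        g_bytes_allocated += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept { return false; }

using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

struct Measurement {
    std::size_t heap_bytes;  // bytes requested from the allocator
    std::size_t capacity;    // what capacity() reports afterwards
};

// Builds a string of `length` characters and reports what it allocated.
Measurement measure(std::size_t length) {
    const std::size_t before = g_bytes_allocated;
    CountedString s(length, 'x');
    return Measurement{g_bytes_allocated - before, s.capacity()};
}
```

Comparing `measure(100).heap_bytes` with `measure(100).capacity + 1` shows whether the extra terminator byte is part of the allocation on your platform; for short lengths the allocated amount may well be zero if SSO applies.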

As to the total memory consumed by a string or a vector of strings, I think NathanOliver has answered that. But beware of vectors holding the same string multiple times.

Goswin von Brederlow
  • Oh yes, very true. I forgot about the null termination. Thanks for the additional answer! – huzzm Jan 17 '19 at 14:28

If you want to know how much space your std::vector<std::string> uses, calculate it:

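#include <functional>
#include <numeric>
#include <string>
#include <vector>

auto netto_memory_use(std::vector<std::string> const& x) noexcept {
    return std::accumulate(
        begin(x),
        end(x),
        // Start with the vector object plus its element array.
        sizeof x + sizeof x[0] * x.capacity(),
        [](auto n, auto&& s) {
            // A buffer outside the string object means no SSO: count it.
            if (std::less<const void*>()(data(s), &s)
            || std::greater_equal<const void*>()(data(s) + s.capacity(), &s + 1))
                return n + s.capacity() + 1;
            return n;
        });
}

I used std::less<const void*> / std::greater_equal<const void*> to take advantage of them defining a total order over pointers, in contrast to just using the comparison operators. The free function data() requires C++17.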

The accumulator tests whether small-string optimisation (SSO) is in effect before adding the string's capacity. Of course, all 0-capacity strings could share the same statically allocated terminator. Or capacity and/or length could be allocated together with the character data.
Still, that should be a good approximation of the memory used, aside from memory-management-system overhead.

Deduplicator