2

I'm working on a project where I need to be as frugal as possible with memory usage. I'm trying to calculate the grand total memory footprint of a vector<bool> of size 32 in this example:

#include <iostream>
#include <vector>
using namespace std;

vector<bool> v(32);
cout << "sizeof: " << sizeof(v) << endl;
cout << "size: " << v.size() << endl;
cout << "capacity: " << v.capacity() << endl;
cout << "max_size: " << v.max_size() << endl;

which gives me:

sizeof: 40                     <- 40 byte? wtf?
size: 32                       <- hoping an element takes up 1 bit
                                  (instead of the usual 1 byte for booleans)
                                  this should take around 32 bit memory
capacity: 64                   <- I guess this is because minimum size is
                                  1 integer = 64 bit
max_size: 9223372036854775744  <- relevant somehow?

on my 64-bit Ubuntu 12.04 machine. So I thought I could calculate the total memory like so:

40 * 8 + 64 * 1 = 384 bit = 48 byte

So according to this calculation, most of the memory is spent on the vector object itself rather than on the 32 bits of data. My question is: why does a vector object need to use so much memory? Are there any mistakes in my calculation? And how can I be more memory-efficient for vector sizes around 32, without doing the bitwise manipulations myself?

none
  • Don't see any pointers in example you showed. – Alok Save Feb 27 '13 at 14:40
  • `vector<bool>` is a special case and has special treatment. – Tony The Lion Feb 27 '13 at 14:40
  • Quite a few misconceptions... `size() == 32` means that you have 32 bits, which is correct. `capacity() == 64` means that you can add 32 more, and is probably unrelated to the size of an *integer* (`int` is usually 32 bits on most platforms). Note that what you call the *vector pointer* is really the *vector object* – David Rodríguez - dribeas Feb 27 '13 at 14:41
  • @AlokSave ahh right, but I'm not sure what to call it. – none Feb 27 '13 at 14:41
  • Vectors hold more than one pointer (it won't work otherwise!). – R. Martinho Fernandes Feb 27 '13 at 14:41
  • @gokcehan: It is an instance of `std::vector`, not a pointer. – Alok Save Feb 27 '13 at 14:42
  • Well I thought I'd have a little dig through the g++ implementation. For normal vectors, it contains 3 pointers: start of data, end of data, end of available space. And `sizeof(std::vector<int>)` is indeed 24. Now `std::vector<bool>` is a bit special to account for storing each `bool` in a single bit. However I haven't yet been able to make any sense of the header file... – BoBTFish Feb 27 '13 at 14:58
  • Have you tried `std::bitset<32>`: http://stackoverflow.com/a/88934/14065 – Martin York Feb 27 '13 at 15:31
  • @LokiAstari that's what I'm trying atm, thanks. I will accept an answer soon. – none Feb 27 '13 at 15:39

6 Answers

4

Those 40 bytes are administrative overhead. Among other things, a vector has to keep track of its size and capacity (that's two size_ts worth of bytes gone already), and very importantly, a pointer to the actual data!

The actual data held by the vector is allocated on the heap by the default vector allocator; its memory consumption is not included in the result of sizeof.

us2012
  • hmm, kind of makes sense. any advice how to make it more efficient? – none Feb 27 '13 at 14:46
  • What is "inefficient" about this? You could of course have a class that stores the sizes in 32 bits, limiting the maximum number of elements to 4 billion and saving 8 bytes. Unless you have LOTS and LOTS of vectors, I suspect that's not really worth doing. – Mats Petersson Feb 27 '13 at 14:50
  • @gokcehan as I mentioned in my answer, if you know the maximum number of bits you need to store, you can use `std::bitset` instead. – Dirk Holsopple Feb 27 '13 at 14:53
3

sizeof(v) is getting the size of v's structure, not the size of v's data. It's like doing:

struct S { int* x; };

S s;
s.x = new int[10000];

sizeof(s); // only the size of an int*; sizeof doesn't (and can't) know how much data was allocated

As for why std::vector<bool> might have a larger structure size than, say, std::vector<int>, remember that the bool version is specialized. It's got to have some extra members for bitwise record keeping.

Cornstalks
  • I have already tried to add the size of data it points to the total by doing `+ 64 * 1` – none Feb 27 '13 at 14:43
1
  • v.size() returns the number of elements currently stored in the vector. Not directly a memory size.
  • v.capacity() returns the number of elements for which storage has already been allocated. Vectors allocate some memory in advance, so capacity() >= size(). Again, not directly a memory size.
  • sizeof(v) gives the size in bytes of the vector object itself: the bookkeeping pointers, sizes, and so on, but not the heap-allocated data.

If you want the memory taken by the elements themselves, for an ordinary vector the closest expression is v.size() * sizeof(v[0]). For vector<bool>, however, operator[] returns a proxy object rather than a bool, and the elements are packed into bits, so (v.size() + 7) / 8 bytes is closer to the truth.

Boris Strandjev
  • if some memory is allocated already, shouldn't I be interested in that as well? – none Feb 27 '13 at 14:47
  • @gokcehan Depends on what you are trying to measure. Generally my answer would be no, because this is just constant overhead - nevermind how many elements you have in the vector you have the overhead of 40 bytes – Boris Strandjev Feb 27 '13 at 14:50
  • I tried a few different numbers and the vector is growing linearly as you said instead of the usual exponential growth of vectors. It's kind of interesting to me because I feel like it would affect the amortized cost analysis of pushing an element to the vector. – none Feb 27 '13 at 15:15
  • @gokcehan "exponential growth of vectors" what do you exactly mean by that? what test do you do? – Boris Strandjev Feb 27 '13 at 15:22
  • I mean the capacity growing like `1, 2, 4, 8, 16, 32, ..` when I keep adding elements, but it turned out I was doing the test wrong, so `vector<bool>` capacity is also increasing exponentially, like `64, 128, 256, 512, ..`. – none Feb 27 '13 at 15:38
  • @gokcehan this does not depend on the type you store: the capacity will always be at most twice the number of stored elements – Boris Strandjev Feb 27 '13 at 15:39
1

Looking at the source of the g++ implementation I can find the following members:

vector<bool> inherits from a base class that holds a member _Bvector_impl _M_impl;

The _Bvector_impl contains:

_Bit_iterator   _M_start;   16 bytes
_Bit_iterator   _M_finish;  16 bytes
_Bit_type*  _M_end_of_storage;  8 bytes

This sums up to 40 bytes.

_Bit_iterator contains:

_M_p,      a pointer:       8 bytes
_M_offset, an unsigned int: 4 bytes (padded to 8 for alignment)
Captain Giraffe
1

That's the price you pay for a dynamically re-sizable structure. What's returned by sizeof(v) is actually just the overhead for the vector to keep track of the dynamically allocated memory where it stores its data. The space for storing the data is not included in that number.

If you know the number of items at compile time, you can use std::bitset instead. std::bitset will actually use 1 bit per boolean you store (rounded up to the word size it uses) without the overhead of std::vector.

Dirk Holsopple
  • I tried and bitset takes only 8 bytes. I guess that should be the minimum since I need to point to data somehow with a pointer. thanks, +1. – none Feb 27 '13 at 15:07
1

If the size of your std::vector is actually fixed at compile time, then it may make sense to use a std::array instead; an array stores its elements directly with no bookkeeping overhead (though each bool still occupies a whole byte).

However, in the very specific case of bool, you should consider std::bitset! A bitset is parameterized by its size and packs the bools as tightly as possible. It's not quite as memory efficient as a packed array you would write yourself, because its storage is rounded up to whole machine words (8 bytes on a 64-bit platform).

Demo:

#include <array>
#include <bitset>
#include <cstdint>
#include <vector>

#include <iostream>

int main() {
   std::vector<bool> vec(32);
   std::array<bool, 32> arr;
   std::array<uint8_t, 4> packed;
   std::bitset<32> bs;

   std::cout << "vector: " << sizeof(vec) + vec.capacity()/8 << "\n";
   std::cout << "array : " << sizeof(arr) << "\n";
   std::cout << "packed: " << sizeof(packed) << "\n";
   std::cout << "bitset: " << sizeof(bs) << "\n";
}

Gives:

vector: 48
array : 32
packed: 4
bitset: 8

Here packed is clearly the most efficient container, but bitset provides a ready-made one at quite a low cost (as much space as a pointer on 64-bit machines).

Matthieu M.
  • this is really helpful, can you elaborate on packed? how does it work? – none Feb 27 '13 at 15:00
  • @gokcehan: it works manually; the idea is that you only need 32 bits (so actually a `uint32_t` would be enough), so you create an array of 32 bits (in general, `ceil(N/CHAR_BIT)` bytes) and then use bit-masking operations to query/set the bits. Obviously, it's a bit of work to write the get/set methods, but bitwise manipulations are fairly well documented on the net so you should not have any trouble... you just end up reimplementing `bitset`, though (albeit with a lower memory footprint, at the cost of no O(1) `count` method); it's your call what's best in your situation. – Matthieu M. Feb 27 '13 at 16:11
  • ahh I see. I had already been using `[]` operations all around my code and `bitset` didn't complain when I switched to it. I suppose I could overload `[]` somehow, but that's quite a bit of work for me. Also I realized I'm gonna need `c++0x` for arrays, which I'd rather avoid, so I will stick with the bitsets for now. I will accept your answer since it's the most comprehensive one, thanks.. – none Feb 27 '13 at 16:59