5

Short: How can I calculate correctly the memory space in bytes of std::vector<bool> that store n bits?

std::vector<bool> vb(n, false);
int size_bytes = compute_space_memory_in_bytes(vb);

Detail:
In my algorithm i use vector<bool> to store n bits. In order to measure it efficiently in practice, I need to know how to calculate the space memory in bytes. (In theory it is just O(n) bits).

There are 2 points:

  1. If we have std::vector<int>, the solution from another answer is:
    sizeof(std::vector<int>) + (sizeof(int) * MyVector.size())

  2. Since vector store each Boolean value into a single bit

One potential optimization involves coalescing vector elements such that each element occupies a single bit instead of sizeof(bool) bytes.

Hence, from 1 and 2, my attempt solution is :

std::vector<bool> vb(100, false);
auto size_bytes = sizeof(vector<bool>) + vb.size()/8;
std::cout << "Hello, size = " << size_bytes << " bytes!\n";

Is that correct ?

Edit: To be more explicit (Thanks to @PaulMcKenzie comment):
Given n bits that to be determined at execution time. My problem is what is the space occupied (exactly or approximately) to store n bits in vector of bool?

std::vector<bool> vb;

// some processing to get n ..

vb.resize(n);

auto size_bytes = compute size of vb in bytes ???;

std::cout << "Hello, size = " << size_bytes << " bytes!\n";
ibra
  • 1,164
  • 1
  • 11
  • 26
  • 11
    What is the [high-level problem](https://xyproblem.info/) you're trying to solve? The `sizeof(std::vector)`, is a compile-time value, and will not change regardless of the number of bools you assign to it at runtime. – PaulMcKenzie Apr 30 '21 at 18:14
  • 2
    “Is that correct?” — In a nutshell, no. It depends on several factors. The size of a vector in bytes (not just `bool`) is completely implementation-defined. All you can say is that it’s ≥ 1, or ≥ number of elements / 8, whichever is larger (though in reality the implementation needs to store at least the size and the capacity, so these give a somewhat tighter lower bound). – Konrad Rudolph Apr 30 '21 at 18:16
  • 1
    `sizeof(vector<>)` gives you size occupied by the `vector` object itself. This size is always the same regardless of number of elements in the vector. – SergeyA Apr 30 '21 at 18:17
  • @selbie, no, it is one of the oldest :) – SergeyA Apr 30 '21 at 18:17
  • 3
    `+ vb.capacity()/8;` is probably a little more accurate. A bunch of implementations use something like a int[] under the hood, so they will change the allocated memory in like 32-element increments at least. But even that is not fully accurate because it doesn't account for small-buffer optimization – PeterT Apr 30 '21 at 18:20
  • 2
    Even in the paragraph -- *One potential optimization...* -- Therefore why do you believe you can write code that will tell you this information, if it's all implementation defined? – PaulMcKenzie Apr 30 '21 at 18:20
  • @PaulMcKenzie, my problem is what is the space occupied (exactly or approximately) to store n bits in vector of bool? – ibra Apr 30 '21 at 18:21
  • 4
    @ibra -- Look at the underlying source code and see how it is implemented. You can't code your way into a solution -- again, this is implementation defined. – PaulMcKenzie Apr 30 '21 at 18:22
  • @SergeyA I just tried the code in the question with several different numbers of booleans and the sizeof-operator always showed me different values for the size. I used VS 2019 in Debug-Mode to test it. – Bananenkönig Apr 30 '21 at 18:25
  • 2
    You will get a different answer for different sizes. `sizeof(vector)` is fixed size. It's not going to change. But `vb.size()/8;` is going to change with the number of elements, but the result is going to be at least slightly wrong. – user4581301 Apr 30 '21 at 18:30
  • 1
    The Big 3 implementations are all open source. You can go look at the code to see how the underlying storage is defined (none of them use an array of **bytes**, they're 32 or 64 bit words). [libstdc++](https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/bitset), [libcxx](https://github.com/llvm/llvm-project/blob/main/libcxx/include/bitset), [Microsoft](https://github.com/microsoft/STL/blob/main/stl/inc/bitset) – Blastfurnace Apr 30 '21 at 18:33
  • 4
    "*my problem is what is the space occupied (exactly or approximately) to store n bits in vector of bool?"* -- No, that's your question. Your problem is something that you could solve if you know the answer to your question, and we don't know what that is. In other words, why do you want to know? Idle curiosity is a valid answer, but if there's a concrete problem you're trying to solve, it could help us give you a meaningful answer. See also https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem – Keith Thompson Apr 30 '21 at 18:37
  • @KeithThompson , thank you for the explanation :) . My problem is: in the context of comparing space memory of tow data-structures, one of them use bitvector, and i could not use bitset because n is not known, i have only vector, Hence my question is my problem. How to compute the sizeof to get the answer of space occupied, so that I can compare it with the other data-structure. – ibra Apr 30 '21 at 18:46

1 Answers1

3

For your re-stated question:

How to compute the sizeof to get the answer of space occupied

As others pointed out, different implementation of vector may produce different answers to your question.

Generally speaking, the memory "occupied" by your boolean values, in bytes, is:

int s = (n + 7) / 8;

If your implementation uses 32- or 64-bit values to pack bool into a vector, you need to round up to 32 or 64:

int s = (n + 31) / 32;

or

int s = (n + 63) / 64;

There is some memory that an instance of the vector itself uses (pointer to the first element, number of elements or a pointer to the last element, capacity, etc.); as @paulsm4 stated, it is 40 bytes in his implementation of the vector.

You may also want to consider allocated but not yet occupied memory. This also depends on implementation.

In conclusion, you can definitely state only the MINIMUM size your vector will occupy.

Vlad Feinstein
  • 10,960
  • 1
  • 12
  • 27
  • In the case 32 (same goes for 64), **(n + 31) / 32** gives us the number of slots, where each one is 32 bits; hence to find the number of bytes we need to multiply obviously by (32/8); so, **res = ((n + 31) / 32)*(32/8)**. If n=100*32, the formula (n + 31) / 32 gives us 100, which is 100 slots (32 bits each), hence the number of bytes is 100*(32/8) = 400 bytes. Also, as you said, I add to it the sizeof(vector), at the end , res = sizeof(vector) + ((n + 31) / 32)*(32/8) . – ibra May 02 '21 at 01:30
  • can we use directly : ( ( nb_bits + (sizeof(size_t)*CHAR_BIT-1) ) / (sizeof(size_t)*CHAR_BIT) ) * (sizeof(size_t)) to be more generic and fall directly in the implementation used, either size_t 32 or 64 bits. – ibra May 02 '21 at 01:32
  • 1
    @ibra - there is no guarantee that `vector` will use `size_t` for packing `bool` elements, I think. – Vlad Feinstein May 04 '21 at 16:42
  • yes I agree :). as i said in my pervious comment, your answer give how to round to get the number of sots used. hence at the end we need to multiply by the number of bytes in the used word size, 4 in 32 bits, and 8 in 64 bits. + the sizeof(vector). – ibra May 04 '21 at 17:45