30

I have a vector<bool> and I'd like to zero it out. I need the size to stay the same.

The normal approach is to iterate over all the elements and reset them. However, vector<bool> is a specially optimized container that, depending on implementation, may store only one bit per element. Is there a way to take advantage of this to clear the whole thing efficiently?

bitset, the fixed-length variant, has the set function. Does vector<bool> have something similar?

Kerrek SB
  • 464,522
  • 92
  • 875
  • 1,084
Adam
  • 16,808
  • 7
  • 52
  • 98
  • 5
    std::fill(v.begin(), v.end(), 0); – Kakalokia Oct 24 '13 at 09:58
  • 3
    @AliAlamiri: It's a normal way which OP mentioned it implicitly. and it may not take advantage of setting 8bits in an operation. – masoud Oct 24 '13 at 10:02
  • 1
    Have you timed it? It may well be algorithms such as `std::fill` are [specialized for `std::vector`](http://stackoverflow.com/questions/12433154/) in your standard library implementation. – Björn Pollex Oct 24 '13 at 10:13
  • 2
    One good way would also be to not use `vector` as it's, simply speaking, broken. – Bartek Banachewicz Oct 24 '13 at 10:19
  • Is it possible for you to not use `vector`? It's an unfortunate mistake we're stuck with for legacy purposes. – Jack Aidley Oct 24 '13 at 10:20
  • @BjörnPollex [they are for libc++](http://isocpp.org/blog/2012/11/on-vectorbool) – TemplateRex Oct 24 '13 at 13:26
  • @TemplateRex that's a great link. – Adam Oct 25 '13 at 05:15
  • 2
    While `std::vector` has unfortunate problems, the original rationale for adding it was exactly for problems like this. "Don't use it" is **not** an appropriate comment, then. – MSalters Oct 25 '13 at 13:29
  • 1
    @MSalters I think the point is that this type of structure should be named something other than `vector` because it doesn't have the same interface as other `vector`s. – Adam Oct 25 '13 at 18:02

8 Answers8

25

There seem to be a lot of guesses but very few facts in the answers that have been posted so far, so perhaps it would be worthwhile to do a little testing.

#include <vector>
#include <iostream>
#include <time.h>

int seed(std::vector<bool> &b) {
    srand(1);
    for (int i = 0; i < b.size(); i++)
        b[i] = ((rand() & 1) != 0);
    int count = 0;
    for (int i = 0; i < b.size(); i++)
    if (b[i])
        ++count;
    return count;
}

int main() {
    std::vector<bool> bools(1024 * 1024 * 32);

    int count1= seed(bools);
    clock_t start = clock();
    bools.assign(bools.size(), false);
    double using_assign = double(clock() - start) / CLOCKS_PER_SEC;

    int count2 = seed(bools);
    start = clock();
    for (int i = 0; i < bools.size(); i++)
        bools[i] = false;
    double using_loop = double(clock() - start) / CLOCKS_PER_SEC;

    int count3 = seed(bools);
    start = clock();
    size_t size = bools.size();
    bools.clear();
    bools.resize(size); 
    double using_clear = double(clock() - start) / CLOCKS_PER_SEC;

    int count4 = seed(bools);
    start = clock();
    std::fill(bools.begin(), bools.end(), false);
    double using_fill = double(clock() - start) / CLOCKS_PER_SEC;


    std::cout << "Time using assign: " << using_assign << "\n";
    std::cout << "Time using loop: " << using_loop << "\n";
    std::cout << "Time using clear: " << using_clear << "\n";
    std::cout << "Time using fill: " << using_fill << "\n";
    std::cout << "Ignore: " << count1 << "\t" << count2 << "\t" << count3 << "\t" << count4 << "\n";
}

So this creates a vector, sets some randomly selected bits in it, counts them, and clears them (and repeats). The setting/counting/printing is done to ensure that even with aggressive optimization, the compiler can't/won't optimize out our code to clear the vector.

I found the results interesting, to say the least. First the result with VC++:

Time using assign: 0.141
Time using loop: 0.068
Time using clear: 0.141
Time using fill: 0.087
Ignore: 16777216        16777216        16777216        16777216

So, with VC++, the fastest method is what you'd probably initially think of as the most naive -- a loop that assigns to each individual item. With g++, the results are just a tad different though:

Time using assign: 0.002
Time using loop: 0.08
Time using clear: 0.002
Time using fill: 0.001
Ignore: 16777216        16777216        16777216        16777216

Here, the loop is (by far) the slowest method (and the others are basically tied -- the 1 ms difference in speed isn't really repeatable).

For what it's worth, in spite of this part of the test showing up as much faster with g++, the overall times were within 1% of each other (4.944 seconds for VC++, 4.915 seconds for g++).

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • It's amazing at the relative differences between the two platforms. Only `fill` appears in the faster group both times. I think the takeaway message here is that `vector` is unreliable. – Adam Oct 25 '13 at 05:17
  • 2
    "unreliable" suggests that it doesn't work. It works, everywhere, and you apparently can count on a memory-efficient implementation too. The one thing you can't count on is speed. – MSalters Oct 25 '13 at 14:50
  • @MSalters: except that it isn't a fully conforming vector (STL container) in all regards... However, +1 for summarising concisely what most people need to know. – DarthGizka May 13 '16 at 14:16
21

Try

v.assign(v.size(), false);

Have a look at this link: http://www.cplusplus.com/reference/vector/vector/assign/

Or the following

std::fill(v.begin(), v.end(), 0)
Kakalokia
  • 3,191
  • 3
  • 24
  • 42
  • 2
    "Try" and "may work"; two guesses in one phrase; Not a very high quality answer. Especially in C++, such answers can be non-portable or even dangerous. – Sebastian Mach Oct 24 '13 at 10:01
  • 9
    @phresnel by try I mean I'm confident that the code will work. The second code I wasn't too sure of. Nothing that deserves a downvote :) – Kakalokia Oct 24 '13 at 10:04
  • 1
    Note gcc implementation of assign actually do a std::fill http://gcc.gnu.org/git/?p=gcc.git;a=blob_plain;f=libstdc%2B%2B-v3/include/bits/stl_bvector.h;hb=HEAD – log0 Oct 24 '13 at 10:09
  • 1
    @log0 I'm not 100% sure, but it looks to me like that `std::fill` in `assign` is over the container words, not individual bits. So this is likely an optimized way, at least for GCC. Jerry's VC tests are backwards. – Adam Oct 25 '13 at 05:21
10

You are out of luck. std::vector<bool> is a specialization that apparently does not even guarantee contiguous memory or random access iterators (or even forward?!), at least based on my reading of cppreference -- decoding the standard would be the next step.

So write implementation specific code, pray and use some standard zeroing technique, or do not use the type. I vote 3.

The recieved wisdom is that it was a mistake, and may become deprecated. Use a different container if possible. And definitely do not mess around with the internal guts, or rely on its packing. Check if you have dynamic bitset in your std library mayhap, or roll your own wrapper around std::vector<unsigned char>.

jrok
  • 54,456
  • 9
  • 109
  • 141
Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • 1
    This doesn't directly answer the question, but your choice 3 is probably the right path forward. Use a container that does the same thing but is made for it from the start. @TemplateRex's link shows the merits of a bit vector when done right (and wrong): http://isocpp.org/blog/2012/11/on-vectorbool – Adam Oct 25 '13 at 05:20
8

I ran into this as a performance issue recently. I hadn't tried looking for answers on the web but did find that using assignment with the constructor was 10x faster using g++ O3 (Debian 4.7.2-5) 4.7.2. I found this question because I was looking to avoid the additional malloc. Looks like the assign is optimized as well as the constructor and about twice as good in my benchmark.

unsigned sz = v.size(); for (unsigned ii = 0; ii != sz; ++ii) v[ii] = false;
v = std::vector(sz, false); // 10x faster
v.assign(sz, false); >      // 20x faster

So, I wouldn't say to shy away from using the specialization of vector<bool>; just be very cognizant of the bit vector representation.

rwp
  • 81
  • 1
  • 1
  • Thanks for the solution. Can someone tell me why `assign` is so much faster than the loop? Internally it must loop over it anyway to set the value, right? – Wikunia Oct 14 '20 at 10:23
7

Use the std::vector<bool>::assign method, which is provided for this purpose. If an implementation is specific for bool, then assign, most likely, also implemented appropriately.

Andrii
  • 1,788
  • 11
  • 20
5

If you're able to switch from vector<bool> to a custom bit vector representation, then you can use a representation designed specifically for fast clear operations, and get some potentially quite significant speedups (although not without tradeoffs).

The trick is to use integers per bit vector entry and a single 'rolling threshold' value that determines which entries actually then evaluate to true.

You can then clear the bit vector by just increasing the single threshold value, without touching the rest of the data (until the threshold overflows).

A more complete write up about this, and some example code, can be found here.

Thomas Young
  • 193
  • 3
  • 9
4

It seems that one nice option hasn't been mentioned yet:

auto size = v.size();
v.resize(0);
v.resize(size);

The STL implementer will supposedly have picked the most efficient means of zeroising, so we don't even need to know which particular method that might be. And this works with real vectors as well (think templates), not just the std::vector<bool> monstrosity.

There can be a minuscule added advantage for reused buffers in loops (e.g. sieves, whatever), where you simply resize to whatever will be needed for the current round, instead of to the original size.

DarthGizka
  • 4,347
  • 1
  • 24
  • 36
0

As an alternative to std::vector<bool>, check out boost::dynamic_bitset (https://www.boost.org/doc/libs/1_72_0/libs/dynamic_bitset/dynamic_bitset.html). You can zero one (ie, set each element to false) out by calling the reset() member function.

Like clearing, say, std::vector<int>, reset on a boost::dynamic_bitset can also compile down to a memset, whereas you probably won't get that with std::vector<bool>. For example, see https://godbolt.org/z/aqSGCi

tdp2110
  • 281
  • 3
  • 12