Faster alternative to push_back(size is known)

Question

I have a float vector. As I process certain data, I push it back.I always know what the size will be while declaring the vector.

For the largest case, it is 172,490,752 floats. This takes about eleven seconds just to push_back everything.

Is there a faster alternative, like a different data structure or something?

[std::vector::reserve](http://en.cppreference.com/w/cpp/container/vector/reserve) — François Andrieux, Jul 05 '17 at 18:59
@FrançoisAndrieux this still takes like 6 seconds :( is this one of those things I'll just have to accept? — anc, Jul 05 '17 at 19:08
That's about 35 nano seconds per `push_back` (for 172,490,752). You *are* copying over a hundred million floats. — François Andrieux, Jul 05 '17 at 19:10
@anc What are you running this on, and are you running a release build? Even with an online compiler such as [ideone](http://ideone.com/2XWATN) or [coliru](http://coliru.stacked-crooked.com/a/437a79ad304caaf0), the numbers should be much better than six seconds. What *else* are you doing while pushing floats into a vector? (i.e. where are they coming from?). — WhozCraig, Jul 05 '17 at 19:17
might be a very stupid question so apologies but would adding a couple threads to do `push_back` help with speed? if the order doesn't matter? — PYA, Jul 05 '17 at 19:19
@pyjg It wouldn't work. You would have race conditions all over — Justin, Jul 05 '17 at 19:21
@Justin Yes thats why I said IF the order of insertion doesnt matter. — PYA, Jul 05 '17 at 19:21
@pyjg The only way you could make that work is if you created the vector to the correct size and had the threads only access sections of the vector. — Justin, Jul 05 '17 at 19:21
@pyjg `push_back` isn't atomic, so you can get things much worse than out-of-order insertion — Justin, Jul 05 '17 at 19:22
@Justin could you please explain what all could go wrong? I thought if you do a `reserve` and add threads to `push_back` then yes the order is wrong but what else can happen that is bad? Thanks :) — PYA, Jul 05 '17 at 19:24
@pyjg With 2 threads A and B. A and B call `V.push_back()` with `1` and `2` respectively. In A's perspective, `V.size() == 0`, so it inserts `1` into position `0`. In B's perspective, since it is executing at the same time as A, `V.size() == 0`, so it inserts `2` into position `0` (in reality, it could even be worse because different bytes might be written in and you might get a completely different number; depends on if writing a `float` is atomic). Meanwhile, A increments `V.size` and B does so too. After this, `V.size()` could be `1`, `2`, or something else, and `V[0]` is unknown as well — Justin, Jul 05 '17 at 19:28
@pyjg Suppose `push_back` checks what the last element is, then assigns to it, then increments it. If you have multiple threads going at once you could get multiple assignments to the same address and/or skipped entries, and probably worse things are possible in a real implementation. — Daniel H, Jul 05 '17 at 19:28
@anc Do you know the size of the vector at compile time, or only run time? — Daniel H, Jul 05 '17 at 19:29
@pyjg The basic problem is a [race condition](https://stackoverflow.com/questions/34510/what-is-a-race-condition). Once you are aware of the definition it's easier to spot them. — Blastfurnace, Jul 05 '17 at 19:35
@Blastfurnace I knew it had to be a race condition but didnt realize exactly WHAT it would be. Thanks! — PYA, Jul 05 '17 at 19:36
close match: https://stackoverflow.com/questions/15952412/performance-degradation-due-to-default-initialisation-of-elements-in-standard-co — Walter, Jul 06 '17 at 16:02

Jesper Juhl · Accepted Answer · 2023-05-17T09:50:10.317

11

If you know the final size, then reserve() that size after you declare the vector. That way it only has to allocate memory once.

Also, you may experiment with using emplace_back() although I doubt it will make any difference for a vector of float. But try it and benchmark it (with an optimized build of course - you are using an optimized build - right?).

Another option when you know the size in advance is to use std::array.

edited May 17 '23 at 09:50

answered Jul 05 '17 at 19:11

Jesper Juhl

30,449
3
47
70

2

`push_back` and `emplace_back` may still be slower than raw access because they have to check if there is need for reallocation, even if there isn't. The performance penalty may be slight, but it is there ;) – Tobias Ribizel Jul 05 '17 at 19:25
1

@Tobias maybe. Benchmarking the two solutions would be the way to tell. – Jesper Juhl Jul 05 '17 at 19:34
1

I doubt `emplace_back` is any faster here, because floats are [trivially copyable](http://en.cppreference.com/w/cpp/concept/TriviallyCopyable) – Daniel H Jul 06 '17 at 18:07
@Daniel H: isn't that what I already said when I said "I doubt it will make any difference for a vector of float"? The fact that it is trivially copyable was sort of implied.. – Jesper Juhl Jul 06 '17 at 18:16
1

The implication is the other way around: because it is trivially copyable, there’s no difference between a copy and a move, so it won’t make any difference. – Daniel H Jul 06 '17 at 18:32

Mark Ransom · Answer 2 · 2017-07-06T16:13:50.307

The usual way of speeding up a vector when you know the size beforehand is to call reserve on it before using push_back. This eliminates the overhead of reallocating memory and copying the data every time the previous capacity is filled.

Sometimes for very demanding applications this won't be enough. Even though push_back won't reallocate, it still needs to check the capacity every time. There's no way to know how bad this is without benchmarking, since modern processors are amazingly efficient when a branch is always/never taken.

You could try resize instead of reserve and use array indexing, but the resize forces a default initialization of every element; this is a waste if you know you're going to set a new value into every element anyway.

An alternative would be to use std::unique_ptr<float[]> and allocate the storage yourself.

user7860670 · Answer 3 · 2017-07-05T19:18:26.450

1

::boost::container::stable_vector Notice that allocating a contiguous block of 172 *4 MB might easily fail and requires quite a lot page joggling. Stable vector is essentially a list of smaller vectors or arrays of reasonable size. You may also want to populate it in parallel.

edited Jul 05 '17 at 19:18

answered Jul 05 '17 at 19:11

user7860670

35,849
4
58
84

Tobias Ribizel · Answer 4 · 2017-07-08T13:55:43.167

1

I have two answers for you:

As previous answers have pointed out, using reserve to allocate the storage beforehand can be quite helpful, but:
push_back (or emplace_back) themselves have a performance penalty because during every call, they have to check whether the vector has to be reallocated. If you know the number of elements you will insert already, you can avoid this penalty by directly setting the elements using the access operator []

So the most efficient way I would recommend is:

Initialize the vector with the 'fill'-constructor:
```
std::vector<float> values(172490752, 0.0f);
```
Set the entries directly using the access operator:
```
values[i] = some_float;
++i;
```

edited Jul 08 '17 at 13:55

answered Jul 05 '17 at 19:23

Tobias Ribizel

5,331
1
18
33

3

This would require benchmarks; it's totally possible that this would be *slower*, as the vector would have to initialize every element, then overwrite it later. – Justin Jul 05 '17 at 19:24
If it were possible to initialize the storage without filling the vector explicitly, this would probably be the fastest solution ;) I'll check the assembly output first – Tobias Ribizel Jul 05 '17 at 19:27
@Tobias How would you initialize the storage without filling it? Do you mean allocating the storage without filling it? – François Andrieux Jul 05 '17 at 19:28
1

@FrançoisAndrieux I think he means that if `std::vector::resize` didn't default initialize, for instance. Basically if it behaved like `reserve` but the length of the vector was also set to be the length of the capacity. – Justin Jul 05 '17 at 19:30
@Justin Thank you for the clarification. – François Andrieux Jul 05 '17 at 19:30
I doubt there’s a way that does that for all types, but it would be nice if there were a method to [default initialize](http://en.cppreference.com/w/cpp/language/default_initialization) a bunch of elements instead of initializing them to a specific value; for `float`s that would be a no-op. It looks like allocators in general don’t have that option, though, so it’s no wonder `vector` doesn’t. – Daniel H Jul 05 '17 at 20:47

score 1 · Answer 5 · answered Jul 06 '17 at 16:02

You could use a custom allocator which avoids default initialisation of all elements, as discussed in this answer, in conjunction with ordinary element access:

const size_t N = 172490752;
std::vector<float, uninitialised_allocator<float> > vec(N);
for(size_t i=0; i!=N; ++i)
    vec[i] = the_value_for(i);

This avoids (i) default initializing all elements, (ii) checking for capacity at every push, and (iii) reallocation, but at the same time preserves all the convenience of using std::vector (rather than std::unique_ptr<float[]>). However, the allocator template parameter is unusual, so you will need to use generic code rather than std::vector-specific code.

Daniel H · Answer 6 · 2017-07-06T18:05:50.547

The reason push_back is slow is that it will need to copy all the data several times as the vector grows, and even when it doesn’t need to copy data it needs to check. Vectors grow quickly enough that this doesn’t happen often, but it still does happen. A rough rule of thumb is that every element will need to be copied on average once or twice; the earlier elements will need to be copied a lot more, but almost half the elements won’t need to be copied at all.

You can avoid the copying, but not the checks, by calling reserve on the vector when you create it, ensuring it has enough space. You can avoid both the copying and the checks by creating it with the right size from the beginning, by giving the number of elements to the vector constructor, and then inserting using indexing as Tobias suggested; unfortunately, this also goes through the vector an extra time initializing everything.

If you know the number of floats at compile time and not just runtime, you could use an std::array, which avoids all these problems. If you only know the number at runtime, I would second Mark’s suggestion to go with std::unique_ptr<float[]>. You would create it with

size_t size = /* Number of floats */;
auto floats = unique_ptr<float[]>{new float[size]};

You don’t need to do anything special to delete this; when it goes out of scope it will free the memory. In most respects you can use it like a vector, but it won’t automatically resize.

Faster alternative to push_back(size is known)

6 Answers6