I have seen these questions:

Weirdness of the reserve() of vector

Is accessing the raw pointer after std::vector::reserve safe?

How reserve in std::vector works + Accessing vector with []

And a few others. But all of them deal with accessing the elements outside of the reserved space. I am interested in those strictly inside.

For example:

#include <iostream>
#include <vector>

int main()
{
  std::vector<int> a;
  a.reserve(3);
  a[0] = 4;
  std::cout << a[0] << ',' << a[1] << ',' << a[2] << '\n';
  std::cout << *(a.data()) << '`' << *(a.data() + 1) << '`' << *(a.data() + 2) << '\n';
  a[2] = 7;
  for(int &i: a)
    std::cout << i << ',';
  std::cout << '\n';
  std::cout << a[0] << ',' << a[1] << ',' << a[2] << '\n';
  std::cout << *(a.data()) << '`' << *(a.data() + 1) << '`' << *(a.data() + 2) << '\n';
  return 0;
}

This prints:

4,0,0
4`0`0

4,0,7
4`0`7

The empty line is the output of the for, and it makes sense: I only reserved memory, the vector considers there is no data.

I've been playing with this for an hour already, always staying within the confined space, and it never once crashed. I added -fsanitize=address -Wall -Wpedantic, no complaints (also on SO, but I lost the link). Also notice that I am directly dereferencing the data(), and it seems to be fine with it. So I have to wonder, is this undefined behavior?

I suppose the code above will make some people cringe (I can't tell), but prettiness is not my goal with this -- it's just a personal goal.

To be more specific, I was trying to convert a Fortran eigenvalue program, but I know maybe two things in Fortran, rounded up. While switching back and forth between the browser and the compiler, I stumbled across "std::vector reserve() and push_back() is faster than resize() and array index, why?" and a few similar questions. And, sure enough, it works, but when I tried to use [] instead of push_back() or insert(), it went even faster, a lot faster. I know this is a bit of a premature optimization, but I'd rather put out the fire now, while it's hot, than later.

So, here I am.

Remy Lebeau
a concerned citizen
  • `a.reserve(3);` should be `a.resize(3);` – Eljay Feb 28 '20 at 23:42
  • Undefined behavior only very rarely crashes. The vast majority of the time it seems to actually work. [Until one day it randomly physically breaks your printer](https://nwn.blogs.com/nwn/2018/06/windows-second-life-yoz-linden-lab.html) – Mooing Duck Feb 28 '20 at 23:42
  • Yep. UB. You have memory allocated, but no object is in that memory yet. They're `int`s and it doesn't get much simpler than an `int`, so it's probably going to "work", but it's still illegal. – user4581301 Feb 28 '20 at 23:42
  • @MooingDuck That was funny, it's also not something I'd make official, but I couldn't help wondering about it since it never crashed. Maybe I need a printer? – a concerned citizen Feb 28 '20 at 23:46
  • @user4581301 I just tried with `double`, and `4.1` and `7.5` as values, and it still doesn't crash, except that instead of zero for untouched elements (as with `int`), it randomly outputs some small value, `-1.83255e-06` or similar. – a concerned citizen Feb 28 '20 at 23:50
  • Try using a vector of something with a constructor, like `std::string`. Then you are far more likely to have Bad Stuff happen. – 1201ProgramAlarm Feb 28 '20 at 23:50
  • @1201ProgramAlarm Yes, more like it! :-) I was using `-fsanitize=address` ([edit] funny enough, without it it works). Well, if the answer below (and the comments) didn't make it clear enough, this does it. Still, it's a pity, because the speed was almost 10x less, sometimes. – a concerned citizen Feb 28 '20 at 23:54
  • Side note: If you want a RAII-wrapped dynamically allocated block of memory, won't resize it, and for one reason or another `vector` cannot be used (which is rare) `std::unique_ptr` may fill the gap. Otherwise `vector::resize` and put up with the `int`s being initialized. This is usually very cheap. – user4581301 Feb 28 '20 at 23:57
  • Just noticed the 10x less bit. There is overhead in `push_back`, especially when the `vector` must be resized (reserve first, if you have a good guess as to the size). Can we assume the 10x came with optimizations enabled? If not 10x is probably the least of your worries. – user4581301 Feb 29 '20 at 00:03
  • The second question you linked seems to answer your question. I don't see any difference to yours, really. Here is another one: [reserve() - data() trick on empty vector - is it correct?](https://stackoverflow.com/questions/59421717/reserve-data-trick-on-empty-vector-is-it-correct) – walnut Feb 29 '20 at 01:16
  • @user4581301 10x was one of the greatest, it stood out, so to speak; on average it was ~7x~8x (no optimizations). And I only used `.reserve()` + `.push_back()`, but I also noticed that it gets slower with larger sizes (1000+), whereas `[]` seemed to maintain the speed. Since this is an eigenvalue problem (orders may get high-ish), this was a very relevant discovery, worth checking. But, UB is UB. – a concerned citizen Feb 29 '20 at 09:21
  • @walnut The 2nd answer was a bit vague, but what I understood is that you resize to 1, then reserve to 100, and then you try to get past it with pointer voodoo. But this part in the marked answer, *the standard doesn't say what the vector implementation can do with the storage between `size()` and `capacity()`* seemed a bit against the conclusion. It seemed a bit vague to me. The 2nd question is more on point, it also mentions a workaround with `std::unique_ptr`, but if I ever get to have to use that, I'll probably need to rethink my approach. This was just curiosity. – a concerned citizen Feb 29 '20 at 09:28
  • @aconcernedcitizen You *must* enable optimizations if you want to benchmark any code. Without optimizations enabled any timing results are completely useless. – walnut Feb 29 '20 at 10:49
  • @walnut I remember I used optimizations, too, and they weren't that different. I don't remember numbers now, but the differences were still large. Then, somewhere in the middle of things, I must have not pressed the up arrow key enough times to bring back the proper `g++` command line, and just stuck with it. – a concerned citizen Feb 29 '20 at 11:45
  • The linked duplicate question has the correct answer to this question, but the accepted answer here is incorrect. `int` is a POD, and the objects in the reserved memory are *in lifetime*. Thus, we can read/write the `int` objects, even if they are uninitialized. See https://stackoverflow.com/a/69141237/2791230 for details. In fact, I think there is no UB in accessing reserved POD objects, even if the Standard forbids the usage of `data()` out of `size()`. – wpzdm Sep 21 '21 at 05:09
  • @wpzdm You're right, it's a better answer, but even this one did it for me because I realized that the UB is due to the vector being empty (i.e. `v.size() == 0`). The comments below the answer also contributed. – a concerned citizen Sep 21 '21 at 06:59

1 Answer

Your program has undefined behavior because it reads uninitialized memory. `reserve()` reserves the space but does not initialize it.

Ted Lyngmo
  • I know about the differences between `reserve()` and `resize()`, but the consistency of the output made me wonder. So then, this should really *not* be used or tried, for anything more than amusement, I guess? – a concerned citizen Feb 28 '20 at 23:45
  • @aconcernedcitizen Indeed. You may find it consistently acting a certain way, but since you're in UB land anything could happen. – Ted Lyngmo Feb 28 '20 at 23:47
  • @aconcernedcitizen there are motions being made to allow simple data like `int` to ignore some of the formal initialization rules needed for more complex types, but who can say if they will go anywhere. Even then the `vector`'s `size` will still be zero and writing past `size` is verboten. – user4581301 Feb 28 '20 at 23:49
  • @TedLyngmo `and writing past size is verboten` This really clicked, writing past *size*, not allocated space. I'll let some time pass, to see others as well, but it looks like this will be the answer. Thank you. – a concerned citizen Feb 28 '20 at 23:51
  • @aconcernedcitizen That's right. Writing beyond the last element is not allowed and even if it was, I can't see what it could be used for practically. – Ted Lyngmo Feb 29 '20 at 00:02
  • @TedLyngmo `std::vector` does not offer any method to obtain default-initialized elements. So people seem to try to abuse the capacity concept, [here](https://stackoverflow.com/questions/59421717/reserve-data-trick-on-empty-vector-is-it-correct) is another possible duplicate with that motivation. – walnut Feb 29 '20 at 01:13
  • @aconcernedcitizen Let's take a `vector` and write past the size, but within the capacity. Consider what happens when you copy a vector. Consider what happens when other elements get inserted. Consider what happens for non-trivial types of elements like `std::vector`. All of these scenarios are likely to cause the UB to manifest as undesirable behavior. Now consider all the use cases that we haven't imagined. – François Andrieux Feb 29 '20 at 01:20
  • @walnut That question was not in the list when I wrote the question, and I hadn't seen it when I searched before. It does answer mine, yes, and I've marked it as such. The key here is `.size()`, which must be the universal check behind the doors for `std::vector`. I was hoping this might not be UB, but, alas. – a concerned citizen Feb 29 '20 at 09:16
  • @FrançoisAndrieux Now it makes sense because now I know about the boss, `.size()`. Until now I thought maybe there's some janitor that can be bribed. There isn't, but, clearly, the janitor can be fooled. However, fooling is not reliable, today it may work, tomorrow it may not. – a concerned citizen Feb 29 '20 at 09:31