2

I was learning about the emplace() member of std::vector and stumbled upon this code:

// vector::emplace
#include <iostream>
#include <vector>

int main ()
{
  std::vector<int> myvector = {10,20,30};

  auto it = myvector.emplace ( myvector.begin()+1, 100 );
  myvector.emplace ( it, 200 );
  myvector.emplace ( myvector.end(), 300 );

  std::cout << "myvector contains:";
  for (auto& x: myvector)
    std::cout << ' ' << x;
  std::cout << '\n';

  return 0;
}

I am wondering why the for loop uses a reference, auto& x, instead of a simple copy. I tried without the & and it worked the same. Is this a safeguard to avoid a copy, or a performance trick?

Saxtheowl
  • 3
    "avoid a copy or a performance trick ?" Both, as not copying (usually) improves the performance – Fatih BAKIR May 31 '19 at 19:59
  • 2
    @FatihBAKIR Dereferencing isn't free either. Sometimes copying is faster, registers are fast. You have to measure. – François Andrieux May 31 '19 at 20:00
  • It's already loaded the memory for the vector, how would it be more expensive to use it than to copy it? – xaxxon May 31 '19 at 20:01
  • 1
    @xaxxon I don't know because I haven't measured it. The only thing I know is that intuition is barely better than a coin toss (if that) at guessing about performance. – François Andrieux May 31 '19 at 20:02
  • 1
    @FrançoisAndrieux you said "Sometimes copying is faster" - That is a statement of fact. I asked how you know that can happen for sure in some circumstances with a loop like this. – xaxxon May 31 '19 at 20:03
  • @FrançoisAndrieux, I know that references are sometimes more expensive than copying small trivial types (`char`) when passing arguments to functions. However, I don't see how that can apply in a loop. As to copy, you have to dereference first anyway. – Fatih BAKIR May 31 '19 at 20:07
  • @FatihBAKIR exactly – xaxxon May 31 '19 at 20:07
  • @xaxxon Because references require you to eventually dereference like pointers (on all hardware I can think of anyway). See these questions : [1](https://stackoverflow.com/questions/26552481/which-is-faster-pass-by-reference-vs-pass-by-value-c) [2](https://stackoverflow.com/questions/40185665/performance-cost-of-passing-by-value-vs-by-reference-or-by-pointer) [3](https://stackoverflow.com/questions/26387569/rule-of-thumb-for-when-passing-by-value-is-faster-than-passing-by-const-referenc) [4](https://stackoverflow.com/questions/22840120/pass-by-value-faster-than-pass-by-reference) – François Andrieux May 31 '19 at 20:08
  • @FrançoisAndrieux no, it doesn't have to dereference. If you know what register it's in, you just reference the register directly. The reference doesn't even exist in the compiled code. To be fair, even if you did make a copy, that copy probably wouldn't exist either, as it would be optimized out and it would just use the "reference" anyhow, unless the copy had side effects (on a more complex type). – xaxxon May 31 '19 at 20:09
  • 1
    @FrançoisAndrieux, yes, but those are related to _passing_ stuff to functions, and the ABI usually requires passing a pointer, which incurs the indirection overhead. However, here the loop is visible to the compiler and it doesn't have to perform that. – Fatih BAKIR May 31 '19 at 20:10
  • @xaxxon You can't make the determination for every configuration of every compiler. That's why you need to actually measure in real life for your own use case if there's a doubt. – François Andrieux May 31 '19 at 20:10
  • @FrançoisAndrieux You don't need to know 100% in order to understand what's going to happen. You don't need to measure this. If you have a compiler that doesn't do this, you change *compilers* not code. – xaxxon May 31 '19 at 20:12
  • @xaxxon You're right that "Sometimes copying is faster" is a statement of fact, and it means that copying will not certainly *not* be faster. Like I said, you can't determine that no compiler will produce faster code with a copy. And you can't determine that OP won't use such a compiler, because no information was given about that. People aren't legally required to use a compiler with a good optimizer. Edit: Though I'll admit I made my comment thinking of function arguments. It may be less widely applicable in this case, but it's not incorrect. – François Andrieux May 31 '19 at 20:19
  • @FrançoisAndrieux You're basically telling people to wake up every morning and test and make sure gravity is still working. It's a complete waste of time. Just because it's possible doesn't mean it's good advice. – xaxxon May 31 '19 at 20:20
  • @xaxxon Well no, I don't want everyone to test if this is faster. Only those who want to know, people like OP who explicitly ask if there's a performance implication. – François Andrieux May 31 '19 at 20:21
  • @FrançoisAndrieux you're just wasting people's time is the point. Pedantically if you're on the worst compiler ever you might see the copy be faster, but that's not going to happen. For someone asking questions about fundamental parts of the language, it's the opposite of helpful to give advice to them which only might hold up in a theoretical situation which probably doesn't exist even in the darkest corner of compiler land – xaxxon May 31 '19 at 20:24
  • @xaxxon I'm sorry, but I couldn't disagree more. If there's one rule of thumb about performance, it's to always measure and never assume. This is especially important for beginners to understand. It's not about accounting for the possibility of obscure compilers, it's about avoiding making assumptions about performance, and realizing that compilers can do crazy things to your code instead of being surprised by it. If you aren't *sure* about performance impacts, you need to become sure. Even then, better to check anyway. Since OP asked the question, there's a doubt, so it's worth measuring. – François Andrieux May 31 '19 at 20:28
  • @FrançoisAndrieux You could at least preface it with "I can't come up with a single example of how it could be slower..." and "on no architecture that I know of does it make sense for a compiler to generate code where making a copy would be faster" to make it clear that you don't actually have any specific case in mind. Or maybe "I just tried this on all 3 major compilers and they all generated code that didn't result in any extra memory accesses by not making a copy" to set the tone of how likely one is to need to measure this. – xaxxon May 31 '19 at 20:30
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/194268/discussion-between-francois-andrieux-and-xaxxon). – François Andrieux May 31 '19 at 20:31

2 Answers

3

The other difference between auto and auto& in this context is that auto& will let you modify the value in the vector. This may be an undesirable bug just waiting to happen. Ideally, if you are going to take a reference only for reading, you should take a const reference: const auto &

The benefit of using the reference shows when the vector contains objects that are more than a fundamental numeric or pointer type: the loop won't copy the whole object to a temporary. If the object has deep-copy semantics, or is perhaps a shared_ptr, then there may be significant overhead that is totally avoided.

For a fundamental type the copy is usually very fast, so taking a copy is fine there. You can still expect the compiler's optimiser to do the "right thing" if asked to reference a fundamental type and then use that reference numerous times. So for template programming, where you don't know the element type, you should favour the const reference over the copy to keep the code simple.

Gem Taylor
1

It's as simple as you said: without the reference it would be a copy. So it is indeed a performance trick, but for an int it won't be any faster; it might even be slower. Had you had a std::vector<std::string> with a million elements, it would make a big difference. You can try it yourself.

But, it's needed if you want to modify the contents of the iterated container. Without the reference, you would be changing the copy, not the element inside the container. The difference would be seen here:

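#include <cassert>
#include <vector>

int main() {
  std::vector<int> numbers1 = {1,2,3,4};
  std::vector<int> numbers2 = {1,2,3,4};
  for(auto& x: numbers1) ++x; // increments the elements in place
  for(auto x: numbers2) ++x;  // increments a copy; numbers2 is unchanged
  assert(numbers1!=numbers2); // True
}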

Also, I would recommend using auto&& instead of auto& because it works better with temporaries; see e.g. this answer.

Quimby
  • 1
    " it might be even be slower." how? You have to load the contents of the vector from memory to make a copy, but once it's loaded, why would you put it somewhere else to make it faster? – xaxxon May 31 '19 at 20:06
  • @xaxxon "You have to load the contents of the vector from memory to make a copy" Yes, exactly, but you did it only once. Then you work with your own copy in a register or at least on the stack. If you use a reference/pointer then each access must dereference that address unless it can be proven that the value did not change between those instructions. That might not be easy to do for longer functions or even just `void foo(int&, int&)` makes it impossible without knowing the calling code. – Quimby May 31 '19 at 20:12
  • Each access doesn't have to dereference it if it knows where it is. Then it just points to the register directly. No actual dereferencing happens. if it changes, its new value is still sitting in that same register. Putting it in a temporary variable doesn't change anything except maybe it needs to do more loads because you've increased the working set size if your copy can't be optimized away – xaxxon May 31 '19 at 20:17
  • @xaxxon But the calling code cannot know where it is. If you compile `foo` into `foo.obj` then `foo` has no idea how it gets called. It only knows the calling convention and I don't know enough about them, but I never heard of "this pointer points to a register" parameter type. This is also relevant to the for loop, it cannot in general know where the vector came from or who has access to it. Any function that is called in that loop might have side-effects that involve changing that vector. – Quimby May 31 '19 at 20:26
  • I am confident that in the code posted above the compiler will do the best thing. All I'm saying is that there are not-so-uncommon cases where it might lead to a different set of instructions that is slower. I'm not saying that it will be noticeable, and of course link-time optimization will handle a lot of cases too. – Quimby May 31 '19 at 20:27
  • This code has nothing to do with calling conventions. If you call something you can make a copy of it then if needed. Making an early copy does nothing except potentially harm you. – xaxxon May 31 '19 at 20:31
  • @xaxxon The copy only harms you if it's actually made. Just like the reference only harms you if the dereference actually happens. Either way it's an implementation detail. As long as the behavior between the two is identical, it may very well produce identical assembly. – François Andrieux May 31 '19 at 20:39
  • 1
    *"I would recommend using auto&& instead of auto& because it will work better with temporaries"* - Not when trying to modify the container. You *don't* want temporaries in that instance, and it's better to get a nice shining compiler error if you get them unexpectedly. – StoryTeller - Unslander Monica May 31 '19 at 20:39
  • @xaxxon As I said, it will affect for loops too. If you don't know who has access to the container, you cannot just put the value in a register and work with that. If the loop calls a function that has a side effect of changing that container, then your cached value might not be up to date. So the compiler cannot cache it, but has to dereference the address on each access, as I said. The same goes for `volatile` variables: they can be optimized even less than references because the compiler can't prove anything about the value of the variable. But it can prove a lot about a local variable on the stack. – Quimby May 31 '19 at 20:43
  • @Quimby if an iterator is invalidated then there's nothing to dereference. And never does it have to "dereference on each access". That's silly. Once it knows where the object is, you're done with the dereferencing. And invalidating your iterator during a loop using iterators is a pretty bad idea. Comparing a volatile to a non-volatile is silly - that's an awful strawman of my argument. The iterator reference *is* a local stack variable. The compiler knows whether it's changing or not. – xaxxon May 31 '19 at 22:56
  • @xaxxon Doing e.g. `vector[5]+=1` does not invalidate any iterators but changes the container. I don't know how to get my point across, so here's some [code](https://godbolt.org/z/9bfgOu) – Quimby Jun 01 '19 at 11:03
  • Yes, if you have to have a copy then you have to have a copy. But that's a different use case than *choosing* to have copy. – xaxxon Jun 01 '19 at 18:52
  • @xaxxon I'm afraid I really don't understand. This whole question is about whether to *choose* to copy or not. The code I linked can choose and that choice has consequences. It won't generate the same code in general and it may be slower. That's what started this discussion, is it not? – Quimby Jun 01 '19 at 19:01
  • it's not a choice to make a copy if you are required to make a change to a copy. If you have to make a change to an object and have it not change the object in the vector, then obviously you have to make a copy - that's not a choice. – xaxxon Jun 01 '19 at 19:21
  • What? The code I posted does not do any changes to the vector in the loop. Apply doesn't have to either. Without any changes to the vector, the code still has to be different. Apply only represents some nested code that the compiler cannot potentially see. I just don't know how to do multiple object files in godbolt. And I will not bother with making big enough example where "-O2" won't reach. – Quimby Jun 01 '19 at 19:30