Why is the postfix increment faster than the prefix one in C++?

Question

I always thought it would be the other way around. But when I tried this simple code, I got unexpected results:

#include <cstdlib>
#include <cstdio>

#include <iostream>

#include <chrono>

using namespace std;


int main(int argc, char* argv[])
{
  int x = 0, y = 0;
  double z;
  chrono::steady_clock::time_point start_point;

  start_point = chrono::steady_clock::now();

  for(int i = 0; i < 100000; x = ++i)
    for(int j = 0; j < 100000; y = ++j)
      z = static_cast<double>(x * y);

  cout << "The prefix increment took a " << chrono::duration_cast<chrono::milliseconds>(chrono::steady_clock::now() - start_point).count() << " milliseconds" << endl;

  start_point = chrono::steady_clock::now();

  for(int i = 0; i < 100000; x = i++)
    for(int j = 0; j < 100000; y = j++)
      z = static_cast<double>(x * y);

  cout << "The postfix increment took a " << chrono::duration_cast<chrono::milliseconds>(chrono::steady_clock::now() - start_point).count() << " milliseconds" << endl;

  // To make the compiler happy...
  x = y = static_cast<int>(z / z);

  cout << "SUCCESS" << endl;
  return EXIT_SUCCESS;
}

Running this code on my machine produces:

The prefix increment took a 25716 milliseconds
The postfix increment took a 19119 milliseconds
SUCCESS

EDIT:

Yes, changing z = ... to z += ... made the execution times equal.

Thank you all for your answers.

You weren't "compelled" to place dummy text. You were notified there is not much of an explanation present. You *chose* to post dummy text to circumvent that. — StoryTeller - Unslander Monica, Mar 27 '18 at 08:14
My guess is that the second run is always faster than the first run. — Zang MingJie, Mar 27 '18 at 08:15
maybe related (C): https://stackoverflow.com/q/12190624/1132334 — Cee McSharpface, Mar 27 '18 at 08:16
You should at least isolate the time calculation so you're sure that you're not measuring the output time. — molbdnilo, Mar 27 '18 at 08:16
Micro-benchmarking things like that is notoriously difficult. — Sergey Kalinichenko, Mar 27 '18 at 08:16
The rule that prefix increment is faster only holds for proper objects where the postfix increment is implemented in terms of the prefix one. int is a fundamental type, so this rule does not apply here. — jotasi, Mar 27 '18 at 08:16
You could "_make the compiler happy_" by declaring x, y and z volatile. What compiler, platform and build options have you used - that information may be necessary to replicate your result. I would advise observing the disassembly of this code (in your debugger for example). That may answer your question. — Clifford, Mar 27 '18 at 08:17
I get similar numbers with this code. The numbers don't change much when running the postfix version first. The optimized version takes no time at all. — Karsten Koop, Mar 27 '18 at 08:18
Option 1/ optimization is disabled => your test is meaningless. Option 2/ optimization is enabled => loops are optimized away => your test is meaningless. — YSC, Mar 27 '18 at 08:18
very similar question: https://stackoverflow.com/questions/24901/is-there-a-performance-difference-between-i-and-i-in-c — Eziz Durdyyev, Mar 27 '18 at 08:20
Optimisations were turned off. Changing the order makes no difference. If I increase the loop counts, the times grow too (regarding the time measurement: I got about 103 seconds vs 70 seconds, so I think this is not a measurement error). — Serge Roussak, Mar 27 '18 at 08:23
@SergeRoussak Change `z =` to `z +=`, otherwise, the compiler can completely optimize away the loops. Print out `z` after the loops. Turn on optimizations. Both loops then take almost the same time: https://wandbox.org/permlink/Jb1lFEvx1kpRcbHh — Daniel Langr, Mar 27 '18 at 08:26
Not only is this architecture dependent, but also compiler dependent. How do you know that on platform X an i++ can't be reduced to a single instruction while ++i takes 400? — UKMonkey, Mar 27 '18 at 08:30
When I run your code I get both of them coming in right at 0 ms. You're not benchmarking un-optimized code are you? — xaxxon, Mar 27 '18 at 08:30
@xaxxon He's already said in the comments that yes, this is unoptimised. (Sadly he didn't update the question) — UKMonkey, Mar 27 '18 at 08:31
@xaxxon, you compiled with full optimisation, that's why you have got, what you have got. — Serge Roussak, Mar 27 '18 at 08:32
@UKMonkey Oh, sorry. I thought this was still an active question, but it seems like it's been solved. — xaxxon, Mar 27 '18 at 08:32
@SergeRoussak of course I did. It doesn't make any sense to time any other builds. That's why the results you got are meaningless. — xaxxon, Mar 27 '18 at 08:32
@SergeRoussak - And the fact you *haven't* compiled with optimizations makes your testing nonsensical, frankly. — StoryTeller - Unslander Monica, Mar 27 '18 at 08:33
@xaxxon don't say sorry - Question is missing a huge amount of information in it. :) — UKMonkey, Mar 27 '18 at 08:34
But this all said, although the OP is being, let's say, a little forthright, that doesn't make this a poor question. It's still well-presented with compilable code (however flawed), and documented output. +1 and voted to reopen. — Bathsheba, Mar 27 '18 at 08:38
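jotasi's comment above about class types can be illustrated with a minimal sketch. Counter is a hypothetical type invented for this example and does not appear in the question. It shows the conventional pattern in which the postfix form copies the old state and then delegates to the prefix form, which is where any extra cost for non-fundamental types would come from.

```cpp
// Hypothetical class type, used only for illustration.
struct Counter {
    int value = 0;

    // Prefix: increment in place and return a reference to *this.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Postfix: copy the old state, reuse the prefix form, return the copy.
    // The dummy int parameter only distinguishes the two overloads.
    Counter operator++(int) {
        Counter old = *this;
        ++(*this);
        return old;
    }
};
```

For a type this small an optimizing compiler will very likely remove the copy when the result is unused. For int there is no such copy to begin with.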

Bathsheba · Answer 1 · 2018-03-27T08:27:56.860

15

There is no difference at all - any perceived difference is due to artifacts introduced by your testing technique.

Compilers have been optimising away the difference between i++ and ++i for built-in types for years now (although I still use ++i out of habit). Don't benchmark things like this: setting up a sound framework is too difficult. Trivialise the program and check the generated assembly instead.

Note also that on a platform with a 32-bit int (very common) the behaviour of your code is undefined due to int overflow (100,000 squared is larger than 2 to the 31st power). This renders your testing completely useless.
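The arithmetic behind the overflow remark can be checked with a small sketch. The names kN, kSquare, kOverflowsInt and wide_product are made up for this example. It assumes nothing about the width of int beyond what INT_MAX reports, and it shows one possible way to keep the multiplication defined by widening an operand first.

```cpp
#include <climits>

// The loop bound from the question, held in a 64-bit type.
constexpr long long kN = 100000;

// 100,000 squared, the figure quoted in the answer: 10,000,000,000.
constexpr long long kSquare = kN * kN;

// True when that value cannot be represented in int on this platform.
// With a 32-bit int, INT_MAX is 2,147,483,647, so this is true there.
constexpr bool kOverflowsInt = kSquare > INT_MAX;

// Widening one operand before multiplying avoids signed int overflow.
constexpr long long wide_product(int x, int y) {
    return static_cast<long long>(x) * y;
}
```

In the question's loops, the same effect could be had by converting x to double before the multiplication rather than converting the int product afterwards.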

edited Mar 27 '18 at 08:27

answered Mar 27 '18 at 08:20

Bathsheba


Really?.. Do you think that 30 seconds (see my comment) are "no difference"? – Serge Roussak Mar 27 '18 at 08:26
@SergeRoussak: Yes your testing is baseless. Why do you continue to persist with this notion that you're correct and have made a discovery that i++ is faster than ++i? Come on, be serious! – Bathsheba Mar 27 '18 at 08:29
@SergeRoussak when I run your code I get them both taking 0 ms. Your testing is wrong. – xaxxon Mar 27 '18 at 08:30
@Bathsheba, because I tested it with the same compiler, on the same platform, etc. (The source is in the same translation unit, so the two loops cannot run under different conditions.) If compilers optimize away the difference between these operators, then we would expect equal execution times. But WHY does one loop run approximately 30% slower than the other? – Serge Roussak Mar 27 '18 at 08:46
Hints: (1) fix the UB, (2) `cout << "The prefix increment took a " ` introduces *a lot* of overhead. (3) Do more with each value of x and y: perhaps *sum* them to `z`? (4) Copy out your two loops multiple times. (5) Switch the order of the runs. – Bathsheba Mar 27 '18 at 08:49
@SergeRoussak compilers optimize in *optimized* builds, not in *unoptimized* builds. Since you clearly told the compiler not to optimize the code, why do you expect it to optimize it anyway? – UnholySheep Mar 27 '18 at 08:50
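The hints in the comments above (fix the UB, accumulate into z, keep the output out of the timed region) could be combined roughly as follows. This is only a sketch, not the code anyone in the thread actually ran. The function names run_prefix, run_postfix and time_ms are invented here, and the loop bound is a parameter so that a smaller value can be tried first.

```cpp
#include <chrono>

// Nested loops with prefix increments. The sum is returned so the
// compiler cannot discard the work as unused.
double run_prefix(int n) {
    int x = 0, y = 0;
    double z = 0.0;
    for (int i = 0; i < n; x = ++i)
        for (int j = 0; j < n; y = ++j)
            z += static_cast<double>(x) * y; // multiply as double: no int overflow
    return z;
}

// Same loops with postfix increments.
double run_postfix(int n) {
    int x = 0, y = 0;
    double z = 0.0;
    for (int i = 0; i < n; x = i++)
        for (int j = 0; j < n; y = j++)
            z += static_cast<double>(x) * y;
    return z;
}

// Times a single call of f(n) in milliseconds. Only the call itself is
// inside the timed region; printing is left to the caller.
template <typename F>
long long time_ms(F f, int n, double& result) {
    const auto start = std::chrono::steady_clock::now();
    result = f(n);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

A caller would invoke, for example, time_ms(run_prefix, n, z) and time_ms(run_postfix, n, z) several times in alternating order, in an optimized build, and print both the times and z afterwards. Note that the two variants do not accumulate the same sum, because x = i++ stores the value from before the increment. A single run of each is still a weak basis for any conclusion.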

Thomas Flinkow · Answer 2 · 2018-03-27T12:48:29.187

7

Adding just a little to what Bathsheba has already said, both

int i;
i++

and

int i;
++i

get compiled to

push rbp
mov rbp, rsp
add DWORD PTR [rbp-4], 1

where the important line incrementing the value is

add DWORD PTR [rbp-4], 1
^^^                   ^^^
relevant             parts

In answer to your comment regarding optimizations, the code above was compiled with optimizations off; using -O leads to

add DWORD PTR [rdi], 1
^^^                 ^^^
relevant           parts

for both i++ and ++i. I had to adjust the sample to

void F(int& i)
{
    ++i; // respectively i++
}

so that it was not optimized away entirely, but the point is still the same.

I used gcc 7.3 x86-64. Test it yourself using the Online Compiler Explorer.

edited Mar 27 '18 at 12:48

answered Mar 27 '18 at 08:27

Thomas Flinkow


This is convincing (+1), but do mention the compiler specifics and target architecture here. – Bathsheba Mar 27 '18 at 08:29
@Bathsheba thank you for your feedback, I edited the answer now to address both your feedback and OP's statement about optimizations. – Thomas Flinkow Mar 27 '18 at 08:32
And do feel free to complete this answer by mentioning the UB in the code. – Bathsheba Mar 27 '18 at 08:33
@Bathsheba sorry, what do you mean by *"UB"*? Undefined behavior? – Thomas Flinkow Mar 27 '18 at 08:34
Absolutely – Bathsheba Mar 27 '18 at 08:35
