Which is faster? ++, += or x + 1?

Question

I am using C# (this question is also valid for similar languages like C++) and I am trying to figure out the fastest and most efficient way to increment. It isn't just one or two increments: in my game, it's about 300 increments per second. The frames of every sprite on the screen are incrementing, as are the speed and position of my RPG character, the offset of the camera, etc. So which way is the most efficient? For example, to increment YPos by 5 on every movement, I can do:

1.

Player.YPos += 5;

2.

Player.YPos = Player.YPos + 5;

3.

for (int i = 0; i < 5; i++)
{
    Player.YPos++;
}

Which is the most efficient (and fastest)?

Is this the bottleneck in your code? *"Premature optimization is the root of all evil".* — Felix Kling, Jun 25 '11 at 13:17
It's just plain pseudocode given as an example to show my point — ApprenticeHacker, Jun 25 '11 at 13:18
I'm curious at what point anyone thought, "Hm, (1) looks a bit iffy, I better implement this as (3)."... — Kerrek SB, Jun 25 '11 at 13:31
@Kerrek LOL, I guess you're right. I'd better just use += — ApprenticeHacker, Jun 25 '11 at 13:33
This is just a comment because it's answering the question you *might have asked*, which is "How do I make my code fast?" The answer is, not by guessing or spending brain cycles on tiny stuff. These statements take about as long as it takes this text to reach your retina. [You do it by having the program tell you where its time-drains are.](http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773) — Mike Dunlavey, Jun 25 '11 at 14:58
As a side note, if you're doing game development, you should consider making your background logic (where people are, what they're doing and what their current state is) separate from your display logic (rendering them on the screen, incrementing their animations.) It makes things much easier to swap out components and features later on. :) — corsiKa, Jun 26 '11 at 00:19
@Black Bear It took me more than a week to make a simple tile manager in C++, which I remade in C# in a day (2 days if you count debugging). So I AM NOT GOING BACK! Development time also matters, you know. — ApprenticeHacker, Jun 26 '11 at 03:26
@ApprenticeHacker I think you fail to appreciate just how fast computers really are. You're concerned about 300 increments/sec, so I did some quick tests how many incs my 2 year old box could do in a second. The "slow" approach, a loop getting time each iteration to check if 1 second has passed: **6.4 mil** increments. The fast approach, just counting increments and checking time elapsed at the end: **400 million** increments in 0.971 seconds. — Disillusioned, Sep 23 '13 at 19:00

Jon Skeet · Accepted Answer · 2011-06-25T13:47:59.230

(Answer specific to C# as C++ may vary significantly.)

1 and 2 are equivalent.

3 would definitely be slower.

Having said that, doing this a mere 300 times a second, you wouldn't notice any difference. Are you aware of just how much a computer can do in terms of raw CPU+memory in a second? In general, you should write code for clarity as the most important thing. By all means worry about performance, but only when you have a way to measure it, in order to a) tell whether you need to worry, and b) tell whether any changes actually improve the performance.

In this case, I'd say that option 1 is the clearest, so that's what I'd use.

edited Jun 25 '11 at 13:47

answered Jun 25 '11 at 13:17

Jon Skeet

But a smart compiler could detect the lack of side-effects, unroll the loop then optimize the five increment ops into a single add, right? :) – Frédéric Hamidi Jun 25 '11 at 13:18
But isn't ++ supposed to produce less byte code (or machine code)? – ApprenticeHacker Jun 25 '11 at 13:19
@burningprodigy: I'd expect an "add 5" to probably take as few cycles as an "increment by 1". But when in doubt, *test*. – Jon Skeet Jun 25 '11 at 13:20
@Frédéric a smart compiler might indeed recognise that but I'd hope that the people writing my compilers would be expending their efforts on optimisations that served useful purpose. – David Heffernan Jun 25 '11 at 13:20
@Frédéric: Assuming C# rather than C++, the C# compiler couldn't do that... the JIT compiler *might* be able to, but I doubt that it would. Why spend time looking for optimizations like that, when the developer should have just used a single "+= 5" to start with? – Jon Skeet Jun 25 '11 at 13:21
@burning why would the compiler treat `++` and `+=1` differently? – David Heffernan Jun 25 '11 at 13:24
@Jon Thanks. I guess, you are right, += is way better (and it will save a lot of development time instead of the ++ loop) – ApprenticeHacker Jun 25 '11 at 13:24
@David because ++ is a unary operator. – ApprenticeHacker Jun 25 '11 at 13:29
@Jon, @David, for what it's worth I tried the loop with g++ 4.4.5: from `-O2` onwards, the optimizer emits a single `addl $5, %eax` instruction. I guess that optimization comes for free with the more important ones :) – Frédéric Hamidi Jun 25 '11 at 13:33
@burning That's a feature of the high level language. The compiled code doesn't have operators. It's the compiled code that runs. You will find, and you can check, that `i++`, `i+=1` and `i=i+1` are compiled to the same code. – David Heffernan Jun 25 '11 at 13:34
@Frédéric Well, what do you know. These days it seems that you have to go to great lengths to write code that runs slowly! – David Heffernan Jun 25 '11 at 13:51
@David: Actually, it's [really really easy.](http://programmers.stackexchange.com/questions/86006/what-causes-bad-performance-in-consumer-apps/86030#86030) Every time any function A calls any function B, it is *hoping* its program counter will be put to good use, and will come back in the foreseeable future. When the stack gets 20-30 levels deep, that can be a vain hope. – Mike Dunlavey Jun 25 '11 at 16:18
I would give up some of my reputation to upvote this more, because of the phrase "you should write code for clarity". – bohdan_trotsenko Oct 09 '13 at 16:22

score 33 · Answer 2 · answered Jun 25 '11 at 13:17

Options 1 and 2 will result in identical code being produced by the compiler. Option 3 will be much slower.

It's a fallacy that i++ is faster than i += 1 or even i = i + 1. All decent compilers will turn those three statements into the same code.

For such a trivial operation as addition, write the clearest code and let the compiler worry about making it fast.

David Heffernan

Won't option 2 be slower than option 1? – Berial Jun 25 '11 at 13:20
@Beriel Why would it be? Compilers will emit the same code for 1 and 2. – David Heffernan Jun 25 '11 at 13:22
actually, `++i` is faster than `i++`, but only if `i` is an object, and even that only in certain circumstances. – vsz Jun 25 '11 at 19:16
@vsz for my own sanity I do not use the pre and post increment operators as rvalues so I don't really care about such things. What's more this question was about int. – David Heffernan Jun 25 '11 at 19:22

score 21 · Answer 3 · edited May 23 '17 at 12:10

The compiler should produce the same assembly for 1 and 2 and it may unroll the loop in option 3. When faced with questions like this, a useful tool you can use to empirically test what's going on is to look at the assembly produced by the compiler. In g++ this can be achieved using the -S switch.

For example, options 1 and 2 both produce this assembly when compiled with the command `g++ -S inc.cpp` (using g++ 4.5.2):


main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    addl    $5, -4(%rbp)
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

g++ produces significantly less efficient assembler for option 3:


main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    movq    %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl    $0, -8(%rbp)
    jmp .L2
.L3:
    addl    $1, -4(%rbp)
    addl    $1, -8(%rbp)
.L2:
    cmpl    $4, -8(%rbp)
    setle   %al
    testb   %al, %al
    jne .L3
    movl    $0, %eax
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

But with optimisation on (even -O1) g++ produces this for all 3 options:


main:
.LFB0:
    .cfi_startproc
    leal    5(%rdi), %eax
    ret
    .cfi_endproc

g++ not only unrolls the loop in option 3, but it also uses the lea instruction to do the addition in a single instruction instead of faffing about with mov.

So g++ will always produce the same assembly for options 1 and 2. g++ will produce the same assembly for all 3 options only if you explicitly turn optimisation on (which is the behaviour you'd probably expect).

(and it looks like you should be able to inspect the assembly produced by C# too, although I've never tried that)

@David: True. Sorry, I should've been more specific - the link at the bottom of my post provides details of tools which, if I understand correctly, can disassemble the MSIL produced by C#; in effect allowing you to view the assembly produced from your C# code even though the C# compiler doesn't output assembly on its own (at least, that's my understanding - I don't use C# so I can't test this). — Jack Kelly, Jun 25 '11 at 17:25
@David perhaps not, but this is the exact approach the OP should look to take no matter what language he's dealing with. — corsiKa, Jun 26 '11 at 00:16
@NitinJS The OP did state "This question is also valid for similar languages like C++" and it's tagged both C++ and C#. — Jack Kelly, Jun 28 '11 at 13:55

score 3 · Answer 4 · edited Mar 11 '14 at 17:53

They are the same:

static void Main(string[] args)
{
    int a = 0;
    a++;
    a += 1;
    a = a + 1;
}

ILSpy decompiles the above code as:

private static void Main(string[] args)
{
    int a = 0;
    a++;
    a++;
    a++;
}

The IL for all three is the same as well (in Release mode):

.method private hidebysig static void  Main(string[] args) cil managed
{
    .entrypoint
    // Code size       15 (0xf)
    .maxstack  2
    .locals init ([0] int32 a)
    IL_0000:  ldc.i4.0
    IL_0001:  stloc.0
    IL_0002:  ldloc.0
    IL_0003:  ldc.i4.1
    IL_0004:  add
    IL_0005:  stloc.0
    IL_0006:  ldloc.0
    IL_0007:  ldc.i4.1
    IL_0008:  add
    IL_0009:  stloc.0
    IL_000a:  ldloc.0
    IL_000b:  ldc.i4.1
    IL_000c:  add
    IL_000d:  stloc.0
    IL_000e:  ret
} // end of method Program::Main

score 2 · Answer 5 · answered Jun 25 '11 at 13:24

Options 1 and 2 will result in identical code after being compiled. Option 3 will be much slower, as it results in more code for the for loop involved.

Bibhu

@NitinJS - Yes, the resulting machine code will contain more lines of code. – Bibhu Jun 28 '11 at 10:45
