5

I have this block in my program:

if (x > 0) {
    a = 1;
    b = 4;
} else {
    a = 2;
    b = 3;
}

This can be written with the ternary operator like this:

a = (x > 0) ? 1 : 2;
b = (x > 0) ? 4 : 3;

The results are equivalent, but this code sits in a really critical part of my application and runs thousands of times per second. I want to squeeze out some microseconds, because this method will grow a little.

My question: at the ARM level, which one is faster? I believe the first one creates a branch instruction. But what about the ternary operator? Does it become a branch on the iPhone, too? Or does the iPhone's ARM have an evil opcode to do the job?

BTW, I also saw an evil technique like this:

a = (x > 0) * 1 + (x <= 0) * 2;

Is this really faster?
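
For reference, here are the three variants side by side as a minimal C sketch. The `Pair` struct and the function names are hypothetical, introduced only so the variants can be compared; `x`, `a` and `b` are assumed to be plain `int`s.

```c
/* Hypothetical holder for the two results; not part of the original code. */
typedef struct {
    int a;
    int b;
} Pair;

/* Variant 1: plain if/else. */
static Pair select_if(int x) {
    Pair p;
    if (x > 0) {
        p.a = 1;
        p.b = 4;
    } else {
        p.a = 2;
        p.b = 3;
    }
    return p;
}

/* Variant 2: one ternary per variable. */
static Pair select_ternary(int x) {
    Pair p;
    p.a = (x > 0) ? 1 : 2;
    p.b = (x > 0) ? 4 : 3;
    return p;
}

/* Variant 3: arithmetic on comparison results.
   In C, a comparison yields the int 1 (true) or 0 (false). */
static Pair select_arith(int x) {
    Pair p;
    p.a = (x > 0) * 1 + (x <= 0) * 2;
    p.b = (x > 0) * 4 + (x <= 0) * 3;
    return p;
}
```

All three should return the same pair for any `x`, so the only open question is what the compiler emits for each.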

Eduardo Costa
  • Define "Better", please. – S.Lott May 27 '11 at 17:27
  • That's from my "Computer architecture 101" teacher: branches are evil. They usually force a pipeline flush, invalidating any opcode lookahead. This code is part of the "hottest" method in my game. It takes 40% of CPU time, it's called thousands of times PER SECOND and I need to add a lot of features to it. If I can save one microsecond by changing a branch into a multiplication or a ternary operation, I will do it. – Eduardo Costa May 27 '11 at 17:46
  • Does all that mean "Better == Faster"? If so, please **update** the question to avoid the vague "better" and use the more specific "faster". – S.Lott May 27 '11 at 17:49
  • Why does the title still have the vague "better"? Can you explain why? Or fix the title? – S.Lott May 27 '11 at 18:34

7 Answers

9

EDIT:

I just compiled your example with GCC and LLVM at different optimization levels and looked at the ARMv6 and ARMv7 assembly. Here are my conclusions:

  • The ARM assembly differs depending on the compiler (GCC / LLVM) and the target architecture.
  • But at the highest optimization level, it produces exactly the same assembly code for if and ternary, whatever the compiler and architecture are. (Yes, I compared several pairs ;)

Here is the most concise result, from LLVM / ARMv7, using the IT instruction you mentioned. It is the same for if and ternary:

MOVS    R1, #2
CMP     R0, #0
IT GT
MOVGT   R1, #1
MOV.W   R2, #3
IT GT
MOVGT   R2, #4
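
Read back into C, the listing seems to load the "else" value unconditionally and then overwrite it when x > 0, with no branch at all. The sketch below is only my reading of the listing, assuming R0 holds `x`, R1 holds `a` and R2 holds `b`; the function name `select_predicated` and its out-parameters are hypothetical.

```c
/* Sketch of what the predicated listing computes,
   assuming R0 = x, R1 = a, R2 = b (an assumption, not compiler output). */
static void select_predicated(int x, int *a, int *b) {
    int r1 = 2;        /* MOVS  R1, #2      : default ("else") value of a   */
    int gt = (x > 0);  /* CMP   R0, #0      : sets the flags tested by GT   */
    if (gt) r1 = 1;    /* IT GT / MOVGT R1  : conditional move, not a jump  */
    int r2 = 3;        /* MOV.W R2, #3      : leaves the flags untouched    */
    if (gt) r2 = 4;    /* IT GT / MOVGT R2  : reuses the same comparison    */
    *a = r1;
    *b = r2;
}
```

Note that the comparison happens once and both conditional moves reuse its flags, so the second `(x > 0)` in the ternary version costs nothing here.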

ENDOFEDIT

I just searched a bit on the topic, and even if some people think the ternary is less optimized, most results, and the more relevant ones, say that it produces the same assembly code.

Be aware that this might change with:

  1. compiler GCC, LLVM...
  2. optimization level

I'm a bit too lazy right now to disassemble the code, but maybe I'll edit this answer later.

So I would think that djna is right: apart from the 2*(x>0), which it would be really surprising not to see optimized, this is the same.

After that, ternary or not, it is a matter of taste. I prefer ternary when it makes sense in code and is readable.

About the second example: it is a trick that uses the fact that true == 1 and false == 0. Funny, but I wouldn't like to maintain that code.

Vincent Guerci
  • Thanks a lot for understanding my question! Yes, I agree that compiler and optimization are huge factors to consider. But I'm too busy to disassemble. That double (x>0) was a mistake while writing the question. The true/false trick is evil, but I could use it. A looooooong time ago, back in the old VGA (320x200) days, one darn evil technique to optimize "y*320+x" was "(y<<8)+(y<<6)+x". Multiplication on the 8086 was really expensive (23 times more than addition)!! Almost all the VGA game code I saw used this. – Eduardo Costa May 27 '11 at 17:54
  • Ahh, that good old 13h mode, I also played with it... :) Shifting bits will always be faster than a multiplication, but it can only be used in rare situations. – Vincent Guerci May 27 '11 at 18:01
  • I got so happy when I made my first "INT 13h"... :) But today it is very rare to need math optimization. Pentiums take less than 5 cycles for a multiplication, against 1 for a shift. But branches are still evil. – Eduardo Costa May 27 '11 at 18:21
  • True, I haven't needed to do that kind of math for a while... too bad, that was fun... but not that productive! ;) – Vincent Guerci May 27 '11 at 18:45
  • 3
    +1 for actual, empirical details rather than "I don't *think* it matters." – Dan J May 27 '11 at 19:02
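
The VGA trick from the comments above works because 320 = 256 + 64, so y*320 can be split into two shifts and an add. A minimal C sketch, with hypothetical function names and unsigned coordinates assumed:

```c
/* Offset of pixel (x, y) in a 320-pixel-wide linear framebuffer. */
static unsigned offset_mul(unsigned x, unsigned y) {
    return y * 320u + x;
}

/* Same offset without a multiply: 320 = 256 + 64 = (1 << 8) + (1 << 6). */
static unsigned offset_shift(unsigned x, unsigned y) {
    return (y << 8) + (y << 6) + x;
}
```

Both functions give the same offset for every pixel of a 320x200 screen. A current optimizing compiler may well make this kind of substitution on its own when it pays off, so the hand-written form is shown here only to illustrate the arithmetic.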
3

No.

There are no winners at code golf.

Readability wins.

So in this case your first example using traditional conditional blocks is about 7099092034902 times easier to read. Anyone even remotely familiar with programming will be able to understand what is going on.

In the latter example... God, what is that!

jathanism
  • 2
    Off-topic... he asks about compiler optimization. And ternary expressions are fine when not ab-used. see that question > http://stackoverflow.com/questions/160218/to-ternary-or-not-to-ternary – Vincent Guerci May 27 '11 at 17:22
  • As I'm sure you know, the optimization gains on something like this are best left to decide upon after properly profiling. If it turns out that the more readable operation really is coming at a high(er) cost, then only testing is going to determine if those microseconds can truly be squeezed out of the ternary operation. – jathanism May 27 '11 at 17:34
  • 1
    Have you ever done some serious game programming? Especially on embedded devices (maybe even in a GPU shader)? If you call a piece of code 1000 times, you can save a whole second by saving only a millisecond per call. And this code is called way more than only 1000 times. This means I can really save a lot if Xcode's ARM compiler generates a branch! BTW, I DID profile, and this method is the villain. And it's only a lot of math and ifs. – Eduardo Costa May 27 '11 at 17:35
  • My apologies, but you did not mention in your question that you had already profiled. Have you tried builds of both to see if using ternary operations made a difference? – jathanism May 27 '11 at 18:00
  • I'm sorry, too. My objective was to find how the compiler behaves on that scenario. It's easier than benchmarking trial-and-error. – Eduardo Costa May 28 '11 at 00:36
3

A smart compiler could produce the same output for either code.

A non-smart compiler would evaluate (x > 0) twice in the ternary case and would therefore be slower!
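
If you don't want to rely on the compiler being smart, one option is to evaluate the condition once yourself and reuse it. A minimal sketch, where the function name `select_hoisted`, the local `positive` and the out-parameters are hypothetical:

```c
/* Evaluate the comparison a single time and reuse the result,
   so even a naive compiler has no reason to test x twice. */
static void select_hoisted(int x, int *a, int *b) {
    const int positive = (x > 0); /* 1 if x > 0, otherwise 0 */
    *a = positive ? 1 : 2;
    *b = positive ? 4 : 3;
}
```

This keeps the ternary style while making the single evaluation explicit in the source.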

djna
  • Thanks for the kind answer. My doubt is really how Xcode deals with it (whether it's a smart or non-smart compiler). I found an ARM "IT" instruction, and I'm wondering if it's used in this case. – Eduardo Costa May 27 '11 at 17:39
1

Having spent years optimising software, I'd be willing to bet that the small block you post above isn't really your problem. I think we need to see the entire method together with some profiling results (when it comes to optimising, measuring is EVERYTHING) to get a better handle on it.

To answer your own question: take each option and profile it for, say, 10 million invocations (a mere thousand won't tell you anything much on the fragment above). That will tell you which is best for speed; there is no point in trying to second-guess.
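
A timing harness for that kind of measurement could look roughly like the sketch below. It uses only the standard `clock()` from `<time.h>`; the names `variant_if`, `variant_ternary` and `time_variant` are hypothetical, and packing `a` and `b` into `a * 10 + b` is just a way to keep both results live so the compiler cannot discard them.

```c
#include <time.h>

typedef int (*variant_fn)(int x);

/* if/else version; returns a * 10 + b so both values are actually used. */
static int variant_if(int x) {
    int a, b;
    if (x > 0) { a = 1; b = 4; } else { a = 2; b = 3; }
    return a * 10 + b;
}

/* Ternary version with the same packing of the result. */
static int variant_ternary(int x) {
    int a = (x > 0) ? 1 : 2;
    int b = (x > 0) ? 4 : 3;
    return a * 10 + b;
}

/* Calls fn `iterations` times with an input whose sign alternates,
   stores the sum of the results in *checksum and returns the elapsed
   CPU time in seconds. */
static double time_variant(variant_fn fn, long iterations, long *checksum) {
    volatile int input = -3; /* volatile: the loop cannot be folded away */
    long sum = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        input = -input;      /* flip the sign so both paths are taken */
        sum += fn(input);
    }
    clock_t end = clock();
    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Call `time_variant(variant_if, 10000000L, &sum)` and the same for `variant_ternary`, then compare the two times. The call through a function pointer and the volatile access add their own overhead, so treat the numbers as a rough comparison and measure on the actual device with the build settings you ship.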

Roger
  • The rest of the method was already profiled and tuned - and it's a lot of "if"s like this one. I found that it's my hotspot: I only need to play for about 2 minutes to get the "10 million invocations". The "thousand" is the number of invocations per second. My hope was that some ARM expert could simply say "XCode+LLVM (or GCC) generates NNN and MMM". – Eduardo Costa May 27 '11 at 18:02
  • So when you say "a lot of ifs like this one", how many are we talking about? 10, 100, more? – Roger May 27 '11 at 19:37
  • Currently, I have about 10, plus a loop, inside a recursive logic in my game loop. – Eduardo Costa May 28 '11 at 00:30
  • Ahhhhh ... OK that explains a lot. The recursion will (most likely) be killing you - you should try to refactor the code so it isn't recursive. – Roger May 28 '11 at 09:05
  • Thanks for your attention! The recursion is related to how the data is structured (a slightly complex tree). If I change it now, I'll have some serious headache to maintain. Before implementing it on iOS, I tested the concept using Java without recursion - man, that was ugly! :) This method is critical, but not a bottleneck (yet). I really appreciate your kind attention, but my goal here was to learn (and publish) how ternary works on ARM-level. – Eduardo Costa May 30 '11 at 16:32
0

My question: at the ARM level, which one is faster?

Why don't you write a simple test program that measures the execution speed for, say, 100000 invocations of each block?

vocaro
  • Because I'm only one person. I have to do my "real job", take care of my family and develop my games. I'm already investing my short spare time implementing new features and optimizing other algorithms. BTW, Roger answered the same as you 50min before, and my comment on that was: My hope was some ARM-expert could simply tell "XCode+LLVM (or GCC) generates NNN and MMM". Oh, and Vincent was that expert (he answered 20min before you). – Eduardo Costa May 27 '11 at 19:16
  • So you expect someone else to invest their short spare time to solve your problems? Besides, writing a simple benchmark for the code you posted would take less time than posting and commenting on SO. – vocaro May 27 '11 at 19:43
  • Did you notice I asked "which one is faster"? I didn't say "please benchmark that for me". BTW, that's why I didn't post my whole method. I searched a lot on the Internet and didn't find any ARM-related material, only GNU/x86 and lots of speculation. Now, because of my question and the kind contribution of Vincent, the community has access to that information. What if I had run the test myself? I would have got an answer, but only for myself. Yes, it took way more time to comment on SO, but I'm doing it to benefit others, too. – Eduardo Costa May 28 '11 at 00:24
  • Please have a look at my blog post for a nice little ARM asm benchmarking tool for iOS: http://www.modejong.com/blog/post10_arm_timing_framework/index.html – MoDJ Aug 08 '13 at 20:42
0

There will be no difference, gcc will compile a simple if and a ternary operator into the same code. If you are really trying to optimize your code at the ARM asm level, then it is possible to get significant speedups by using conditional ARM asm code to avoid branching. But, in order to do that, you need to profile your code and figure out where it is really needed. Then, you would need to hand code ARM asm for only the hot spots in your code. Basically, if you don't need extreme performance improvements, then just use the compiler supplied features as the compiler does a good job in most cases.

MoDJ
-1

Go for readability; you are not a compiler.

jcpennypincher
  • I don't think you got it. First, I asked what the compiler would do in both cases - I will not write the ARM code myself. Second, I said the code is called thousands of times - one millisecond saved per call means SECONDS I can save. In this method I was able to save a HUGE amount of time just by writing one long method instead of small, nice-to-read private methods. – Eduardo Costa May 27 '11 at 17:31
  • See Vincent G's answer: at the highest optimization level, the if and the ternary compile to exactly the same assembly, whatever the compiler and architecture. In other words, the "better" code would be the code that is easiest to read. When looking at thousands and thousands of lines of code, programmers benefit from the use of white space and proper indentation levels to indicate logic and flow of the program. Many people get hung up on optimization, but an if/else versus a ternary expression is not where you are going to make significant optimizations. Use delayed variable declaration and good parameter passing / initialization. – jcpennypincher May 29 '11 at 05:24