Optimizing by transforming "goto" to branch

Question

I have the following code (This is partially pseudo code for demonstration):

void foo(...){

  //some code here
  do{
     min_item = _MAX_VALUE;
     //some code here also


    if (min_item == _MAX_VALUE)
       break;

    if (smaller_item_x == min_item){
FIRST_IS_SMALLER:
       global_queue[size++] = smaller_item_x;    
       if (next_item_x!=0){
            smaller_item_x= next_item_x;
            if (smaller_item_x > smaller_item_y)
               goto SECOND_IS_SMALLER;
       }
    }else{
SECOND_IS_SMALLER:
       global_queue[size++] = smaller_item_y;    
       if (next_item_y!=0){
            smaller_item_y= next_item_y;
            if (smaller_item_y > smaller_item_x)
               goto FIRST_IS_SMALLER;
       }
    }
  }while(true)

As far as i know goto is translated to jmp in assembler, i am interested to increase performance of this procedure by changing the second goto to something similar to branch (shorter command with short jump up), i may be missing something, and it could be trivial, so my apologies.

I would be less concerned about the performance at this stage (unless profiling has identified this routine as a hotspot), and more concerned about the use of `goto` at all, and the fact that you have duplicated code ([DRY!](http://en.wikipedia.org/wiki/DRY)). My suggestion would be to replace `smaller_item_x` and `smaller_item_y` with `smaller_item[2]`, etc., and do the whole thing with indexed arrays, to avoid the code duplication. — Oliver Charlesworth, Jun 30 '12 at 12:14
There is no duplicate code here, there is a reason why i don't use smaller_item[2] ... , this code has been tested in real time - and the performance will certainly downgrade if you will try to generalize those two variables (i have tested the generalized version + extended generalized version with multiple such variables (queues)). Generally speaking, if you have a small number of such variables , you don't want to generalize since it will probably add many additional deference +.... — Michael, Jun 30 '12 at 12:25
If you've profiled this, then fair enough (the reason I brought it up is because **so** many questions on SO are people wanting to micro-optimise before they've even profiled). But I have to disagree about duplication; those two chunks are identical save for interchanging `x` and `y`. — Oliver Charlesworth, Jun 30 '12 at 12:32
Can you describe the algorithm in simple terms? The block with the gotos seem to be a variant of merge sort where it keeps pulling from x and y until two consecutive elements in the same side are minimal or they are the same. --The reason for the question is that optimizers are better at handling *common* code, if that can be remapped to a loop without gotos, the optimizer might yield better code (it is easier to analyze). Also, at this point you might want to look at what assembly is being generated, whether `goto` is a jump or a short jump up is up to the compiler... — David Rodríguez - dribeas, Jun 30 '12 at 13:19
@ David Rodríguez: Correct, it is sort of merge of sorted arrays , with duplicated values, although it is only small part of the algorithm, this type of code is actually executed in multithreaded environment, and yes i haven't said it before, since i was interested only to optimize this part of the code. — Michael, Jun 30 '12 at 20:16

score 7 · Answer 1 · answered Jun 30 '12 at 12:13

7

It is very difficult to second-guess C compilers these days. They often compile to assembler that is tighter than people would have coded directly. They also don't offer controls to programmers that direct their optimizations to this degree.

If you want this level of control, you will probably have to write in assembler, and chances are good that your code will be slower than the C compiler's.

answered Jun 30 '12 at 12:13

Ned Batchelder

364,293
75
561
662

Of course it won't be slower. He'd just look at the code the compiler made, so it will at worst be just as slow. – harold Jun 30 '12 at 12:21
@harold: that's assuming you can look at assembly code and know what would make it faster. These things get complicated... – Ned Batchelder Jun 30 '12 at 12:24
@Ned Batchelder: I actually do use assembly quite a lot, can you suggest a way to optimize this code in other way? – Michael Jun 30 '12 at 12:28
I have no specific optimizations, sorry. – Ned Batchelder Jun 30 '12 at 12:30
1

I would direct you to Michael Abrash's wonderful guide to optimization at http://nondot.org/~sabre/Mirrored/GraphicsProgrammingBlackBook/. The actual details are a bit dated, but the advice is timeless: the most dramatic optimizations are done at a high level. Agner Fog provides a nice guide to optimizing C/C++ and assembly that's more contemporary, but less timeless http://www.agner.org/optimize/. On SO, there are also some [really nice ideas](http://stackoverflow.com/questions/926266/performance-optimization-strategies-of-last-resort/927773#927773). – sfstewman Jun 30 '12 at 13:25

score 1 · Answer 2 · answered Jun 30 '12 at 14:32

1

This is probably not an answer that you were looking for, but it does not fit in a comment, so I pasted it here.

This piece of code should be equivalent to yours, but it does not have gotos, and it does not introduce additional indirection. There is an additional check and a switch on branchId, but the compiler should be able to optimize it into a single access, and perhaps even put it in a register.

int branchId = smaller_item_x == min_item;
while (branchId >= 0) {
    switch (branchId) {
    case 0:
        global_queue[size++] = smaller_item_y;    
        if (next_item_y != 0) {
            branchId = (smaller_item_y=next_item_y) > smaller_item_x ? 1 : -1;
        }
        break;
    case 1:
        global_queue[size++] = smaller_item_x;    
        if (next_item_x != 0) {
            branchId = (smaller_item_x=next_item_x) > smaller_item_y ? 0 : -1;
        }
        break;
    }
}

answered Jun 30 '12 at 14:32

Sergey Kalinichenko

714,442
84
1,110
1,523

Thanks, i tried all sort of other working ways to achieve the same behavior without goto statements, and i received the best results with the goto, the algorithm is actually much more complicated, but i certainly will try your way and will update soon. – Michael Jun 30 '12 at 20:11
I have tested your suggested switch, as a result i got 20% decrease in performance which is very bad for me. – Michael Jun 30 '12 at 21:27
@Michael could you post a self-contained piece of code, say, to pastebin, so that we could attempt optimizing it hands-on? – Sergey Kalinichenko Jun 30 '12 at 22:43
i will soon publish and article with the new algorithm, then i would able to publish the code, thanks :-) – Michael Jul 01 '12 at 06:15

Optimizing by transforming "goto" to branch

2 Answers2