Optimization of a diagonal iterator, ideally with predication

Question

I am interested in optimizing my diagonal iterator. For background on the problem see StackOverflow question Traverse Matrix in Diagonal strips from 2009. In the code below, I use a different approach than the solutions found in that topic. The idea is to create sort of a gray code-like iterator that steps through the diagonals using only a single boolean control variable (whether columns are increasing, zColumnIncrease). This methodology creates a state machine with 7 possible states:

        // z - boolean type indicator
        // ct - count (number of rows or columns)
        // x - index (the one-based index of the row or column)

        zColumnIncrease = true;
        ....
        if( xRow == ctRows && xColumn == ctColumns ) break;
        if( zColumnIncrease ){
            if( xColumn < ctColumns ){
                xColumn++;
                if( xRow > 1 ){
                    xRow--;
                } else {
                    zColumnIncrease = false;
                }
            } else {
                zColumnIncrease = false;
                xRow++;
            }
        } else {
            if( xColumn > 1 ){
                if( xRow < ctRows ){
                    xColumn--;
                    xRow++;
                } else {
                    zColumnIncrease = true;
                    xColumn++;
                }
            } else {
                zColumnIncrease = true;
                if( xRow < ctRows ){
                    xRow++;
                } else {
                    xColumn++;
                }
            }
        }

Note that this state machine (which is normally at the end of some work loop) iterates the diagonals in boustrophedon order. For example, for a 5x5 column-ordered matrix the iteration of the cells is as follows:

11
21 12
13 22 31
41 32 23 14
15 24 33 42 51
52 43 34 25
35 44 53
54 45
55

What I would like to do is optimize this state machine. I see two main types of optimization: logical expression compression and predication. In the first case it may be possible to combine the logic in some way so that the code is more efficient. A much bigger win would be to replace the if-statements with predicated instructions. This is ideally what I would like to do, so that a resulting assembly language translation would be unbranched. Therefore, I am interested in optimizations to this code and specifically would like to know how I can predicate the code so that it becomes unbranched.

Since this appears to be asking about optimizations to working code, it would probably be better suited on [codereview.se]. — Bernhard Barker, Feb 20 '14 at 23:11
I don't want a code review. The kind of optimization needed here is here is a complex algorithmic problem, not some refactoring. — Tyler Durden, Feb 21 '14 at 03:08

score 0 · Answer 1 · 2014-02-20T21:28:03.950

Let us denote with +/./- (for +1/0/-1) the increments to the column and row indexes that lead from one position to the next.

For the 5x5 array, we have

+.
-+ .+
+- +- +.
-+ -+ -+ .+
+- +- +- +- .+
-+ -+ -+ +.
+- +- .+
-+ +.

We can also arrange this like the array, hoping to see a column/row pattern appear:

+. -+ +. -+ .+
.+ +- -+ +- -+
+- -+ +- -+ .+
.+ +- -+ +- -+
+- +. +- +.

or separately for c and r:

+ - + - .
. + - + -
+ - + - .
. + - + -
+ + + +

. + . + +
+ - + - +
- + - + +
+ - + - +
- . - .

We can summarize by saying that the increments follow a regular matrix of alternating + and -, except at certain places along the four edges. The main pattern is simply obtained by changing the sign on every iteration. The corrections to be given to the basic pattern goe as follow (2 for +2):

. . . . -
+ . . . .
. . . . -
+ . . . .
. 2 . 2

+ . + . 2
. . . . .
. . . . 2
. . . . .
. - . -

They can be obtained using expressions such as (r & 1) & !c (first column of the top array) or (~r & 1) & !(c-4) (last column top array). With some care, all increments can be computed branchless anywhere. No very short, but branchless.

Optimization of a diagonal iterator, ideally with predication

1 Answers1