compare chars for equality without branching

Question

On a previous question where I wanted to optimize this function:

static
lvh_distance levenshtein_distance( const std::string & s1, const std::string & s2 )
{
    const size_t len1 = s1.size(), len2 = s2.size();
    std::vector<unsigned int> col( len2+1 ), prevCol( len2+1 );

    const size_t prevColSize = prevCol.size();
    for( unsigned int i = 0; i < prevColSize; i++ )
        prevCol[i] = i;

    for( unsigned int i = 0, j; i < len1; ++i )
    {
        col[0] = i+1;
        const char s1i = s1[i];
        for( j = 0; j < len2; ++j )
        {
            const auto minPrev = 1 + std::min( col[j], prevCol[1 + j] );
            col[j+1] = std::min( minPrev, prevCol[j] + (  s1i == s2[j] ? 0 : 1   ) );

        }
        col.swap( prevCol );
    }
    return prevCol[len2];
}

An user commented that I could replace s1i == s2[j] ? 0 : 1 with ((s1i - s2[j]) & 0x80) >> 7 to prevent a conditional jump. The trick was wrong and the user deleted his comment, but I am wondering whether there could actually be a way to do that.

Did that user who made the comment, look at the optimized output of the compiler to see whether there really was a conditional jump in it with your code? — Steve Jessop, Apr 22 '13 at 11:40
@SteveJessop Added link to other question: yes, the assembly output was shown there. — TemplateRex, Apr 22 '13 at 11:47
Small local optimizations is what the *compiler* does best. I would be more worried about all the extra copies of sizes and array elements. Does having *more* variables really make the code faster? — Bo Persson, Apr 22 '13 at 11:50
@rhalbersma: thanks for the link. Unfortunately you can't do that with the vectors. `reserve` doesn't change the size of the vector, only the capacity, so the init loop would be accessing out of bounds. You could initialize the vector using a `boost::counting_iterator` or equivalent, though, to avoid having two passes over the vector data. — Steve Jessop, Apr 22 '13 at 12:16
@SteveJessop Why not let the first `for` loop a `reserve` followed by `prevCol.push_back(i)`? Similarly for `col` which also could use `push_back` inside the loop and a `reserve` before. — TemplateRex, Apr 22 '13 at 12:24
@rhalbersma: personally I've sometimes seen overhead when repeatedly calling `push_back` (updating the end pointer repeatedly), but it's certainly worth a try since there's no reason the compiler can't just do that once at the end. — Steve Jessop, Apr 22 '13 at 12:54

Agentlien · Accepted Answer · 2013-04-22T11:52:40.330

3

Assuming that the code

s1i == s2[j] ? 0 : 1

does give you a branching operation, which you really want to avoid, you could simply attempt the following:

!(s1i == s2[j])

This should give the same effect, and may help the compiler remove the branching. Or, you could reverse the logic and write

s1i != s2[j]

As always with this type of optimizations, there is never a guarantee that this will actually achieve the results you hope for. Optimizers do very many clever things and trying to predict how they will react to your tricks is very often difficult. So, even in the best case, all you can do is attempt different solutions and compare the resulting binary code.

edited Apr 22 '13 at 11:52

answered Apr 22 '13 at 11:37

Agentlien

4,996
1
16
27

`unsigned int`, not `char`. The principle is sound but there’s no reason to assume that the compiler would generate different code here than for what OP is doing … – Konrad Rudolph Apr 22 '13 at 11:41
@KonradRudolph Thanks. I misread the example. Of course, there is no guarantee, but I've seen it have the desired effect before. With these kinds of questions there are never any guarantees. I'll add that to my answer. – Agentlien Apr 22 '13 at 11:43

fatihk · Answer 2 · 2013-04-22T11:46:41.100

2

Why not use the following: !(s1i == s2[j]) or (s1i != s2[j]) because bool to int conversion is implicit

edited Apr 22 '13 at 11:46

answered Apr 22 '13 at 11:41

fatihk

7,789
1
26
48

Thanks for the tip. I used it to improve my answer, so +1 for you. – Agentlien Apr 22 '13 at 11:53

score 1 · Answer 3 · answered Apr 22 '13 at 11:54

1

Not a practical answer, but rather to solve a puzzle.
Create an array one_or_zero[UCHAR_MAX+1] fill it with 1s, and one_or_zero[0] = 0;
Now you can do something like prevCol[j] + one_or_zero[s1i^s2[j]])
this will lead to 0 on s1i==s2[j] and 1 otherwise being added to prevCol[j]

answered Apr 22 '13 at 11:54

alexrider

4,449
1
17
27

I will have to try it. I doubt it would be faster, as it implies dereferencing, possibly having a cache miss, and computing a XOR, but that's an interesting approach. – qdii Apr 22 '13 at 12:04

compare chars for equality without branching

3 Answers3