4

I'm working on my master's thesis (computer science) on code which is written for post-quantum-secure signatures. The whole thing can be found here but is not important here. For my thesis I tried to explain a 'simple' function, which is not so simple at all.

The function tests, if a variable is non-zero in the GF(16). (GF(16) here can be understood as 4-bit unsigned integers). This function looks as follows:

static inline uint8_t gf16_is_nonzero(uint8_t a) {
    unsigned a4 = a & 0xf; // mask lowest 4 bits of a
    unsigned r = 0u - a4;  // set 4 high bits if a is nonzero
    r >>= 4;               // right-shift high bits into low bits
    return r & 1;          // return lowest bit
}

I understood how it works but I don't understand why this function needs to be this complex. Could there be a good reason for that? Good reasons could be performance or secureness (e.g. safety against timing attacks) benefits. Because if there are no such benefits, wouldn't it be smarter to write that function in an easy manner like:

static inline uint8_t gf16_is_nonzero(uint8_t a) {
    return (a & 15) != 0;
}

EDITS

This code is not written by me, it is written by crypto-researches, who are trying to get their PQ-algorithm standardized by NIST.

An easier approach for the second code snippet was suggested by TonyDelroy in the comments.

Jan Schultke
  • 17,446
  • 6
  • 47
  • 96
linderd
  • 43
  • 4

1 Answers1

8

The reason for this code is because it is branchless.

Testing for a condition tends to be an expensive operation, whereas addition, subtraction, and bitwise operators are not.

This however is premature optimization. With -O3, the first function compiles to this:

andl    $15, %edi
negl    %edi
shrl    $31, %edi
movl    %edi, %eax
ret

While the second function compiles to this:

andl    $15, %edi
setne   %al
ret

The moral of the story: write code that clearly states your intentions and let the compiler figure out the rest.

dbush
  • 205,898
  • 23
  • 218
  • 273
  • 3
    As an additional note: the compiler was probably going to generate branchless code anyway, and this code might be slower / longer. – Dietrich Epp Aug 25 '20 at 21:02
  • 4
    Since it's crypto code, it's probably not allowed to even *maybe* have a branch. "Just rely on the compiler" would be a funny joke in crypto. – harold Aug 25 '20 at 21:16
  • I'm confused - there's no reason to have thought either implementation in the question would involve branches - so how does this relate to the question? – Tony Delroy Aug 25 '20 at 21:17
  • setne isn't a branch though. Both are branchless. – Joseph Ireland Aug 25 '20 at 21:18
  • 1
    @TonyDelroy the naive way to compile `a4 != 0` is with compare and branch. That we happen to get a `setcc` is the result of a lucky compiler optimization. – harold Aug 25 '20 at 21:20
  • @TonyDelroy The unoptimized assembly uses the `cmpl` instruction. It's the comparison that tends to be expensive, and comparisons typically come with a jump. – dbush Aug 25 '20 at 21:20
  • 1
    Yes, I think @harold has a point. It's not so much that a branch would be expensive, as that it means the execution time can depend on the data, and that opens the door to side channel attacks. – Nate Eldredge Aug 25 '20 at 21:24
  • 3
    @dbush: comparisons aren't expensive - see https://www.agner.org/optimize/instruction_tables.pdf - they set flags as side effects which can then use something like the `setne` without any jump, though it all depends on the CPU instruction set. Still, there's no justification for doing anything other than `x & 15 == 0`, and I can't imagine the convoluted code in the question having been written to avoid branching. The bizarre code seems FUD-driven to me. Maybe written by a hardware engineer or something who overthought this. – Tony Delroy Aug 25 '20 at 21:24
  • 1
    @TonyDelroy: Well, `x & 15 == 0` is always 0 :-) – Dietrich Epp Aug 25 '20 at 21:39
  • A comparison is *not* a branch, it's a basic operation just like addition or &, but even simpler, it definitely wouldn't be considered expensive or risky for crypto. You would have to run if(x) on the result to get a branch, but you could do that for either implementation – Joseph Ireland Aug 25 '20 at 21:40
  • 2
    Well it's easy to say that a comparison is not a branch and it's all FUD, but a comparison [may result in a branch](https://godbolt.org/z/Kvodnz). It may also not, as tends to happen after optimization. There is no guarantee either way. – harold Aug 25 '20 at 21:49
  • @harold the minimal optimization level of the makefile in the reference implementation is O2 for standard compilation and O1 for debugging, so in the usecase of this implementation there should be no branch – linderd Aug 25 '20 at 21:59
  • 1
    @harold there's nothing that guarantees you that the compiler won't recognize the pattern and compile it down to `setne`. clang is smart enough to recognize a 10+ line function which reverses bytes as equivalent to just a `bswap` instruction, so what guarantees you it won't match this pattern? Inline assembly is the only viable solution here. – Jan Schultke Aug 26 '20 at 06:33
  • @J.Schultke turning the constant-time trickery into `setne` (or `cmovcc`) would be fine, that's still constant time just different, it's inventing a branch that we didn't write which would be bad.. hypothetically that's also allowed I suppose, that would be a fun world-scale disaster – harold Aug 26 '20 at 06:45