2

I am implementing, in go, a rolling version of the adler32 checksum.

This answer was helpful to double check my maths. However I am struggling at implementing it correctly in golang.

I wrote the following code:

func roll(adler, n, leave, enter uint32) uint32 {
    a := adler & 0xffff
    b := adler >> 16

    a = (a + enter - leave) % MOD
    b = (b - n*leave - 1 + a) % MOD
    return b<<16 | a
}

It tested it on various inputs and it worked fine, until I decided to run it on random data. Here is a sample where it does not work (I found several of them).

What is baffling me is that the same code in python works perfectly on those inputs:

def roll(adler, n, leave, enter):
    a = adler & 0xffff
    b = adler >> 16

    a = (a + enter - leave) % MOD
    b = (b - n*leave - 1 + a) % MOD
    return b<<16 | a

For good measure, I am including proof that this works in python. Note that the python checksum matches the non-rolling version of the go checksum (and that part is directly from the go core libraries).

I studied my results on all the other problematic samples, and found that I am never making a mistake on the least significant bits of the checksum (the "a" bits). Also, the error is consistently the same, equals to 0xe10000. I suspect a peculiarity of how go handles modulo operations on uint32 integers to be the cause of this.

What is happening and how do I fix my code?

Community
  • 1
  • 1
user48678
  • 2,382
  • 3
  • 24
  • 30

2 Answers2

4

The integers in Python are signed. You declared all the integers in the golang version to be unsigned. That's the difference.

When an unsigned number is subtracted from a smaller unsigned number, you get a huge unsigned number that gives a different remainder on division than the small negative difference would. When you wrap, you are effectively adding 232. 232 mod 65521 is 225, or 0xe1, which is why you're seeing that difference in b. It's much more likely to wrap on the b calculation, but it can happen for a as well, if a happens to be very small at that step.

Per the comment by @samgak, you also have to worry about the definition of the % operator in different languages for signed values. So the solution that works across different conventions would be to make the values positive by adding as many MODs as necessary, before doing the % MOD. For a, just add MOD. For b, add (1 + n * leave / MOD) * MOD.

Take care to make sure that the intermediate values don't overflow. The code in go can give erroneous results if n*leave is large enough to wrap the integer type being used.

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
  • You also need to take the different handling of negative modulo dividends between python and go into account. https://play.golang.org/p/R87XWewRPY (ignores the n*leave overflow) – samgak Dec 06 '16 at 04:33
  • "The code in both languages can give erroneous results if n*leave is large enough to wrap the integer type being used." - Python uses arbitrary-precision integers, so overflow isn't a problem in Python. – user2357112 Dec 06 '16 at 05:23
  • @user2357112 Thanks. Fixed that. – Mark Adler Dec 06 '16 at 06:32
  • 2
    Wow an answer from the author of the checksum himself! Thank you very much for your clear explanation. – user48678 Dec 06 '16 at 07:04
  • In order to make sure that `n*leave` does not wrap, I guess I can replace it with `n*leave % MOD == n%MOD*leave`, which is the same as doing `n%=MOD` prior to the whole operation. – user48678 Dec 06 '16 at 08:51
0

I don't know go, but here are some possibilities:

b := adler >> 16 change to b := (adler >> 16) & 0xffff

b = (b - n*leave - 1 + a) % MOD ... what happens if expression in () is negative?

return b<<16 | a ... check operator precedence; (b<<16)|a or b<<(16|a) ?

32-bit machine or 64-bit?
John Machin
  • 81,303
  • 11
  • 141
  • 189