I'm trying to calculate/generate the CRC32 hash of some random strings using Python but they do not match the values I generate from online sources. Here is what I'm doing on my PC,

>>> import binascii
>>> binascii.crc32('hello-world')
-1311505829

Another approach,

>>> import zlib
>>> zlib.crc32('hello-world')
-1311505829

The fact that the above results are identical tells me that I'm calling the function correctly. But, if I go to the following online sources,

For the string "hello-world" they all give the same value = b1d4025b

Does anyone know what I need to do, to get matching results?

As I was typing this question it occurred to me that I might need to convert my Python result to hex,

>>> hex(zlib.crc32('hello-world'))
'-0x4e2bfda5'

Unfortunately, that hasn't helped either. :(

chronodekar

3 Answers


Python 2 (unlike Python 3) returns the CRC as a signed 32-bit integer.

Those sites return it as an unsigned 32-bit integer.

The values are the same otherwise, as you can see from this:

>>> 0x100000000 - 0xb1d4025b == 0x4e2bfda5
True

One quick way to convert from 32-bit signed to 32-bit unsigned is:*

>>> -1311505829 % (1<<32)
2983461467

Or, in hex:

>>> hex(-1311505829 % (1<<32))
'0xb1d4025b'

& 0xFFFFFFFF or % 0x100000000 or & (2**32-1) or % (2**32) and so on are all equivalent ways to do the same bit-twiddling; it just comes down to which one you find most readable.
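As a quick sanity check of that equivalence, here is a minimal sketch that applies all four expressions to the signed value from the question. The variable names are only illustrative.

```python
# Signed CRC32 of 'hello-world' as reported by Python 2 in the question.
signed = -1311505829

candidates = [
    signed & 0xFFFFFFFF,
    signed % 0x100000000,
    signed & (2**32 - 1),
    signed % (2**32),
]

# All four expressions produce the same unsigned 32-bit value.
assert len(set(candidates)) == 1

unsigned = candidates[0]
print(unsigned)                  # 2983461467
print(format(unsigned, '08x'))   # b1d4025b
```

Using `format(value, '08x')` rather than `hex()` drops the `0x` prefix and keeps leading zeros, which is the form most online CRC calculators display.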


* This only works in languages where % follows floored integer division, like Python (-3 // 2 == -2); in languages that do truncated integer division, like Java (-3 / 2 == -1), you'll still end up with a negative number. C also truncates since C99 (before that, the rounding direction for negative operands was implementation-defined), but in C you'd just cast the value to the unsigned type you want.
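To make the footnote concrete, here is a rough sketch that emulates a truncated remainder in Python and compares it with Python's own `%`. The helpers `truncated_mod` and `positive_mod` are hypothetical names for illustration; `math.fmod` is used only because its result takes the sign of the dividend, and it is exact here because the values are well below 2**53.

```python
import math

signed = -1311505829
modulus = 1 << 32

def truncated_mod(a, b):
    # Emulates the remainder of truncated division (sign follows the dividend).
    return int(math.fmod(a, b))

def positive_mod(a, b):
    # Workaround for truncating languages: shift into range, then reduce again.
    return truncated_mod(truncated_mod(a, b) + b, b)

floored = signed % modulus                   # Python's %: sign follows the divisor
truncated = truncated_mod(signed, modulus)   # still negative

print(floored)                        # 2983461467
print(truncated)                      # -1311505829
print(positive_mod(signed, modulus))  # 2983461467
```

In a language with a truncated remainder, masking with `& 0xFFFFFFFF` on a wide enough integer type is usually the simpler route.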

MarSoft
abarnert
  • 2
    `Python is doing a signed 32-bit CRC` Just a note: in Python3, this was changed so that it runs an unsigned CRC. See the [docs](https://docs.python.org/3/library/binascii.html#binascii.crc32). – dthor Oct 24 '18 at 17:32
  • 1
    This does not depend on how division is done, but on the definition of `%`. In most languages, it is defined as one of: `x % y == x - floor(x / y) * y` ("same sign as divisor", which is what Python does, so the result is non-negative here because `2**32` is positive), `x % y == x - truncate(x / y) * y` ("same sign as dividend"), or a Euclidean definition whose result is always non-negative ("positive modulo"). Usually this matches what `/` does, but not always. To always get a non-negative modulo in other languages (for a positive `b`), you would do something like this: `def positive_mod(a, b): return ((a % b) + b) % b`. See https://en.wikipedia.org/wiki/Modulo_operation – Artyer Dec 19 '18 at 13:03

The zlib.crc32 documentation suggests the following approach "to generate the same numeric value across all Python versions and platforms".

import zlib
hex(zlib.crc32(b'hello-world') & 0xffffffff)

The result is 0xb1d4025b as expected.
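If you need this in more than one place, the mask can be wrapped in a small helper. This is only a sketch: `crc32_hex` is a hypothetical name, and it assumes text input that should be encoded as UTF-8 before hashing (the online calculators may use a different encoding for non-ASCII input).

```python
import zlib

def crc32_hex(text, encoding='utf-8'):
    """Return the CRC32 of text as 8 lowercase hex digits."""
    data = text.encode(encoding)       # crc32 operates on bytes, not text
    unsigned = zlib.crc32(data) & 0xffffffff  # force an unsigned 32-bit value
    return format(unsigned, '08x')     # zero-padded, no '0x' prefix

print(crc32_hex('hello-world'))  # b1d4025b
```

The mask is a no-op where `zlib.crc32` already returns an unsigned value, so it is harmless to keep.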

Aleksei Zyrianov
  • I'm curious why this would be different across platforms. Wouldn't Python behavior be identical across the board? (ignoring 2.x and 3.x differences) – chronodekar May 07 '15 at 05:13
  • @chronodekar: I'm sure it wouldn't be too hard to find in the source; if you can't find it yourself, you can create a new question. But from a quick test, it's negative on Mac 2.7 and Linux 2.7, positive on Windows 2.7 and Mac 3.5, so I'm pretty sure it's a platform issue, not a 2-vs.-3 issue. Or maybe it's a combination of the two. (Regardless, it doesn't help the OP, whose Python clearly does signed crc32, just like my Mac 2.7 does…) – abarnert May 07 '15 at 05:43
  • 2
    @chronodekar I haven't found any clear answer to that in Python documentation, so I've edited my answer to have the same behavior across all Python versions and platforms. – Aleksei Zyrianov May 07 '15 at 06:01
  • Note that Python 3 guarantees that binascii.crc32 returns an unsigned value, and 2.6 and 2.7 should be guaranteeing a signed value, so platform differences shouldn't be affecting this. – rosuav Feb 05 '19 at 10:12

It seems that Python is returning a signed integer (hence the negative number), whereas the others are returning an unsigned integer.

Taking the result modulo 2**32 gave the same value as these sites.

>>> hex(zlib.crc32(b'hello-world') % 2**32)
'0xb1d4025b'
lobeg25
chw21