
I'm experimenting a bit with CRC32 in Python and C, but my results don't match.

C:
#include <inttypes.h>   /* uint8_t, uint32_t and the PRIu32 format macro */
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#define NUM_BYTES 9

int
main(void)
{

  uint8_t bytes[NUM_BYTES] = {1, 2, 3, 4, 5, 6, 7, 8, 9};

  uint32_t crc = crc32(0L, Z_NULL, 0);

  for (int i = 0; i < NUM_BYTES; ++i) {
    crc = crc32(crc, bytes, 1);
  }

  printf("CRC32 value is: %" PRIu32 "\n", crc);
}

This gives the output: CRC32 value is: 3136421207

Python

In [1]: import zlib
In [2]: int(zlib.crc32("123456789") + 2**32)
Out[2]: 3421780262

In Python I'm adding 2**32 to "cast" the result to an unsigned int.

What am I missing here?

[edit 1]

I have now also tried this:

In [8]: crc = 0;
In [9]: for i in xrange(1,10):
   ...:     crc = zlib.crc32(str(i), crc)
   ...:     
In [10]: crc
Out[10]: -873187034
In [11]: crc+2**32
Out[11]: 3421780262

and

int
main(void)
{

  uint32_t value = 123456789L;

  uint32_t crc = crc32(0L, Z_NULL, 0);

  crc = crc32(crc, (const Bytef *)&value, 4);  /* hashes the 4 raw bytes of the int */

  printf("CRC32 value is: %" PRIu32 "\n", crc);
}

Still not the same result.

evading

4 Answers


There were problems in both your original C and Python snippets. As for your second C snippet, I haven't tried to compile it, but it's not portable: the byte order within an int is platform-dependent, so it will give different results depending on the endianness of the CPU.
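The byte-order point can be seen from the Python side with a small sketch (Python 3, standard library only; the variable names are just for illustration). It packs 123456789 as a 4-byte unsigned int in little-endian and big-endian order. The two byte strings are reverses of each other and get different CRCs, and both are entirely different inputs from the nine ASCII digits b"123456789" that the Python code hashes.

```python
import struct
import zlib

value = 123456789

# Serialize the same 32-bit integer in both byte orders.
little = struct.pack("<I", value)  # little-endian, e.g. what x86 stores in memory
big = struct.pack(">I", value)     # big-endian ("network") byte order

crc_little = zlib.crc32(little)
crc_big = zlib.crc32(big)
crc_text = zlib.crc32(b"123456789")  # the nine ASCII digit characters

print(little.hex(), crc_little)
print(big.hex(), crc_big)
print(crc_text)
```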

One problem, as Serge Ballesta has mentioned, is the difference between {1, 2, 3, 4, 5, 6, 7, 8, 9} and {'1', '2', '3', '4', '5', '6', '7', '8', '9'}. Another problem is that the loop in your original C code didn't actually scan the data, since you didn't use i in the loop, as bav mentioned.
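To make the first problem concrete, here is a minimal Python 3 sketch (names chosen for illustration) showing that the C initializers {1, 2, ..., 9} and {'1', '2', ..., '9'} are different byte sequences, so they cannot have the same CRC:

```python
import zlib

numeric = bytes([1, 2, 3, 4, 5, 6, 7, 8, 9])  # C: {1, 2, 3, 4, 5, 6, 7, 8, 9}
ascii_digits = b"123456789"                   # C: {'1', '2', ..., '9'}

print(list(numeric))        # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(ascii_digits))   # [49, 50, 51, 52, 53, 54, 55, 56, 57]
print(zlib.crc32(numeric), zlib.crc32(ascii_digits))
```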

crctest.c

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

#define NUM_BYTES 9

// gcc -std=c99 -o crctest crctest.c -lz

void do_crc(uint8_t *bytes)
{
    uint32_t crc = crc32(0L, Z_NULL, 0);

    for (int i = 0; i < NUM_BYTES; ++i)
    {
        crc = crc32(crc, bytes + i, 1);
    }

    printf("CRC32 value is: %lu\n", (unsigned long)crc);
}

int main(void)
{
    uint8_t bytes0[NUM_BYTES] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    uint8_t bytes1[NUM_BYTES] = {'1', '2', '3', '4', '5', '6', '7', '8', '9'};

    do_crc(bytes0);
    do_crc(bytes1);
}

output

CRC32 value is: 1089448862
CRC32 value is: 3421780262

crctest.py

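#! /usr/bin/env python
from __future__ import print_function

import zlib

def do_crc(s):
    # Python 2's zlib.crc32 can return a negative (signed) result
    n = zlib.crc32(s)
    return n + (1<<32) if n < 0 else n

s = b'\x01\x02\x03\x04\x05\x06\x07\x08\x09'
print(repr(s), do_crc(s))

s = b'123456789'
print(repr(s), do_crc(s))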

output

'\x01\x02\x03\x04\x05\x06\x07\x08\t' 1089448862
'123456789' 3421780262

edit

Here's a better way to handle the conversion in Python:

def do_crc(s):
    n = zlib.crc32(s)
    return n & 0xffffffff

See the answers here for more info on this topic: How to convert signed to unsigned integer in python.
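A small sketch of why masking is more robust than adding 2**32 (Python 3; the helper name to_uint32 is hypothetical). Masking maps the negative value Python 2 printed in the question onto the unsigned range, and it leaves an already-unsigned value alone. That matters because Python 3's zlib.crc32 already returns an unsigned result, so unconditionally adding 2**32 would overshoot.

```python
import zlib

def to_uint32(n):
    """Map a possibly negative Python int onto the range 0 .. 2**32 - 1."""
    return n & 0xFFFFFFFF

signed = -873187034            # the signed value printed by Python 2 in the question
print(to_uint32(signed))       # 3421780262

# Python 3: crc32 is already unsigned, so masking is a no-op,
# while adding 2**32 unconditionally gives a value that no longer fits in 32 bits.
n = zlib.crc32(b"123456789")
print(to_uint32(n), n + 2**32)  # 3421780262 7716747558
```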

PM 2Ring
  • Excellent answer, thanks. I was almost there by piecing together parts from the other answers but this was golden. – evading Jan 20 '15 at 10:14
  • Thanks! See my update for an alternative way to handle Python's lack of an unsigned int type. – PM 2Ring Jan 20 '15 at 10:26
  • `gcc -std=c99 -lz -o crctest test.c` gives `undefined references`; it should be `gcc -std=c99 -o crctest test.c -lz` – rkta Aug 24 '19 at 10:33

According to www.lammertbies.nl, which has detailed references on CRC calculation and C routines, the CRC32 of the ASCII string 123456789 is 0xCBF43926, that is, 3421780262 as an unsigned 32-bit integer in decimal.
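To check that this published value is the same CRC-32 that zlib computes, here is a minimal bit-at-a-time sketch in Python 3. The function name crc32_bitwise is hypothetical; it uses the common CRC-32 parameters (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF), which are the ones zlib's crc32 uses. Like zlib, it accepts a running CRC so that data can be fed in pieces.

```python
import zlib

def crc32_bitwise(data, crc=0):
    """Bit-at-a-time CRC-32: reflected polynomial 0xEDB88320, init/final XOR 0xFFFFFFFF."""
    crc ^= 0xFFFFFFFF                  # undo the final XOR of a previous call (or start at all ones)
    for byte in data:
        crc ^= byte                    # bring the next byte into the low bits
        for _ in range(8):             # process it one bit at a time, least significant first
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

check = crc32_bitwise(b"123456789")
print(hex(check), check)                   # 0xcbf43926 3421780262
print(check == zlib.crc32(b"123456789"))  # True
```

Because the running value is passed back in, crc32_bitwise(b"6789", crc32_bitwise(b"12345")) gives the same result as hashing b"123456789" in one call, which is also how the C loop with crc32(crc, bytes + i, 1) is meant to work.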

That means your Python computation is correct, but to get the same result in C you should write

uint8_t bytes[NUM_BYTES] = {'1', '2', '3', '4', '5', '6', '7', '8', '9'};
uint32_t crc = crc32(0L, Z_NULL, 0);

Alternatively, if what you want is indeed the CRC32 of uint8_t bytes[NUM_BYTES] = {1, 2, 3, 4, 5, 6, 7, 8, 9};, in Python 2.x you must use:

s = ''
for i in range(1, 10):
    s += chr(i)
s

outputs: '\x01\x02\x03\x04\x05\x06\x07\x08\t'

then

zlib.crc32(s)

outputs: 1089448862

Note: in Python 3.x you would write s = bytes(range(1, 10))

Serge Ballesta

An exact Python copy of your first C snippet gives the same result:

>>> bytes = [chr(i) for i in range(1, 10)]
>>> crc = zlib.crc32('', 0)
>>> for _ in range(9):
...     crc = zlib.crc32(bytes[0], crc)
>>> crc + 2**32
3136421207

Note that you never use the variable i in the loop.
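A side-by-side sketch of the loop bug in Python 3 (variable names are illustrative). The first loop mirrors crc32(crc, bytes, 1), which reads bytes[0] on every pass; the second mirrors the fix crc32(crc, bytes + i, 1), which walks the array and therefore matches a single call over all nine bytes.

```python
import zlib

data = bytes([1, 2, 3, 4, 5, 6, 7, 8, 9])  # the C array {1, 2, ..., 9}

# Original C loop: always passes a pointer to the first byte.
crc_buggy = zlib.crc32(b"")                 # 0, like crc32(0L, Z_NULL, 0)
for i in range(len(data)):
    crc_buggy = zlib.crc32(data[0:1], crc_buggy)

# Intended loop: pass the i-th byte, carrying the running CRC forward.
crc_fixed = zlib.crc32(b"")
for i in range(len(data)):
    crc_fixed = zlib.crc32(data[i:i + 1], crc_fixed)

print(crc_buggy, crc_fixed, zlib.crc32(data))
```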

bav

This is because CRC32 is calculated at the bit level.

In C you are calculating the CRC of each digit individually (the data size is 9 bytes), and in Python of the whole number (which might take just 4 or 8 bytes to represent).

The number of bytes may differ, and that leads to a different CRC.

Try to calculate the CRC of 123456789 in C.

Edit: regarding str(i), the encoding might differ, and moreover it gives the ASCII character, not the byte value i. Since 1 and '1' are not the same, you will not get the same CRC. Try

crc = zlib.crc32(chr(i), crc)  # the byte with value i (Python 2), not the character str(i)

In your C code the number occupies just 4 bytes, whereas in Python it is a string. A 32-bit integer and a character array will give different results.

Note that the same bit-level representation (with the same number of bits) always gives the same CRC. If even one bit is different, added or missing, you will get an entirely different CRC.
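A quick illustration of that sensitivity (Python 3, standard library only; names are illustrative): flipping the lowest bit of the first byte turns "123456789" into "023456789". CRC-32 detects every single-bit error, so the two CRCs are guaranteed to differ; how many of the 32 output bits change is not something to rely on.

```python
import zlib

original = bytearray(b"123456789")
flipped = bytearray(original)
flipped[0] ^= 0x01          # '1' (0x31) becomes '0' (0x30): a one-bit change

a = zlib.crc32(bytes(original))
b = zlib.crc32(bytes(flipped))
print(f"{a:08x} {b:08x} differing bits: {bin(a ^ b).count('1')}")
```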

doptimusprime