2

I am trying to use the crc32_combine function from zlib in Python. Although various other zlib functions are available, this one isn't part of the "batteries included" standard library. I've tried two approaches: a port from the C code to Python and calling zlib from Python with ctypes. Both give me different results, although not the result I'm expecting. I'm presenting the ctypes code since I think this executes faster and has a smaller chance for additional programmer errors.

The algorithm can combine two CRC32 hashes when the length of the data of the second hash is provided. crc32_combine is defined as follows:

crc32(crc32(0, seq1, len1), seq2, len2) == crc32_combine(
    crc32(0, seq1, len1), crc32(0, seq2, len2), len2)

This is the output:

Expected CRC: 45E57586
Combined CRC: 567EE4E4

The second line is always different when ran with Python 3.5.1 on win32. Not with Python 2, but the result is never what I expect either. Put the zlib1.dll in the same directory as the script to try it out.

import zlib

def crc32_combine_ctypes(crc1, crc2, len2):
    import ctypes
    from ctypes import util

    lib = util.find_library('zlib1')
    _zlib = ctypes.CDLL(lib)
    assert _zlib._name, "Can't find zlib"

    _zlib.crc32_combine.argtypes = [
        ctypes.c_ulong, ctypes.c_ulong, ctypes.c_ulong]
    _zlib.crc32_combine.restype = ctypes.c_ulong

    return _zlib.crc32_combine(crc1, crc2, len2)

testfile = "zlib1.dll"

with open(testfile, "rb") as tf:
    data = tf.read()

print("Expected CRC: %0.8X" % (zlib.crc32(data) & 0xFFFFFFFF))

cut = len(data) // 2 - 100
p1 = data[0:cut]
p2 = data[cut:]

crc1 = zlib.crc32(p1)
crc2 = zlib.crc32(p2)
len1 = len(p1)
len2 = len(p2)

combined = crc32_combine_ctypes(crc1, crc2, len2)
print("Combined CRC: %0.8X" % (combined & 0xFFFFFFFF))

What am I doing wrong?

Community
  • 1
  • 1
Gfy
  • 8,173
  • 3
  • 26
  • 46
  • 2
    There's something wrong with this 32-bit build of zlib1.dll. On my own 64-bit build (of just this function), the combined result matches the expected result. To build the DLL, I downloaded [the source](http://gnuwin32.sourceforge.net/downlinks/zlib-src-zip.php) from your link, copied out the definitions of `crc32_combine`, `gf2_matrix_times`, and `gf2_matrix_square` from crc32.c, and built it as a 64-bit DLL. – Eryk Sun Feb 08 '16 at 00:24
  • This [32-bit build](http://prdownloads.sourceforge.net/libpng/zlib128-dll.zip?download) also works as expected. – Eryk Sun Feb 08 '16 at 00:49
  • The bad DLL was the problem! Somehow I didn't follow/see the other links on the zlib home page and picked out the bad one :) – Gfy Feb 09 '16 at 21:09

1 Answers1

7

eryksun had the right idea: I used a bad and old DLL. The last zlib version with a 32 bit dll included: https://sourceforge.net/projects/libpng/files/zlib/1.2.8/

My port to pure Python code is a couple hundred times slower than calling the library with ctypes. (numbers from using timeit with 1k iterations and 50m as length parameter)

31.729 (function provided below)
 0.120 (just the _zlib.crc32_combine() call: no library loading included)

The pure Python code:

def crc32_combine(crc1, crc2, len2):
    """Explanation algorithm: https://stackoverflow.com/a/23126768/654160
    crc32(crc32(0, seq1, len1), seq2, len2) == crc32_combine(
        crc32(0, seq1, len1), crc32(0, seq2, len2), len2)"""
    # degenerate case (also disallow negative lengths)
    if len2 <= 0:
        return crc1

    # put operator for one zero bit in odd
    # CRC-32 polynomial, 1, 2, 4, 8, ..., 1073741824
    odd = [0xedb88320] + [1 << i for i in range(0, 31)]
    even = [0] * 32

    def matrix_times(matrix, vector):
        number_sum = 0
        matrix_index = 0
        while vector != 0:
            if vector & 1:
                number_sum ^= matrix[matrix_index]
            vector = vector >> 1 & 0x7FFFFFFF
            matrix_index += 1
        return number_sum

    # put operator for two zero bits in even - gf2_matrix_square(even, odd)
    even[:] = [matrix_times(odd, odd[n]) for n in range(0, 32)]

    # put operator for four zero bits in odd
    odd[:] = [matrix_times(even, even[n]) for n in range(0, 32)]

    # apply len2 zeros to crc1 (first square will put the operator for one
    # zero byte, eight zero bits, in even)
    while len2 != 0:
        # apply zeros operator for this bit of len2
        even[:] = [matrix_times(odd, odd[n]) for n in range(0, 32)]
        if len2 & 1:
            crc1 = matrix_times(even, crc1)
        len2 >>= 1

        # if no more bits set, then done
        if len2 == 0:
            break

        # another iteration of the loop with odd and even swapped
        odd[:] = [matrix_times(even, even[n]) for n in range(0, 32)]
        if len2 & 1:
            crc1 = matrix_times(odd, crc1)
        len2 >>= 1

        # if no more bits set, then done
    # return combined crc
    crc1 ^= crc2
    return crc1
Gfy
  • 8,173
  • 3
  • 26
  • 46