Need a clarification while looking at calculating a running checksum.
Assume I have data like this.
data = 'helloworld'
Assuming a blocksize of 5, I need to calculate running checksum.
>>> zlib.adler32('hello')
103547413
>>> zlib.adler32('ellow')
105316900
According to Python documentation (python version 2.7.2)
zlib.adler32(data[, value])
"Computes a Adler-32 checksum of data. (An Adler-32 checksum is almost as reliable as a CRC32 but can be computed much more quickly.) If value is present, it is used as the starting value of the checksum; otherwise, a fixed default value is used. This allows computing a running checksum over the concatenation of several inputs."
But when I provide something like this,
>>> zlib.adler32('ellow', zlib.adler32('hello'))
383190072
The output is entirely different.
I tried creating a custom function to generate the rolling checksum as defined in the rsync algorithm.
def weakchecksum(data):
a = 1
b = 0
for char in data:
a += (ord(char)) % MOD_VALUE
b += a % MOD_VALUE
return (b << 16) | a
def rolling(checksum, removed, added, block_size):
a = checksum
b = (a >> 16) & 0xffff
a &= 0xffff
a = (a - ord(removed) + ord(added)) % MOD_VALUE
b = (b - (block_size * ord(removed)) + a) % MOD_VALUE
return (b << 16) | a
Here is the values that I get from running these functions
Weak for hello: 103547413
Rolling for ellow: 105382436
Weak for ellow: 105316900
As you can see there is some huge difference in my implementation of rolling checksum and python's, in terms of value.
Where am I going wrong in calculating the rolling checksum? Am I making use of the rolling property of python's adler32 function correctly?