The code uses an unsupported function from the Python C-API to take an arbitrary unsigned char array and turn it into an integer. From the definition of _PyLong_FromByteArray() you can see why the calling code includes a cast from uint64[] to char[]:
PyObject *
_PyLong_FromByteArray(const unsigned char* bytes, size_t n,
int little_endian, int is_signed)
So instead of taking two 64-bit numbers, it is passed 16 8-bit numbers, which is what the (unsigned char *) cast is for. The call passes in 16 for n, and little_endian is set to 1 and is_signed to 0.
In Python code, you can build the same 16 bytes with the int.to_bytes() method: convert both values to bytes of length 8, little-endian (the SpookyHash C++ reference implementation is explicitly designed for 64-bit little-endian architectures):
>>> bytevalue = (12579423875165067478).to_bytes(8, 'little') + (12351582206331609335).to_bytes(8, 'little')
>>> bytevalue
b'\xd6\x18H\xa6]\x17\x93\xae\xf7`n>\x93\xa2i\xab'
>>> list(bytevalue)
[214, 24, 72, 166, 93, 23, 147, 174, 247, 96, 110, 62, 147, 162, 105, 171]
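Going the other way, the built-in int.from_bytes() is the closest pure-Python counterpart to what the C call does. The following is a minimal sketch, not the code from the question; the names low and high are just labels chosen here for the two 64-bit values, and byteorder='little' with signed=False is meant to mirror little_endian=1 and is_signed=0:

```python
# The two 64-bit values from the example (names chosen for illustration).
low = 12579423875165067478
high = 12351582206331609335

# Lay both out as 8 little-endian bytes each, first value first.
bytevalue = low.to_bytes(8, 'little') + high.to_bytes(8, 'little')

# Interpret all 16 bytes as one unsigned little-endian integer.
value = int.from_bytes(bytevalue, byteorder='little', signed=False)
print(value)
```

The rest of this answer explains how that single integer is put together from the individual bytes.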
Each byte is a component of the final number as a multiple of a power of 256. The least significant byte is multiplied by 256 ** 0, the next by 256 ** 1, etc. In a little-endian system, the least significant byte comes first (the one multiplied by 256 to the power 0), so in the above the 214 at the left is the least significant and the 171 at the right is the most significant, being 171 times 256 to the power 15.
You can re-create the number in Python code by doing this yourself:
value = 0
for i, b in enumerate(bytevalue):
    value += b * (256 ** i)
which produces the expected output:
>>> bytevalue = (12579423875165067478).to_bytes(8, 'little') + (12351582206331609335).to_bytes(8, 'little')
>>> value = 0
>>> for i, b in enumerate(bytevalue):
...     value += b * (256 ** i)
...
>>> value
227846475865583962700201584165695002838
except CPUs use bit-shifting to achieve this; shifting a value 8 bits to the left is the same thing as multiplying it by 256, and repeated applications of such shifts would multiply the value by a power of 256. If you started at the most-significant byte and kept shifting the value-so-far to the left by 8 bits before including the next byte (using bit-wise OR), you'd get the same output:
>>> value = 0
>>> for b in reversed(bytevalue):
...     value = value << 8 | b
...
>>> value
227846475865583962700201584165695002838
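The equivalence this loop relies on can be checked directly. This is a small sanity check rather than anything taken from the C source; it uses the byte values 171 and 105 from the example:

```python
# 171 is the most significant byte in the example byte string.
b = 171

# Shifting left by 8 * i bits gives the same result as multiplying by 256 ** i.
shifted = [b << (8 * i) for i in range(16)]
multiplied = [b * 256 ** i for i in range(16)]
print(shifted == multiplied)

# After a shift by 8 the low 8 bits are all zero, so OR-ing in the
# next byte (105 here) gives the same result as adding it.
acc = b << 8
print((acc | 105) == (acc + 105))
```

Both comparisons print True, which is why the loop can use << and | where the earlier version used ** and +=.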
To avoid reversing, you could shift the current byte by the number of bits already accumulated before combining:
>>> value = 0
>>> accumbits = 0
>>> for b in bytevalue:
...     value |= (b << accumbits)
...     accumbits += 8
...
>>> value
227846475865583962700201584165695002838
This is what the _PyLong_FromByteArray implementation actually uses. However, the internal structure of a Python int value splits large integers into multiple 30-bit or 15-bit 'chunks' so that arbitrarily large integer values fit into fixed-size C integers, which is why the function also includes some additional tests and shifts involving PyLong_SHIFT.
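To make the chunking idea concrete, here is a rough illustration in Python. It is not how CPython's C code is written; split_into_digits and join_digits are hypothetical helpers invented for this sketch, and the only real API used is sys.int_info.bits_per_digit, which reports the chunk size (30 or 15) of the running interpreter:

```python
import sys

def split_into_digits(value, shift=sys.int_info.bits_per_digit):
    """Hypothetical helper: split a non-negative int into shift-bit chunks,
    least significant chunk first."""
    mask = (1 << shift) - 1
    digits = []
    while value:
        digits.append(value & mask)  # keep the lowest `shift` bits
        value >>= shift              # drop them and move on to the next chunk
    return digits

def join_digits(digits, shift=sys.int_info.bits_per_digit):
    """Hypothetical helper: the inverse of split_into_digits."""
    value = 0
    for i, digit in enumerate(digits):
        value |= digit << (shift * i)
    return value

value = 227846475865583962700201584165695002838
digits = split_into_digits(value)
print(sys.int_info.bits_per_digit, len(digits))
print(join_digits(digits) == value)
```

With 30-bit chunks the 128-bit example value needs 5 chunks; each one fits comfortably in a 32-bit C integer.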
All this comes down to the two 64-bit input values being placed end-to-end in memory to form a long 128-bit number; the first number (being least significant) to the right of the second number (being more significant), so in Python code you could just shift the second number 64 bits to the left and attach the result to the first:
>>> 12579423875165067478 | 12351582206331609335 << 64
227846475865583962700201584165695002838
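If you need this for more than two values, the same shift-and-OR can be wrapped in a small function. This is a sketch under the assumption that every input is an unsigned 64-bit value with the least significant word first; combine_words is a name made up here, and the struct.pack() call is only a cross-check that the result matches the little-endian memory layout described above:

```python
import struct

def combine_words(words, bits=64):
    """Hypothetical helper: join unsigned `bits`-wide words into one int,
    least significant word first."""
    value = 0
    for i, word in enumerate(words):
        value |= word << (bits * i)
    return value

h1 = 12579423875165067478
h2 = 12351582206331609335
combined = combine_words([h1, h2])

# '<2Q' packs two unsigned 64-bit integers in little-endian order,
# which is the same 16 bytes the C code hands to _PyLong_FromByteArray().
packed = struct.pack('<2Q', h1, h2)
print(combined == int.from_bytes(packed, 'little'))
```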