3

Out of curiosity, I was seeing what the operations to transform an object's id into its hash look like in the string domain, rather than using the usual bitwise operations like ^, |, &, ~.

class A:
    pass

def my_hash(a):
    bits = format(id(a), '064b')
    rot4 = bits[-4:] + bits[:-4]
    n = int(rot4, 2)
    return n

for _ in xrange(10):
    a = A()
    print hash(a) == my_hash(a), hash(a), my_hash(a)

But as you can see below, the function below isn't correct some of the time. What am I missing?

>>> run /tmp/thing.py
True 272331835 272331835
False -9223372036582443978 9223372037127107638
True 272331835 272331835
False -9223372036582443978 9223372037127107638
True 272331835 272331835
False -9223372036582443978 9223372037127107638
True 272331835 272331835
False -9223372036582443978 9223372037127107638
True 272331835 272331835
False -9223372036582443978 9223372037127107638
wim
  • 338,267
  • 99
  • 616
  • 750
  • And the numbers flip-flop because each iteration Python gets to reuse the just-freed memory slot; `A()` is first assigned to memory position *foo* and bound to `a`, then the next iteration a new `A()` is assigned memory position *bar*, rebinding that to `a`, at which point the older instance at *foo* is freed (reference count dropped to 0 since `a` no longer references it). So the next iteration `A()` is stored in location *foo*, and *bar* is freed, etc. – Martijn Pieters Nov 08 '15 at 21:41

1 Answers1

5

The hash produces a signed integer, your code produces an unsigned integer.

For your first incorrect result, the id(a) value was 4357309288; that's 0000000000000000000000000000000100000011101101110100001101101000 in 64 bits. The last 4 bits are 1000, moving those to the start gives the binary value of 1000000000000000000000000000000000010000001110110111010000110110, which is --9223372036582443978 when interpreted as a 2's complement signed integer, because that first bit, the sign bit, is set to 1.

int(rot4, 2) on the other hand, always interprets the input as an unsigned, unlimited length integer, so you get 9223372037127107638 instead.

Python doesn't have any 'easy' options to interpret a string containing a binary number to a signed integer, you could use the bitstring library for ease:

>>> from bitstring import Bits
>>> bits = Bits(int=4357309288, length=64)
>>> bits[-4:]
Bits('0x8')
>>> bits[-4:] + bits[:-4]
Bits('0x80000000103b7436')
>>> (bits[-4:] + bits[:-4]).int
-9223372036582443978L
>>> (bits[-4:] + bits[:-4]).uint
9223372037127107638L

The .int and .uint give you a signed and unsigned integer interpretation, respectively.

Using bitstring I get the correct output:

>>> def my_hash(a):
...     bits = Bits(int=id(a), length=64)
...     return (bits[-4:] + bits[:-4]).int
...
>>> for _ in xrange(10):
...     a = A()
...     print hash(a) == my_hash(a), hash(a), my_hash(a)
...
True -9223372036585854145 -9223372036585854145
True 268921659 268921659
True -9223372036585854145 -9223372036585854145
True 268921659 268921659
True -9223372036585854145 -9223372036585854145
True 268921659 268921659
True -9223372036585854145 -9223372036585854145
True 268921659 268921659
True -9223372036585854145 -9223372036585854145
True 268921659 268921659

If you want to stick to the standard library, use this Stack Overflow answer to grab yourself a twos_comp() function:

>>> twos_comp(9223372037127107638, 64)
-9223372036582443978L

Your function then would be:

def my_hash(a):
    bits = format(id(a), '064b')
    rot4 = bits[-4:] + bits[:-4]
    n = twos_comp(int(rot4, 2), 64)
    return n
Community
  • 1
  • 1
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343