1

How does Python hash long numbers? I guess it takes O(1) time for 32-bit ints, but the way long integers work in Python makes me think the complexity is not O(1) for them. I've looked for answers in relevant questions, but have found none straightforward enough to make me confident. Thank you in advance.

Eli Korvigo
  • 10,265
  • 6
  • 47
  • 73

1 Answers1

5

The long_hash() function indeed loops and depends on the size of the integer, yes:

/* The following loop produces a C unsigned long x such that x is
   congruent to the absolute value of v modulo ULONG_MAX.  The
   resulting x is nonzero if and only if v is. */
while (--i >= 0) {
    /* Force a native long #-bits (32 or 64) circular shift */
    x = (x >> (8*SIZEOF_LONG-PyLong_SHIFT)) | (x << PyLong_SHIFT);
    x += v->ob_digit[i];
    /* If the addition above overflowed we compensate by
       incrementing.  This preserves the value modulo
       ULONG_MAX. */
    if (x < v->ob_digit[i])
        x++;
}

where i is the 'object size', e.g. the number of digits used to represent the number, where the size of a digit depends on your platform.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
  • As far as I know, in Python 3 all numbers are of long type, so is it safe to say that it's always O(i) to hash a number in Python 3? – Eli Korvigo Jan 29 '15 at 14:17
  • 1
    @GrayFall9: yes, and yes. Python 3 `int` is what Python 2 called `long`. Note that the digit size is usually either 15 or 30 bits, so even for large numbers the number of iterations is very small. You'd need numbers over 1152921504606846976 to even go to 3 iterations on most modern systems. – Martijn Pieters Jan 29 '15 at 14:24
  • @EliKorvigo: to be clear the time complexity is `O(log(n))` where `n` is the integer. For large integers that are less than available memory, you could use `gmpy2`, to improve time performance – jfs Aug 13 '15 at 22:03
  • @J.F.Sebastian well, that's strange. I just tried to hash a random 500-digit integer in Python 2 and it finished instantly on my machine. – Eli Korvigo Aug 14 '15 at 14:55
  • @EliKorvigo: That's because `math.log2(500)` is just under 9. You'll need to use **far** bigger numbers to see an effect. What made you think that O(logN) meant that code is going to run slow for N=500? – Martijn Pieters Aug 14 '15 at 14:57
  • @MartijnPieters yeah, I've already got it. I misread the memory part of J.F.Sebastian's comment. Thanks to both of you. – Eli Korvigo Aug 14 '15 at 15:01
  • @EliKorvigo: if `n` is a 500-digit integer then `log(n) ~ 500`. It is a small number so that linear or even a quadratic algorithm ([such as `str(n)`](http://stackoverflow.com/a/28480239/4279)) would execute instantly unless there are large constant factors. `gmp` may segfault if it runs out of memory while ordinary cpython integers would just raise `MemoryError` instead (much better) that is why I've mentioned it. You shouldn't worry about it unless your numbers have millions digits. – jfs Aug 14 '15 at 20:35