5

So, cPython (2.4) has some interesting behaviour when the length of something gets near to 1<<32 (the size of an int).

r = xrange(1<<30)
assert len(r) == 1<<30

is fine, but:

r = xrange(1<<32)
assert len(r) == 1<<32
ValueError: xrange object size cannot be reported`__len__() should return 0 <= outcome

Alex's wowrange has this behaviour as well. wowrange(1<<32).l is fine, but len(wowrange(1<<32)) is bad. I'm guessing there is some floating point behaviour (being read as negative) action going on here.

  1. What exactly is happening here? (this is pretty well-solved below!)
  2. How can I get around it? Longs?

(My specific application is random.sample(xrange(1<<32),ABUNCH)) if people want to tackle that question directly!)

Community
  • 1
  • 1
Gregg Lind
  • 20,690
  • 15
  • 67
  • 81
  • @Gregg, funny enough I get OverflowError instead of ValueError (as does the accepted A to that Q), but, de minimis. Problem is, for your specific application, you want a `random.sample` that can't fit in memory -- but each Python structure **must** fit in memory. If you open another Q and specify the parameters in more detail maybe it's more practical to offer help wrt the specific application... – Alex Martelli Jan 25 '10 at 01:35
  • @Alex, actually, the sample doesn't have to fit in memory, but in 2.4 (I know, old news!) random module, it does a len() call that fails! xrange(1<<32) itself is fine, but the call: n = len(population) on 299 of the module isn't. – Gregg Lind Jan 25 '10 at 03:00
  • `random.sample` needs `to call len()` in Python 2.5, 2.6, 3.0 and 3.1 as well, and that call fails on `xrange(1<<32)` in every single version (since `len()` only applies to containers that "fit in memory" and that `xrange` conceptually does not). So if you specify better what exactly you need, esp. what's a typical value for `ABUNCH`, we can suggest how to work around this limitation of `random.sample` (which applies to _all_ Python versions around!-). Better done in a different Q, IMHO. – Alex Martelli Jan 25 '10 at 03:12
  • @Alex, I'm just going to yank the guts out of the random.sample method, and write my own, since it's pretty trivial. – Gregg Lind Jan 25 '10 at 03:31
  • sure, though if `ABUNCH` was very large there would be some precautions worth taking (performance-wise). – Alex Martelli Jan 25 '10 at 05:25

3 Answers3

12

cPython assumes that lists fit in memory. This extends to objects that behave like lists, such as xrange. essentially, the len function expects the __len__ method to return something that is convertable to size_t, which won't happen if the number of logical elements is too large, even if those elements don't actually exist in memory.

SingleNegationElimination
  • 151,563
  • 33
  • 264
  • 304
  • thank you for explaining why `len` in particular is behaving this way. cPython len expects `size_t`. – Gregg Lind Jan 24 '10 at 22:04
  • Minor quibble: just because a length is too big for a `size_t` doesn't mean that the object doesn't fit in memory. For example I have a class representing a bit field for which `__len__` stops working for objects over 256MB in 32-bit Python. – Scott Griffiths Jul 15 '10 at 06:57
5

You'll find that

xrange(1 << 31 - 1)

is the last one that behaves as you want. This is because the maximum signed (32-bit) integer is 2^31 - 1.

1 << 32 is not a positive signed 32-bit integer (Python's int datatype), so that's why you're getting that error.

In Python 2.6, I can't even do xrange(1 << 32) or xrange(1 << 31) without getting an error, much less len on the result.

Edit If you want a little more detail...

1 << 31 represents the number 0x80000000 which in 2's complement representation is the lowest representable negative number (-1 * 2^31) for a 32-bit int. So yes, due to the bit-wise representation of the numbers you're working with, it's actually becoming negative.

For a 32-bit 2's complement number, 0x7FFFFFFF is the highest representable integer (2^31 - 1) before you "overflow" into negative numbers.

Further reading, if you're interested.

Note that when you see something like 2147483648L in the prompt, the "L" at the end signifies that it's now being represented as a "long integer" (64 bits, usually, I can't make any promises on how Python handles it because I haven't read up on it).

Sapph
  • 6,118
  • 1
  • 29
  • 32
1

1<<32, when treated as a signed integer, is negative.

Anon.
  • 58,739
  • 8
  • 81
  • 86