
On my Python 2.7.9 on x64 I see the following behavior:

>>> float("10"*(2**28))
inf
>>> float("10"*(2**29))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 10101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010
>>> float("0"*(2**33))
0.0
>>> float("0." + "0"*(2**32))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Unless there's some deeper rationale I'm missing, this violates the principle of least surprise. When I got the ValueError on "10"*(2**29), I figured it was just a limitation on very long strings, but then "0"*(2**33) worked. What's going on? Can anyone justify why this behavior isn't a POLA bug (if perhaps a relatively irrelevant one)?

FakeName123
  • Probably because the parser ignores all leading zeroes before trying to convert the remaining digits to a float value? – casevh Jun 21 '16 at 02:55
  • I can't reproduce this, but that's because my system dumps core trying to create the 512MiB `'10101010...'` string. Have you tried creating the strings as a separate step (`s = '10' * (2 ** 29)` or whatever) and _then_ converting to `float(s)`? The output of `len(s)` might be informative, too. – Kevin J. Chase Jun 21 '16 at 03:22
  • As an aside: [since Python 3.2](https://docs.python.org/3.2/library/functions.html#float), "If the argument is outside the range of a Python float, an [`OverflowError`](https://docs.python.org/3.2/library/exceptions.html#OverflowError) will be raised." Python 2 and earlier versions of Python 3 don't address this, except to say that, "When passing in a string, values for NaN and Infinity may be returned, depending on the underlying C library." – Kevin J. Chase Jun 21 '16 at 03:26
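
The last comment quotes the Python 3 docs. A quick way to see where that OverflowError applies, assuming Python 3 on an IEEE 754 platform: an int that is too large for a float raises OverflowError, while a decimal string that is too large still parses to inf (just as the question's first example does on Python 2.7.9). The inputs below are arbitrary examples.

```python
import math

# An int outside the float range: Python 3 raises OverflowError.
try:
    float(10**400)
except OverflowError as exc:
    print("int:", type(exc).__name__, exc)

# A decimal string outside the float range is not an error:
# it is parsed and rounded to infinity instead.
big = float("1" + "0" * 400)
print("str:", big, math.isinf(big))
```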

2 Answers


Because the zeros are skipped when inferring the base

I like to look to my favourite reference implementation for questions like this.


The Proof

Casevh has a great intuition in the comments. Here's the relevant code:

/* bits_per_char <- log2(base), starting from n = base */
n = base;
for (bits_per_char = -1; n; ++bits_per_char)
    n >>= 1;

/* n <- total # of bits needed, while setting p to end-of-string */
while (_PyLong_DigitValue[Py_CHARMASK(*p)] < base)
    ++p;
*str = p;

/* n <- # of Python digits needed, = ceiling(n/PyLong_SHIFT). */
n = (p - start) * bits_per_char + PyLong_SHIFT - 1;
if (n / bits_per_char < p - start) {
    PyErr_SetString(PyExc_ValueError,"long string too large to convert");
    return NULL;
}
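
The first loop in the snippet computes bits_per_char, i.e. log2(base) for power-of-two bases such as 2, 8 and 16. Here is a Python transliteration, assuming n starts out equal to base; it is only meant to show what the loop computes, not to reproduce the surrounding C code.

```python
def bits_per_char(base):
    """Transliteration of the C loop, assuming n starts out equal to base."""
    n = base
    bits = -1
    while n:       # for (bits_per_char = -1; n; ++bits_per_char)
        n >>= 1    #     n >>= 1;
        bits += 1
    return bits


for base in (2, 4, 8, 16, 32):
    print(base, bits_per_char(base))
```

Each right shift halves n, so the count of shifts (minus the initial -1) is the exponent of the power of two.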

Here, `p` initially points to the start of your string. If we look at the `_PyLong_DigitValue` table, we see that `'0'` is explicitly mapped to 0.

Python does a lot of extra work to optimize the conversion of particular bases (there's a fun 200-line comment about converting binary!), which is why it works hard to infer the correct base first. In this case, the zeros can be skipped while inferring the base, so they don't count in the overflow calculation.

In other words, the code checks how many bits are needed to store the number, but Python is smart enough to leave leading zeros out of this calculation. I don't see anything in the docs for the float function guaranteeing this behaviour across implementations. They ominously state:

Convert a string or number to a floating point number, if possible.
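
To make the claim above concrete, here is a toy Python model of the rule as this answer describes it: leading zeros are skipped before the digit count is taken, so they never count toward a length limit, while every later digit does. This is a hypothetical illustration, not CPython's parser, and `counted_digits` and `within_limit` are made-up names.

```python
def counted_digits(s):
    """Toy model (not CPython's code): skip leading zeros, then count every
    remaining digit, including zeros that come after a decimal point."""
    i = 0
    while i < len(s) and s[i] == "0":
        i += 1  # leading zeros are skipped and never counted
    return sum(1 for ch in s[i:] if ch in "0123456789")


def within_limit(s, limit):
    """True if the toy model accepts s under a hypothetical digit limit."""
    return counted_digits(s) <= limit


print(counted_digits("0" * 50))         # -> 0  (every zero is a leading zero)
print(counted_digits("0." + "0" * 50))  # -> 50 (the "." stops the skipping)
print(counted_digits("10" * 25))        # -> 50
```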


When Does This Not Work?

When you write

   float("0." + "0"*(2**32))

The parser stops skipping zeros early on, at the decimal point, so all the remaining zeros are counted in the bit-length calculation and contribute to raising the ValueError.
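
Plugging the question's four cases into that counting rule shows why the results split the way they do. The cutoff `DIGIT_LIMIT = 10**9` is an assumption, not something this answer states: it is simply a value between 2**29 and 2**30 digits, which is where the question's results flip from inf to ValueError. The digit counts are computed arithmetically, so none of the huge strings is built.

```python
# Hypothetical cutoff, chosen only because it sits between 2**29 and 2**30,
# matching the question's observations.
DIGIT_LIMIT = 10**9


def verdict(counted):
    """'parsed' if the counted digits fit under the assumed limit."""
    return "parsed" if counted <= DIGIT_LIMIT else "ValueError"


# Counted digits per case: leading zeros are free, every other digit counts.
cases = [
    ('"10"*(2**28)', 2 * 2**28),    # both characters of each "10" count
    ('"10"*(2**29)', 2 * 2**29),
    ('"0"*(2**33)', 0),             # all zeros are leading zeros
    ('"0." + "0"*(2**32)', 2**32),  # zeros after the "." are counted
]

for label, counted in cases:
    print(f"{label:20} {counted:>13,} -> {verdict(counted)}")
```

"parsed" includes the first case, which then overflows to inf, exactly as the question shows.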


Similar Parsing Tricks

Here's a similar case in the float class, where leading whitespace is skipped (along with an interesting comment from the authors on the intent behind this design choice):

while (Py_ISSPACE(*s))
    s++;

/* We don't care about overflow or underflow.  If the platform
 * supports them, infinities and signed zeroes (on underflow) are
 * fine. */
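
At the Python level, the behaviour that snippet and comment describe looks roughly like this, assuming CPython on a platform with IEEE 754 infinities and signed zeros. The specific inputs are arbitrary examples.

```python
import math

# Leading whitespace is skipped before the number is parsed.
print(float(" \t1.5"))

# Overflow is not an error: a too-large decimal string becomes infinity...
huge = float("9" * 400)
print(huge, math.isinf(huge))

# ...and underflow gives a signed zero rather than an error.
tiny = float("-1e-400")
print(tiny, math.copysign(1.0, tiny))
```
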
en_Knight
  • Great examples, good explanation. You should fix your example of `int("0x0"+"0"*int(1e1000),16)`, because it gives an `OverflowError` on the `int(1e1000)` part, since 1e1000 is inf. (Even if it didn't, it's more memory than could be in the universe.) – FakeName123 Jun 21 '16 at 03:46
  • You might want to add [`language:` comments](https://stackoverflow.com/editing-help#syntax-highlighting) before the two C code blocks, because Stack Overflow is trying to syntax-highlight them as if they were Python. Put each comment flush left (not indented), then a blank line, then the code block. – Kevin J. Chase Jun 21 '16 at 03:48
  • @user3047059 I just removed it. I don't know why I used that example anyway; it's a different set of function calls altogether when the base is specified manually. I can add it back in with an explanation if you'd like, but otherwise I hope it's good as is. – en_Knight Jun 21 '16 at 03:54

For the case of float("10"*(2**29)), you are converting the string to a float value that far exceeds the maximum value a float can hold in Python.

Whereas for float("0"*(2**33)), you are converting the string to 0.0, regardless of how many times you repeat the "0".

The error was not caused by a limit on very long strings, but by the limit on the maximum value of a float.

Feel free to check out this related question: What is the maximum float in Python?
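
For reference, the maximum value mentioned above can be read from sys.float_info.max, and the "any number of zeros is 0.0" point is easy to check with modest lengths (the lengths here are arbitrary and far smaller than the question's):

```python
import sys

# Largest finite float on this platform (IEEE 754 binary64 on virtually all systems).
print(sys.float_info.max)

# A string made only of zeros parses to 0.0 no matter how many zeros it has.
for count in (1, 10, 10_000):
    print(count, float("0" * count))
```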

Munosphere
  • That's what I thought, too (see my comments), but Python 3.2+ raises a different exception, and earlier versions can quietly return a bogus `float` value, depending on the string they're given. – Kevin J. Chase Jun 21 '16 at 03:34
  • What about the bottom case? It, too, stays within Python's maximum float range, but raises an error nonetheless. – en_Knight Jun 21 '16 at 03:42
  • @en_Knight Hmmm missed out on that case. Thanks for the explanation for the bottom case! – Munosphere Jun 21 '16 at 04:22