How does Python convert bytes into float?

Question

I have the following code snippet:

```python
#!/usr/bin/env python3

print(float(b'5'))
```

This prints `5.0` with no error (on Linux with a UTF-8 locale). I'm very surprised that it doesn't raise an error, since Python is not supposed to know which encoding was used for the `bytes` object.

Any insight?

Have you read the [documentation](https://docs.python.org/3/howto/unicode.html#encodings) and https://docs.python.org/3.6/c-api/buffer.html#bufferobjects? — Mazdak, May 18 '18 at 10:07
@Kasramvd: the documentation for `float()` states it accepts a `str`, a number, or a type that implements `__float__`. `bytes` doesn't implement `__float__`. — Martijn Pieters, May 18 '18 at 10:13
@MartijnPieters [Here](https://docs.python.org/3/library/functions.html#float) it says: "If the argument is a string, it should contain a decimal number, optionally preceded by a sign, and optionally embedded in whitespace." Doesn't `b'5'` follow that rule? Although that case should have been stated clearly in the documentation. — Mazdak, May 18 '18 at 10:17
Fair question, since [not all encodings are supersets of ASCII](https://stackoverflow.com/q/6531750/4014959). — PM 2Ring, May 18 '18 at 10:17
@Kasramvd: no, it doesn't. The `bytes` type is not considered a string. — Martijn Pieters, May 18 '18 at 10:24
@MartijnPieters Indeed. I mean that since bytes can represent a sequence of characters, including decimal digits, that case should have been mentioned as well; as you said, it's a bug in the documentation. — Mazdak, May 18 '18 at 10:26
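To make PM 2Ring's point concrete, here is a small check (a sketch; `cp500` is one of Python's bundled EBCDIC codecs, picked here only as an example of an encoding that is not an ASCII superset). It encodes the same text `'5'` with several codecs and hands the raw bytes to `float()`:

```python
# Encode the same text with several codecs and see whether float() accepts the bytes.
results = {}
for encoding in ('ascii', 'utf-8', 'latin-1', 'cp500'):
    data = '5'.encode(encoding)
    try:
        results[encoding] = float(data)
    except ValueError:
        # float() reads bytes as ASCII; cp500 (EBCDIC) encodes '5' as b'\xf5'.
        results[encoding] = None
    print(encoding, data, results[encoding])
```

Only the ASCII-compatible encodings produce bytes that `float()` can parse; under `cp500` the digit `5` becomes the single byte `0xF5`, which is rejected.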

Martijn Pieters · Accepted Answer · 2018-05-18T10:48:35.920

When passed a `bytes` object, `float()` treats the contents of the object as ASCII bytes. That's sufficient here, as the conversion from string to float only accepts ASCII digits and letters, plus `+`, `-`, `.` and `_`, anyway (for `str` input, the only non-ASCII codepoints that are permitted are whitespace and Unicode decimal digits, both of which are first mapped to ASCII equivalents), and this is analogous to the way `int()` treats `bytes` input.
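A quick way to see this ASCII reading in action (a sketch; the sample values are arbitrary choices, and the underscore form needs Python 3.6 or later):

```python
# float() and int() read bytes input as ASCII text.
samples = [b'5', b'  -1.5e3\n', b'1_000.25', b'inf']
values = [float(s) for s in samples]
print(values)      # [5.0, -1500.0, 1000.25, inf]
print(int(b'42'))  # 42

# A non-ASCII digit encoded as UTF-8 is just two opaque bytes to float().
utf8_digit = '\u0665'.encode('utf-8')  # ARABIC-INDIC DIGIT FIVE -> b'\xd9\xa5'
try:
    float(utf8_digit)
    rejected = False
except ValueError:
    rejected = True
print('rejected:', rejected)  # rejected: True
```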

Under the hood, the implementation does this:

1. Because the input is not a string, `PyNumber_Float()` is called on the object (for `str` objects the code jumps straight to `PyFloat_FromString()`).
2. `PyNumber_Float()` checks for a `__float__` method, but if that's not available, it calls `PyFloat_FromString()`.
3. `PyFloat_FromString()` accepts not only `str` objects, but any object implementing the buffer protocol. The `String` in the name is a Python 2 holdover; the Python 3 `str` type is called `Unicode` in the C implementation.
4. `bytes` objects implement the buffer protocol, and the `PyBytes_AS_STRING` macro is used to access the internal C buffer holding the bytes.
5. A combination of two internal functions, `_Py_string_to_number_with_underscores()` and `float_from_string_inner()`, is then used to parse the ASCII bytes into a floating-point value.
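The dispatch above can be approximated in pure Python. This is a rough model under stated assumptions, not CPython's code: `float_sketch` is a hypothetical name, `memoryview()` stands in for the C-level buffer protocol, and the final parse is delegated to `float()` on an ASCII `str`.

```python
def float_sketch(obj):
    """Rough pure-Python model of the float() dispatch described above.

    Hypothetical helper for illustration only: CPython does this in C and
    differs in details (the __index__ fallback added in Python 3.8,
    float subclasses, exact error messages, which characters count as
    whitespace, ...).
    """
    # str objects go straight to string parsing (PyFloat_FromString()).
    if isinstance(obj, str):
        return float(obj)
    # PyNumber_Float(): prefer __float__ when the type defines it.
    to_float = getattr(type(obj), '__float__', None)
    if to_float is not None:
        return to_float(obj)
    # PyFloat_FromString(): fall back to the buffer protocol.
    try:
        raw = memoryview(obj).tobytes()
    except TypeError:
        raise TypeError(
            f'float() argument must be a string or a number, '
            f'not {type(obj).__name__!r}'
        ) from None
    # The raw bytes are interpreted as ASCII text and parsed like a str.
    try:
        text = raw.decode('ascii')
    except UnicodeDecodeError:
        raise ValueError(f'could not convert string to float: {raw!r}') from None
    return float(text)


print(float_sketch(b'5'))               # 5.0
print(float_sketch(bytearray(b'2.5')))  # 2.5
print(float_sketch(memoryview(b'-3')))  # -3.0
print(float_sketch(7))                  # 7.0
```

Because the buffer protocol is the gate, `bytearray` and `memoryview` objects are accepted by the real `float()` just like `bytes`.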

For actual `str` strings, CPython first converts any non-ASCII string into a sequence of ASCII bytes: ASCII codepoints are kept as-is, Unicode decimal digits are mapped to the ASCII digits `0`-`9`, any non-ASCII whitespace character is converted to an ASCII `0x20` space, and anything else makes the parse fail. The result then goes through the same `_Py_string_to_number_with_underscores()` / `float_from_string_inner()` combination.
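The normalization of `str` input can be sketched as follows. `ascii_number_text` is a hypothetical helper that only approximates the C code; note that besides mapping non-ASCII whitespace to spaces, CPython also maps Unicode decimal digits (such as ARABIC-INDIC DIGIT FIVE, U+0665, or the fullwidth digits) to ASCII `0`-`9`, which is why `float('\u0665')` returns `5.0`.

```python
import unicodedata


def ascii_number_text(s: str) -> str:
    """Hypothetical helper approximating CPython's str -> ASCII step.

    ASCII characters are kept, non-ASCII whitespace becomes ' ', Unicode
    decimal digits become '0'-'9', and anything else becomes '?' (which
    the parser then rejects).
    """
    out = []
    for ch in s:
        if ord(ch) < 128:
            out.append(ch)
        elif ch.isspace():
            out.append(' ')
        else:
            digit = unicodedata.decimal(ch, None)
            out.append('?' if digit is None else str(digit))
    return ''.join(out)


samples = ['\u0665', '\u2003 42 \u2003', '\uff11\uff12.5']
for s in samples:
    normalized = ascii_number_text(s)
    # Parsing the normalized ASCII text gives the same result as float(s).
    print(repr(normalized), float(normalized), float(s))
```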

I see this as a bug in the documentation and have filed an issue with the Python project to have it updated.

I know there won't be a thing about Python that this guy doesn't know. — Sraw, May 18 '18 at 10:26
Thanks for the great answer. So, just to be clear, this will fail with certain encodings, such as UTF-16? — static_rtti, May 18 '18 at 11:37
@static_rtti: absolutely, because the `\x00` bytes won't be accepted. The bytes **must** be ASCII only, and fit the `float()` string interpretation rules. — Martijn Pieters, May 18 '18 at 11:39
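If the bytes come from an encoding that is not ASCII-compatible, such as UTF-16, the safe route is to decode them explicitly before calling `float()`. A minimal sketch, where `float_from_encoded` is a hypothetical helper name:

```python
def float_from_encoded(data: bytes, encoding: str) -> float:
    """Hypothetical helper: decode explicitly instead of relying on float()'s ASCII reading."""
    return float(data.decode(encoding))


utf16 = '5'.encode('utf-16-le')  # b'5\x00'
try:
    float(utf16)
    accepted = True
except ValueError:
    accepted = False  # the \x00 byte is not valid in a float literal
print('raw UTF-16 accepted:', accepted)  # raw UTF-16 accepted: False

print(float_from_encoded(utf16, 'utf-16-le'))              # 5.0
print(float_from_encoded('5'.encode('utf-16'), 'utf-16'))  # 5.0 (the codec handles the BOM)
```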
