When passed a `bytes` object, `float()` treats the contents of the object as ASCII bytes. That's sufficient here, as the conversion from string to float only accepts ASCII digits and letters, plus `.` and `_`, anyway (the only non-ASCII codepoints that would be permitted are whitespace codepoints), and this is analogous to the way `int()` treats `bytes` input.
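A quick demonstration of the behaviour described above (the exact values here are my own illustrative picks):

```python
# float() parses a bytes object as ASCII text, analogous to int() with bytes.
print(float(b"3.14"))     # plain ASCII digits and a dot
print(float(b"1_000.5"))  # underscores are accepted, as in numeric literals

# int() treats bytes input the same way.
print(int(b"42"))
```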
Under the hood, the implementation does this:
- Because the input is not a string, `PyNumber_Float()` is called on the object (for `str` objects the code jumps straight to `PyFloat_FromString()`).
- `PyNumber_Float()` checks for a `__float__` method; if that's not available, it calls `PyFloat_FromString()`.
- `PyFloat_FromString()` accepts not only `str` objects, but any object implementing the buffer protocol. (The `String` name is a Python 2 holdover; the Python 3 `str` type is called `Unicode` in the C implementation.) `bytes` objects implement the buffer protocol, and the `PyBytes_AS_STRING` macro is used to access the internal C buffer holding the bytes.
- A combination of two internal functions named `_Py_string_to_number_with_underscores()` and `float_from_string_inner()` is then used to parse the ASCII bytes into a floating point value.
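The `__float__` fallback in the second step is observable from Python: an object that defines `__float__` is converted via that method, while `bytes` (which has no `__float__`) falls through to the buffer-protocol parsing path. A small sketch (the `Celsius` class is a hypothetical example of my own):

```python
class Celsius:
    """Hypothetical type whose __float__ is picked up by PyNumber_Float()."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __float__(self):
        return self.degrees * 1.0

print(float(Celsius(21)))  # converted via __float__
print(float(b"21.5"))      # no __float__, so the bytes are parsed as ASCII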
For actual `str` strings, the CPython implementation converts any non-ASCII string into a sequence of ASCII bytes by looking only at the ASCII codepoints in the input value and converting any non-ASCII whitespace character to ASCII 0x20 spaces, and then uses the same `_Py_string_to_number_with_underscores()` / `float_from_string_inner()` combination.
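That whitespace normalization only applies to `str` input; for `bytes` there is no such pre-processing, so a non-ASCII byte is simply rejected. A short illustration:

```python
# str input: U+00A0 (NO-BREAK SPACE) is non-ASCII whitespace and gets
# normalized to an ASCII space before parsing, so this succeeds.
print(float("\u00a01.5\u00a0"))

# bytes input: the 0xA0 byte is not normalized, so parsing fails.
try:
    float(b"\xa01.5")
except ValueError as exc:
    print("ValueError:", exc)
```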
I see this as a bug in the documentation and have filed an issue with the Python project to have it updated.