18

In pandas, when I try to cast a Series that contains NaN values to integer with a snippet such as

df.A = df.A.apply(int)

I often see the error message

ValueError: cannot convert float NaN to integer

I understand that NaN values can't be converted to an integer, but I am curious about the ValueError raised in this case. It says that float NaN can't be converted to integer.

Is there a specific reason why NaN values are treated as float objects? Or is this just an issue with how the error message is worded?

jpp
    Because the IEEE 754 formats include NaN in their definitions https://en.wikipedia.org/wiki/IEEE_754 (i.e. NaNs *are* [special instances of] floats) – Tedil Feb 01 '18 at 09:13

2 Answers

11

The short answer is that IEEE 754 specifies NaN as a floating-point value.
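As a quick check (a minimal sketch, not taken from the pandas documentation), both Python and NumPy expose NaN as an ordinary float, and a Series that contains NaN is stored as float64. The variable names here are only illustrative.

```python
import numpy as np
import pandas as pd

nan_is_float = isinstance(np.nan, float)             # np.nan is a plain Python float
builtin_nan_type = type(float("nan"))                # the built-in NaN is a float too
nan_series_dtype = pd.Series([1, np.nan, 3]).dtype   # the integers are upcast to float64

print(nan_is_float, builtin_nan_type, nan_series_dtype)
```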

As for what you should do about converting a pd.Series to specific numeric data types, I prefer to use pd.to_numeric where possible. The examples below demonstrate why.

import pandas as pd
import numpy as np

s = pd.Series([1, 2.5, 3, 4, 5.5])        # s.dtype = float64
s = s.astype(float)                       # s.dtype = float64
s = pd.to_numeric(s, downcast='float')    # s.dtype = float32

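t = pd.Series([1, np.nan, 3, 4, 5])       # t.dtype = float64
try:
    t = t.astype(int)                     # raises ValueError because of the NaN
except ValueError:
    pass                                  # t is unchanged, still float64
t = pd.to_numeric(t, downcast='integer')  # t.dtype = float64 (NaN prevents the downcast)

u = pd.Series([1, 2, 3, 4, 5, 6])         # u.dtype = int64
u = u.astype(int)                         # u.dtype = platform default int (int64 on most systems, int32 on Windows with older NumPy)
u = pd.to_numeric(u, downcast='integer')  # u.dtype = int8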
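If an integer result is actually required, the NaN values have to be dealt with first. The sketch below shows two options that go beyond the examples above: filling NaN with a placeholder (the choice of 0 is an assumption and may not suit your data), or casting to pandas' nullable Int64 extension dtype, which stores missing values as pd.NA instead of NaN.

```python
import numpy as np
import pandas as pd

t = pd.Series([1, np.nan, 3, 4, 5])   # float64 because of the NaN

# Option 1: replace NaN with a placeholder, then cast to a NumPy integer dtype.
# The fill value 0 is an assumption; pick whatever makes sense for your data.
filled = t.fillna(0).astype("int64")

# Option 2: cast to the nullable integer dtype (note the capital "I").
# The missing entry is kept as pd.NA rather than being replaced.
nullable = t.astype("Int64")

print(filled.dtype, nullable.dtype)
```

The first option loses the information that a value was missing; the second keeps it, at the cost of using an extension dtype rather than a plain NumPy one.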
jpp
8

It's worth thinking about what it means to say any number "is" a float. In CPython, the float type is implemented using double in C, which on virtually all platforms means IEEE 754 double precision.

In that standard, there are particular bit sequences which correspond to every floating point number that can be represented in the system (note that not every number between the upper and lower bounds can be represented).

Additionally, there are a couple of special bit sequences which don't correspond to "regular" numbers and therefore cannot be converted to an integer.

  • Two infinities: +∞ and −∞.
  • Two kinds of NaN: a quiet NaN (qNaN) and a signaling NaN (sNaN).
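To make the special bit sequences concrete, here is a minimal sketch that unpacks the 64-bit pattern of a float. It assumes the platform's double is the IEEE 754 binary64 format (1 sign bit, 11 exponent bits, 52 fraction bits); the helper names are hypothetical. Infinities and NaNs both have every exponent bit set, and only the fraction distinguishes them.

```python
import struct

def float_bits(x):
    """Return the 64-bit pattern of x as an unsigned integer (assumes binary64)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def exponent_and_fraction(x):
    """Split the bit pattern of x into its exponent and fraction fields."""
    bits = float_bits(x)
    exponent = (bits >> 52) & 0x7FF      # 11 exponent bits
    fraction = bits & ((1 << 52) - 1)    # 52 fraction bits
    return exponent, fraction

inf_exp, inf_frac = exponent_and_fraction(float("inf"))
nan_exp, nan_frac = exponent_and_fraction(float("nan"))
one_exp, one_frac = exponent_and_fraction(1.0)

# Infinity: exponent all ones, fraction zero.
# NaN:      exponent all ones, fraction non-zero.
# 1.0:      an ordinary exponent (the bias, 1023) and a zero fraction.
print(hex(inf_exp), inf_frac, hex(nan_exp), nan_frac != 0, one_exp, one_frac)
```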

To build floats with such values, you can use these calls:

nan = float('nan')
inf = float('inf')

Passing NaN to the int constructor produces the same error as in the question, while infinity raises an OverflowError:

>>> int(nan)
ValueError: cannot convert float NaN to integer

>>> int(inf)
OverflowError: cannot convert float infinity to integer
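One way to avoid both exceptions is to test for the special values before converting. The helper below is a hypothetical sketch (the name to_int_or_none and the choice of returning None are assumptions, not part of any library); it relies only on math.isfinite from the standard library.

```python
import math

def to_int_or_none(x):
    """Hypothetical helper: int(x) for finite floats, None for NaN or infinity."""
    if math.isfinite(x):
        return int(x)   # truncates toward zero, like the int constructor
    return None

results = [to_int_or_none(v) for v in (2.9, float("nan"), float("inf"), -1.5)]
print(results)  # [2, None, None, -1]
```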
chthonicdaemon