1

Im trying to find a "NaN safe" way to convert a column of basically whole numbers. I do not want NaN's to get converted into 0's as there is a difference (in my context) between a zero and a NaN. When i try to convert the original columns using astype('Int64') it goes through no problemo. But when i use an agg function such as mean() the resulting column wont convert to ints even though there are no NaNs in my MWE.

I tried many of the suggestions on this post to no avail: Convert Pandas column containing NaNs to dtype `int`

MWE and Results

#Sample Data
df = pd.DataFrame(data=[[123.0, np.NaN, 456.0], [987.0, 876.0,np.NaN]],
                  columns=['2018', '2019', '2020'],
                  index=['sales','support'])
df['avg'] = df.mean(axis=1)
df

       2018    2019     2020     avg
sales   123.0   NaN     456.0   289.5
support 987.0   876.0   NaN     931.5

#an original column converts no problem and keeps the NaNs
ts = df['2019'].astype('Int64') # --> works fine
ts[0]  #--> <NA>
type(ts[0]) # --> pandas._libs.missing.NAType

#column types appear to be the same
type(df['2018'][0]) # --> np.float64
type(df['avg'][0]) # --> np.float64

#craps out on agg created column
df['avg'].astype(float).astype('Int64') # --> fails with message " cannot safely cast non-equivalent float64 to int64"
df['avg'].astype(float).astype('Int64') # --> fails with message " cannot safely cast non-equivalent float64 to int64"
df['avg'].dtype(pd.Int64Dtype)   # --> fails with message "'numpy.dtype[float64]' object is not callable"
df['avg'].dtype(pd.Int64Dtype()) # --> fails with message "'numpy.dtype[float64]' object is not callable"
pd.to_numeric(df['avg']).astype('Int64', errors='ignore') # --> works but leaves me with a column of floats
round(df['avg'], 0) # --> remains a float even though it rounds off the decimals 932.0
DChaps
  • 482
  • 5
  • 12
  • Pandas has [nullable integer dtype](https://pandas.pydata.org/docs/user_guide/integer_na.html), or int NA equivalent that might work here – G. Anderson Aug 12 '22 at 21:22
  • `df.avg.round().astype('Int64')`, preserves `NaN`s and works around casting rules. I see no way to change *casting* to 'unsafe'. I thought there was an github issue a long time ago but can't find it anymore. – Michael Szczesny Aug 12 '22 at 21:39

2 Answers2

2

Given:

          2018   2019   2020
sales    123.0    NaN  456.0
support  987.0  876.0    NaN

Doing:

df = df.astype('Int64')
# If you just want truncation:
df['avg'] = df.mean(axis=1).astype(int).astype('Int64')
# If you want rounding:
# df['avg'] = df.mean(axis=1).round(0).astype('Int64')
print(df)

Output:

         2018  2019  2020  avg
sales     123  <NA>   456  289
support   987   876  <NA>  931
BeRT2me
  • 12,699
  • 2
  • 13
  • 31
  • Totally worked. Rounding first worked. But now that i think about it i wonder if there is a difference between using `round(df['avg'])` and `df['avg'].round()`. the first doesnt work. – DChaps Aug 13 '22 at 01:09
  • `round(df.mean(axis=1)).astype('Int64')` and if already created `round(df['avg']).astype('Int64')` both work fine... – BeRT2me Aug 13 '22 at 01:36
0
np.floor(pd.to_numeric(df['avg'], errors='coerce')).astype('Int64')
Michael Hodel
  • 2,845
  • 1
  • 5
  • 10