
I have a cuDF DataFrame like this:

import cudf
import numpy as np

df_a = cudf.DataFrame()
df_a['key'] = [0, 1, 2, 3, 4]
df_a['values'] = [1, 2, np.nan, 3, np.nan]

I would like to replace all 2s with np.nan.

With a pandas DataFrame I would usually write df_a[df_a==2] = np.nan, but with a cuDF DataFrame this fails with: cannot broadcast <class 'int'>

When I use df_a[df_a['values']==2] = np.nan instead, I cannot make sense of the result.

Using df_a.replace(2, np.NaN) fails with: cannot convert float NaN to integer

The original DataFrame is very large, so I want to avoid loops. It may also contain different data types, meaning the 2s could also be floats.
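For comparison, here is a minimal pandas (CPU) sketch of the result the question expects. It uses DataFrame.mask instead of the boolean-frame assignment df_a[df_a==2] = np.nan; that swap is only an assumption to keep the example tidy, and both give the same values in pandas. Because 2 == 2.0 is True, a single comparison catches integer 2s and float 2.0s alike.

```python
import numpy as np
import pandas as pd

# Same data as the question, built in pandas (CPU) rather than cuDF.
df_a = pd.DataFrame({"key": [0, 1, 2, 3, 4], "values": [1, 2, np.nan, 3, np.nan]})

# Boolean frame that is True wherever a cell equals 2. NaN == 2 is False,
# and 2 == 2.0 is True, so int and float columns are handled the same way.
mask = df_a == 2

# mask() returns a copy with NaN where the condition is True. The int64
# 'key' column is upcast to float64 so that it can hold NaN.
result = df_a.mask(mask, np.nan)
print(result)
```

The float upcast of the integer column is what pandas does here; the question's replace error suggests cuDF does not perform that cast implicitly.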

Asked by paka; edited by SultanOrazbayev.

1 Answer


I can't find a good reference for this, but using None instead of np.nan seems to do the trick:

from cudf import DataFrame
from numpy import nan

df_a = DataFrame()
df_a['key'] = [0, 1, 2, 3, 4]
df_a['values'] = [1,2, nan,3,nan]
print(df_a)
#    key values
# 0    0      1
# 1    1      2
# 2    2   <NA>
# 3    3      3
# 4    4   <NA>

# mask all 2s (in both 'key' and 'values')
mask = df_a==2
df_a[mask] = None
print(df_a)
#     key values
# 0     0      1
# 1     1   <NA>
# 2  <NA>   <NA>
# 3     3      3
# 4     4   <NA>
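A plausible reason None works is that cuDF stores missing entries as nulls (printed as <NA>) rather than as floating-point NaN, so an integer column can be marked missing without a float cast. pandas has a similar model in its nullable extension dtypes such as "Int64". The sketch below is a CPU-only pandas analogue of the answer, not cuDF code. It loops over columns (not rows), so every step is still vectorized.

```python
import pandas as pd

# pandas analogue of the cuDF frame: nullable "Int64" columns keep missing
# entries as <NA> in a separate mask instead of converting them to float NaN.
df_a = pd.DataFrame(
    {"key": [0, 1, 2, 3, 4], "values": [1, 2, None, 3, None]},
    dtype="Int64",
)

for col in df_a.columns:  # one vectorized step per column, no row loop
    # (s == 2) is <NA> where s is already missing; treat those as "no match".
    hit = (df_a[col] == 2).fillna(False)
    # Assigning None marks the matching cells as missing; dtype stays Int64.
    df_a.loc[hit, col] = None

print(df_a)
```

Filling the <NA> entries of the comparison with False is needed here because pandas refuses a boolean indexer that still contains missing values.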
Answered by SultanOrazbayev.