I have a pandas DataFrame with a lot of missing data. If I run

d = dfs['REV_PIZ'].isna()

the output is boolean:

0        True
1        True
2        True
3        True
4        True
5        True
6        True
7        True

What I really want is for d to contain only the numerical values, so that I can do further maths on this column.

MikiBelavista

3 Answers

3

It is unclear whether the column contains non-numeric values, so here are two possible solutions:


If all values are numeric, you can use boolean indexing with notna:

d = dfs[dfs['REV_PIZ'].notna()]

Or use dropna restricted to the column REV_PIZ:

d = dfs.dropna(subset=['REV_PIZ'])

Sample:

import numpy as np
import pandas as pd
dfs = pd.DataFrame({'REV_PIZ':[1,2,np.nan]})
d = dfs.dropna(subset=['REV_PIZ'])
print(d)
   REV_PIZ
0      1.0
1      2.0

If the column mixes numeric and non-numeric values, add to_numeric with errors='coerce' to convert the non-numeric values to NaN first:

dfs = pd.DataFrame({'REV_PIZ':[1,2,np.nan,'a']})
dfs['REV_PIZ'] = pd.to_numeric(dfs['REV_PIZ'], errors='coerce')
d = dfs.dropna(subset=['REV_PIZ'])
print (d)
   REV_PIZ
0      1.0
1      2.0
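
To get from the filtered frame to the "further maths" the question asks about, the cleaned column can be pulled out as a Series. This is only a minimal sketch on the same made-up sample data; the names values, total and doubled are illustrative and not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: numbers, one missing value and one string
dfs = pd.DataFrame({'REV_PIZ': [1, 2, np.nan, 'a']})

# Coerce non-numeric entries to NaN, then drop every NaN from the column
values = pd.to_numeric(dfs['REV_PIZ'], errors='coerce').dropna()

# The result is a float Series, so arithmetic and aggregations work on it
total = values.sum()
doubled = values * 2
```

Calling dropna on the Series keeps the original index labels, so the surviving values can still be matched back to their rows in dfs.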
jezrael
1

I think your question almost answers itself; you could just filter the missing values out like this:

d = dfs[~dfs['REV_PIZ'].isna()]
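
A quick check, on made-up sample data, that negating isna() selects the same rows as notna(). The names d_alt and same are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
dfs = pd.DataFrame({'REV_PIZ': [1.0, np.nan, 3.0]})

# ~ inverts the boolean mask, keeping rows where the value is present
d = dfs[~dfs['REV_PIZ'].isna()]

# notna() builds the already-inverted mask directly
d_alt = dfs[dfs['REV_PIZ'].notna()]

# Both selections contain the same rows and values
same = d.equals(d_alt)
```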

0

This should work:

d = dfs.query("REV_PIZ == REV_PIZ")

See here: Querying for NaN and other names in Pandas
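
The self-comparison relies on NaN never comparing equal to itself, so the expression is False exactly on the missing rows. A small sketch on made-up sample data (the mask variable is only there to show the intermediate result):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value
dfs = pd.DataFrame({'REV_PIZ': [1.0, np.nan, 3.0]})

# NaN != NaN, so the element-wise comparison is False only on the missing row
mask = dfs['REV_PIZ'] == dfs['REV_PIZ']

# query() evaluates the same comparison and keeps the rows where it is True
d = dfs.query("REV_PIZ == REV_PIZ")
```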

Shir