2

I have a dataframe which contains NaNs and 0s in some rows for all columns. I am trying to extract such rows, so that I can process them further. Also, some of these columns are object and some float. I am trying the below code to extract such rows, but because of the columns being object, its not giving me the desired result.

Now, I can solve this problem by substituting some arbitrary values to NaN and use it in .isin statement, but then it also changes the datatype of my columns, and I would have to convert them back.

Can somebody please help me with a workaround/solution to this. Thanks.

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[np.nan,0,np.nan,1,'abc'], 'b':[0,np.nan,np.nan,1,np.nan]})

df

     a   b
0   NaN 0.0
1   0   NaN
2   NaN NaN
3   1   1.0
4   abc NaN
5   NaN 1.0

values = [np.nan,0]
df_all_empty = df[df.isin(values).all(1)]
df_all_empty

Expected Output:

     a   b
0   NaN 0.0
1   0   NaN
2   NaN NaN

Actual Output:

     a  b
0   NaN 0.0
Shir
  • 1,157
  • 13
  • 35
SnigA
  • 147
  • 10

3 Answers3

3

Change

df_all_empty = df[(df.isnull()|df.isin([0])).all(1)]
opunsoars
  • 115
  • 7
  • 1
    The `df.isin(values)` matrix for this dataframe is [ 0 True True, 1 True False , 2 True False, 3 False False, 4 False False ] Do you have any why? I couldnt see why the value in 1,b is False, so I assumed something with the comparison to NaNs using `isin` is wrong – Shir Mar 27 '18 at 06:11
  • Looks like `np.NaN != np.NaN` in numpy. So better to use `df[(df.isnull()|df.isin([0])).all(1)]` – opunsoars Mar 27 '18 at 06:23
0

The code below will let you select those rows.

df_sel = df.loc[(df.a.isnull()) | \
                (df.b.isnull()) | \
                (df.a==0)       | \
                (df.b==0)           ]

If you want to make column 'a' in those rows, say for example -9999, you can use:

df.loc[(df.a.isnull()) | \
       (df.b.isnull()) | \
       (df.a==0)       | \
       (df.b==0)           , 'a'] = -9999

For reference, refer to the official documentation, in

https://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing

jberrio
  • 972
  • 2
  • 9
  • 20
0

You can use df.query, and the trick described here (compare to NaN by checking if a value equals to itself)

Write something like this:

df.query("(a!=a or a==0) and (b!=b or b==0)")

And the output is:

     a    b
0  NaN  0.0
1    0  NaN
2  NaN  NaN
Shir
  • 1,157
  • 13
  • 35