
I have a pandas dataframe from which I'm trying to drop rows based on a criterion across selected columns: if the values in all of these selected columns are zero, the row should be dropped. Here is an example.

import pandas as pd
t = pd.DataFrame({'a':[1,0,0,2],'b':[1,2,0,0],'c':[1,2,3,4]})

  a b c
0 1 1 1 
1 0 2 2 
2 0 0 3 
3 2 0 4

I would like to try something like:

cols_of_interest = ['a','b'] #Drop rows if zero in all these columns
t = t[t[cols_of_interest]!=0]

This doesn't drop the rows, so I tried:

t = t.drop(t[t[cols_of_interest]==0].index)

And all rows are dropped.

What I would like to end up with is:

  a b c
0 1 1 1 
1 0 2 2 
3 2 0 4

The third row (index 2) should be dropped because it has value 0 in BOTH columns of interest, not just one.

Asked by nfmcclure (edited by Machavity)

1 Answer


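Your problem here is that indexing a DataFrame with a boolean DataFrame, as in t[t[cols_of_interest]!=0], does not filter rows: it returns a frame of the same shape with NaN wherever the condition is not met, and columns that are not part of the mask (c here) become entirely NaN. Assigning that result back with t = ... also overwrites your original df. Likewise, the index of t[t[cols_of_interest]==0] still contains every row label, which is why your drop call removes all rows.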
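To see what indexing with a boolean DataFrame actually returns, here is a small sketch on the question's data. It relies on standard pandas behavior where t[mask] with a DataFrame mask acts like t.where(mask); the variable names mask and masked are just for illustration.

```python
import pandas as pd

t = pd.DataFrame({'a': [1, 0, 0, 2], 'b': [1, 2, 0, 0], 'c': [1, 2, 3, 4]})
cols_of_interest = ['a', 'b']

# Boolean DataFrame: True where 'a'/'b' are non-zero
mask = t[cols_of_interest] != 0

# Indexing with a boolean DataFrame masks cells instead of filtering rows:
# the shape is unchanged, False cells become NaN, and column 'c'
# (absent from the mask) is treated as False, so it becomes NaN too.
masked = t[mask]
print(masked)

# Every original row label survives the masking, which is why
# dropping t[t[cols_of_interest] == 0].index removes every row.
print(masked.index.equals(t.index))
print(t.drop(t[t[cols_of_interest] == 0].index).empty)
```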

What you want to do is generate the masked frame, drop the NaN rows with thresh=1 so that a row is kept only if it has at least one non-NaN value, and then pass the index of the result to loc to get the desired df:

In [124]:

cols_of_interest = ['a','b']
t.loc[t[t[cols_of_interest]!=0].dropna(thresh=1).index]
Out[124]:
   a  b  c
0  1  1  1
1  0  2  2
3  2  0  4
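A self-contained version of the dropna approach above, with the intermediate steps split out (the names masked, kept and result are illustrative). Because column c is entirely NaN after masking, dropna(thresh=1) keeps a row only when a or b survived the mask, i.e. when at least one column of interest is non-zero.

```python
import pandas as pd

t = pd.DataFrame({'a': [1, 0, 0, 2], 'b': [1, 2, 0, 0], 'c': [1, 2, 3, 4]})
cols_of_interest = ['a', 'b']

# Mask out zeros in the columns of interest (other columns become all-NaN)
masked = t[t[cols_of_interest] != 0]

# Keep rows with at least one non-NaN cell, i.e. a non-zero in 'a' or 'b'
kept = masked.dropna(thresh=1)

# Use the surviving labels to pull the untouched rows from the original frame
result = t.loc[kept.index]
print(result)
```

The final loc step matters: the masked frame itself holds NaNs and float values, so the original integer rows are recovered by label rather than used directly.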

EDIT

As pointed out by @DSM, you can achieve this more simply by reducing the condition with any(axis=1), which gives one boolean per row (True if any column of interest is non-zero), and using that to index into your df:

In [125]:

t[(t[cols_of_interest] != 0).any(axis=1)]
Out[125]:
   a  b  c
0  1  1  1
1  0  2  2
3  2  0  4
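A self-contained sketch of the any(axis=1) version. The two alternative forms are equivalent rewrites added for comparison, not from the original thread: the negated all(axis=1) form follows from De Morgan's law, and the drop-based form is a corrected take on the question's second attempt, using a row-wise mask so only the all-zero rows are dropped.

```python
import pandas as pd

t = pd.DataFrame({'a': [1, 0, 0, 2], 'b': [1, 2, 0, 0], 'c': [1, 2, 3, 4]})
cols_of_interest = ['a', 'b']

# One boolean per row: True if at least one column of interest is non-zero
keep = (t[cols_of_interest] != 0).any(axis=1)
result = t[keep]

# Equivalent by De Morgan: keep rows where NOT all columns of interest are zero
result_all = t[~(t[cols_of_interest] == 0).all(axis=1)]

# Equivalent drop-based form: select the all-zero rows with a row-wise
# boolean Series, then drop exactly those labels
all_zero = (t[cols_of_interest] == 0).all(axis=1)
result_drop = t.drop(t[all_zero].index)

print(result)
```

The key difference from the question's attempts is that a boolean Series (one value per row) filters rows, whereas a boolean DataFrame only masks cells.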
Answered by EdChum

  • Why wouldn't you simply use `t[(t[cols_of_interest] != 0).any(axis=1)]` or something? – DSM Mar 25 '15 at 16:51
  • @DSM good point, I was literally fixing up the OP's attempt, will update – EdChum Mar 25 '15 at 16:51
  • Thanks! I didn't know about the Boolean mask thing; I'll have to read about that further. The second edit/answer also makes sense. – nfmcclure Mar 25 '15 at 16:58
  • It's worth checking out the docs: http://pandas.pydata.org/pandas-docs/stable/indexing.html – EdChum Mar 25 '15 at 16:59