182

I can use pandas' dropna() to remove rows where some or all columns are set to NA. Is there an equivalent function for dropping rows where all columns have the value 0?

P   kt  b   tt  mky depth
1   0   0   0   0   0
2   0   0   0   0   0
3   0   0   0   0   0
4   0   0   0   0   0
5   1.1 3   4.5 2.3 9.0

In this example, we would like to drop the first 4 rows from the data frame.
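
For anyone who wants to reproduce the example, the frame can be built roughly like this (a sketch that assumes P is the index rather than a regular column):

import pandas as pd

df = pd.DataFrame(
    {'kt':    [0, 0, 0, 0, 1.1],
     'b':     [0, 0, 0, 0, 3.0],
     'tt':    [0, 0, 0, 0, 4.5],
     'mky':   [0, 0, 0, 0, 2.3],
     'depth': [0, 0, 0, 0, 9.0]},
    index=pd.Index([1, 2, 3, 4, 5], name='P'))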

thanks!

user308827
  • Just to clarify, this is two questions. One, to drop rows with *all* values as 0. But also, for a function *equivalent* to dropna() which would drop rows with *any* value as 0. – alchemy Apr 22 '20 at 17:54

14 Answers

213

One-liner. No transpose needed:

df.loc[~(df==0).all(axis=1)]

And for those who like symmetry, this also works...

df.loc[(df!=0).any(axis=1)]
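
As a quick sanity check, here is the first one-liner on a small frame with two all-zero rows (a minimal sketch):

import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 0, 0, 2]})

print(df.loc[~(df == 0).all(axis=1)])
#    a  b
# 2  1  0
# 3  1  2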
8one6
    For brevity (and, in my opinion, clarity of purpose) combine this and Akavall's comment: `df.loc[(df != 0).any(1)]`. Teamwork! – Dan Allan Mar 26 '14 at 03:00
    +1, 30% faster that transpose -- 491 to 614 microsec, and I like the `axis=1` for being explicit; more pythonic in my opinion – gt6989b Jun 27 '16 at 21:41
    Some mention should be made of difference between using .all and .any since the original question mentioned equivalence of dropna. If you want to drop all rows with any column containing a zero, you have to reverse the .all and .any in above answer. Took me awhile to realize this as I was looking for that functionality. – Zak Keirn Mar 06 '18 at 18:21
    This does not work for me, but returns me the exact same ```df``` – Robvh Jul 17 '19 at 12:31
  • Is there an 'inplace' version of this? I see that to drop rows in a df as the OP requested, this would need to be `df = df.loc[(df!=0).all(axis=1)]` and `df = df.loc[(df!=0).any(axis=1)]` to drop rows with any zeros as would be the actual equivalent to dropna(). – alchemy Apr 22 '20 at 17:51
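
As the comments above point out, swapping all and any gives the closer dropna() analogue of dropping rows that contain a zero in any column, and since .loc returns a new frame rather than modifying in place, the usual pattern is to assign the result back. A sketch of both variants:

import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]})

# Drop rows where *all* columns are zero (the question's case).
df_all = df.loc[(df != 0).any(axis=1)]

# Drop rows where *any* column is zero (the dropna()-style behaviour).
df_any = df.loc[(df != 0).all(axis=1)]

print(df_all)   # keeps rows 1, 2, 3
print(df_any)   # keeps only row 3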
146

It turns out this can be nicely expressed in a vectorized fashion:

> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
   a  b
1  0  1
2  1  0
3  1  1
U2EF1
55

I think this solution is the shortest:

df = df[df['ColName'] != 0]
Ikbel
34

I look up this question about once a month and always have to dig out the best answer from the comments:

df.loc[(df!=0).any(1)]

Thanks Dan Allan!

The Unfun Cat
30

Replace the zeros with NaN, then drop the rows where all entries are NaN, and finally replace the NaN values with zeros again.

import numpy as np
df = df.replace(0, np.nan)
df = df.dropna(how='all', axis=0)
df = df.replace(np.nan, 0)
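
One caveat with this round trip: a NaN that was already in the frame comes back as 0 after the final replace. A small sketch of that behaviour:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0.0, 1.0, np.nan], 'b': [0.0, 2.0, 3.0]})

out = df.replace(0, np.nan).dropna(how='all', axis=0).replace(np.nan, 0)
print(out)
#      a    b
# 1  1.0  2.0
# 2  0.0  3.0   <- the NaN that was already in 'a' has silently become 0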
stackpopped
12

A couple of solutions I found helpful while looking this up, especially for larger data sets:

df[(df.sum(axis=1) != 0)]       # 30% faster 
df[df.values.sum(axis=1) != 0]  # 3X faster 

Continuing with the example from @U2EF1:

In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})

In [91]: %timeit df[(df.T != 0).any()]
1000 loops, best of 3: 686 µs per loop

In [92]: df[(df.sum(axis=1) != 0)]
Out[92]: 
   a  b
1  0  1
2  1  0
3  1  1

In [95]: %timeit df[(df.sum(axis=1) != 0)]
1000 loops, best of 3: 495 µs per loop

In [96]: %timeit df[df.values.sum(axis=1) != 0]
1000 loops, best of 3: 217 µs per loop

On a larger dataset:

In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))

In [120]: %timeit bdf[(bdf.T != 0).any()]
1000 loops, best of 3: 1.63 ms per loop

In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
1000 loops, best of 3: 1.09 ms per loop

In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
1000 loops, best of 3: 517 µs per loop
clocker
    Do bad things happen if your row contains a -1 and a 1? – Rhys Ulerich Mar 15 '17 at 20:20
  • Of course, the sum wouldn't work if you had equal rows adding up to 0. Here's a quick workaround for that which is only slightly slower: `df[~(df.values.prod(axis=1) == 0) | ~(df.values.sum(axis=1)==0)]` – clocker Mar 17 '17 at 02:43
    The prod() function doesn't solve anything. If you have any 0 in the row that will return 0. If you have to handle a row like this: [-1, -0.5, 0, 0.5, 1], neither of your solutions will work. – Rahul Murmuria Jun 19 '17 at 14:45
  • Here is a correct version that works 3x faster than the accepted answer: `bdf[np.square(bdf.values).sum(axis=1) != 0]` – Rahul Murmuria Jun 19 '17 at 17:59
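
Tying the comments together: a row like [-1, 1] sums to zero and gets dropped by the plain sum filter, while summing absolute values (or squares, as suggested above) only drops rows that are genuinely all zero. A sketch:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [0, 1, 3]})

print(df[df.values.sum(axis=1) != 0])           # drops the [-1, 1] row as well
print(df[np.abs(df.values).sum(axis=1) != 0])   # drops only the all-zero row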
7

You can use a quick lambda function to check whether all the values in a given row are 0, and then use the result of applying that lambda to keep only the rows that do not match that condition:

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randn(5,3), 
                  index=['one', 'two', 'three', 'four', 'five'],
                  columns=list('abc'))

df.loc[['one', 'three']] = 0

print(df)
print(df.loc[~df.apply(lambda row: (row == 0).all(), axis=1)])

Yields:

              a         b         c
one    0.000000  0.000000  0.000000
two    2.240893  1.867558 -0.977278
three  0.000000  0.000000  0.000000
four   0.410599  0.144044  1.454274
five   0.761038  0.121675  0.443863

[5 rows x 3 columns]
             a         b         c
two   2.240893  1.867558 -0.977278
four  0.410599  0.144044  1.454274
five  0.761038  0.121675  0.443863

[3 rows x 3 columns]
8one6
5
import pandas as pd

df = pd.DataFrame({'a' : [0,0,1], 'b' : [0,0,-1]})

temp = df.abs().sum(axis=1) == 0   # True for rows whose values are all zero
df = df.drop(df[temp].index)

Result:

>>> df
   a  b
2  1 -1
Akavall
4

Following the example in the accepted answer, a more elegant solution:

df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
df = df[df.any(axis=1)]
print(df)

   a  b
1  0  1
2  1  0
3  1  1
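
One thing to keep in mind with this truthiness approach: any() skips NaN by default, so rows that are entirely NaN are dropped along with the all-zero rows, which may or may not be what you want. A small sketch:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, np.nan, 1], 'b': [0, np.nan, 2]})

print(df[df.any(axis=1)])
#      a    b
# 2  1.0  2.0   <- both the all-zero row and the all-NaN row are gone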
Gideon Kogan
3

Another alternative:

# df != 0                --> which entries are non-zero? (True/False)
# (df != 0).any(axis=1)  --> does this row contain any non-zero entry? (True/False per row)
# df.loc[mask, :]        --> keep only the rows that contain a non-zero entry
# df.shape               --> compare shapes to confirm a subset was taken

non_zero_mask = (df != 0).any(axis=1)  # is there anything non-zero in this row?
df.loc[non_zero_mask, :].shape
bmc
2

This works for me: new_df = df[df.loc[:] != 0].dropna()

pyeR_biz
1

For me, df.loc[(df!=0).any(axis=0)] did not work; it returned the exact same dataset.

Instead, I used df.loc[:, (df!=0).any(axis=0)], which dropped every column whose values were all 0.

Using .all() instead dropped every column that contained any zero value in my dataset.

Denisa
0
df = df[~(df[['kt', 'b', 'tt', 'mky', 'depth']] == 0).all(axis=1)]

Try this command; it works perfectly.

-2

To drop all rows that have a 0 in any column:

new_df = df[df.loc[:]!=0].dropna()
Yapi