182

I can use pandas' dropna() to remove rows where some or all columns are set to NA. Is there an equivalent function for dropping rows where all columns have the value 0?

P   kt  b   tt  mky depth
1   0   0   0   0   0
2   0   0   0   0   0
3   0   0   0   0   0
4   0   0   0   0   0
5   1.1 3   4.5 2.3 9.0

In this example, we would like to drop the first 4 rows from the data frame.
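
For anyone who wants to reproduce the example, the frame can be built roughly like this (a sketch that assumes P is the index rather than a regular column):

import pandas as pd

df = pd.DataFrame(
    {'kt':    [0, 0, 0, 0, 1.1],
     'b':     [0, 0, 0, 0, 3.0],
     'tt':    [0, 0, 0, 0, 4.5],
     'mky':   [0, 0, 0, 0, 2.3],
     'depth': [0, 0, 0, 0, 9.0]},
    index=pd.Index([1, 2, 3, 4, 5], name='P'))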

thanks!

user308827
  • Just to clarify, this is two questions. One, to drop rows with *all* values as 0. But also, for a function *equivalent* to dropna() which would drop rows with *any* value as 0. – alchemy Apr 22 '20 at 17:54

14 Answers

213

One-liner. No transpose needed:

df.loc[~(df==0).all(axis=1)]

And for those who like symmetry, this also works...

df.loc[(df!=0).any(axis=1)]
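
As a quick sanity check, here is the first one-liner on a small frame with two all-zero rows (a minimal sketch):

import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 0, 0, 2]})

print(df.loc[~(df == 0).all(axis=1)])
#    a  b
# 2  1  0
# 3  1  2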
8one6
    For brevity (and, in my opinion, clarity of purpose) combine this and Akavall's comment: `df.loc[(df != 0).any(1)]`. Teamwork! – Dan Allan Mar 26 '14 at 03:00
    +1, 30% faster that transpose -- 491 to 614 microsec, and I like the `axis=1` for being explicit; more pythonic in my opinion – gt6989b Jun 27 '16 at 21:41
    Some mention should be made of difference between using .all and .any since the original question mentioned equivalence of dropna. If you want to drop all rows with any column containing a zero, you have to reverse the .all and .any in above answer. Took me awhile to realize this as I was looking for that functionality. – Zak Keirn Mar 06 '18 at 18:21
    This does not work for me, but returns me the exact same ```df``` – Robvh Jul 17 '19 at 12:31
  • Is there an 'inplace' version of this? I see that to drop rows in a df as the OP requested, this would need to be `df = df.loc[(df!=0).all(axis=1)]` and `df = df.loc[(df!=0).any(axis=1)]` to drop rows with any zeros as would be the actual equivalent to dropna(). – alchemy Apr 22 '20 at 17:51
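
As the comments above point out, swapping all and any gives the closer dropna() analogue of dropping rows that contain a zero in any column, and since .loc returns a new frame rather than modifying in place, the usual pattern is to assign the result back. A sketch of both variants:

import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1], 'b': [0, 1, 0, 1]})

# Drop rows where *all* columns are zero (the question's case).
df_all = df.loc[(df != 0).any(axis=1)]

# Drop rows where *any* column is zero (the dropna()-style behaviour).
df_any = df.loc[(df != 0).all(axis=1)]

print(df_all)   # keeps rows 1, 2, 3
print(df_any)   # keeps only row 3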
146

It turns out this can be nicely expressed in a vectorized fashion:

> df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
> df = df[(df.T != 0).any()]
> df
   a  b
1  0  1
2  1  0
3  1  1
U2EF1
55

I think this solution is the shortest:

df = df[df['ColName'] != 0]
Ikbel
34

I look up this question about once a month and always have to dig out the best answer from the comments:

df.loc[(df!=0).any(1)]

Thanks Dan Allan!

The Unfun Cat
30

Replace the zeros with NaN, then drop the rows where all entries are NaN, and finally replace the NaN values with zeros again.

import numpy as np
df = df.replace(0, np.nan)
df = df.dropna(how='all', axis=0)
df = df.replace(np.nan, 0)
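
One caveat with this round trip: a NaN that was already in the frame comes back as 0 after the final replace. A small sketch of that behaviour:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0.0, 1.0, np.nan], 'b': [0.0, 2.0, 3.0]})

out = df.replace(0, np.nan).dropna(how='all', axis=0).replace(np.nan, 0)
print(out)
#      a    b
# 1  1.0  2.0
# 2  0.0  3.0   <- the NaN that was already in 'a' has silently become 0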
stackpopped
12

A couple of solutions I found helpful while looking this up, especially for larger data sets:

df[(df.sum(axis=1) != 0)]       # 30% faster 
df[df.values.sum(axis=1) != 0]  # 3X faster 

Continuing with the example from @U2EF1:

In [88]: df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})

In [91]: %timeit df[(df.T != 0).any()]
1000 loops, best of 3: 686 µs per loop

In [92]: df[(df.sum(axis=1) != 0)]
Out[92]: 
   a  b
1  0  1
2  1  0
3  1  1

In [95]: %timeit df[(df.sum(axis=1) != 0)]
1000 loops, best of 3: 495 µs per loop

In [96]: %timeit df[df.values.sum(axis=1) != 0]
1000 loops, best of 3: 217 µs per loop

On a larger dataset:

In [119]: bdf = pd.DataFrame(np.random.randint(0,2,size=(10000,4)))

In [120]: %timeit bdf[(bdf.T != 0).any()]
1000 loops, best of 3: 1.63 ms per loop

In [121]: %timeit bdf[(bdf.sum(axis=1) != 0)]
1000 loops, best of 3: 1.09 ms per loop

In [122]: %timeit bdf[bdf.values.sum(axis=1) != 0]
1000 loops, best of 3: 517 µs per loop
clocker
    Do bad things happen if your row contains a -1 and a 1? – Rhys Ulerich Mar 15 '17 at 20:20
  • Of course, the sum wouldn't work if you had equal rows adding up to 0. Here's a quick workaround for that which is only slightly slower: `df[~(df.values.prod(axis=1) == 0) | ~(df.values.sum(axis=1)==0)]` – clocker Mar 17 '17 at 02:43
    The prod() function doesn't solve anything. If you have any 0 in the row that will return 0. If you have to handle a row like this: [-1, -0.5, 0, 0.5, 1], neither of your solutions will work. – Rahul Murmuria Jun 19 '17 at 14:45
  • Here is a correct version that works 3x faster than the accepted answer: `bdf[np.square(bdf.values).sum(axis=1) != 0]` – Rahul Murmuria Jun 19 '17 at 17:59
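
Tying the comments together: a row like [-1, 1] sums to zero and gets dropped by the plain sum filter, while summing absolute values (or squares, as suggested above) only drops rows that are genuinely all zero. A sketch:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [0, 1, 3]})

print(df[df.values.sum(axis=1) != 0])           # drops the [-1, 1] row as well
print(df[np.abs(df.values).sum(axis=1) != 0])   # drops only the all-zero row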
7

You can use a quick lambda function to check whether all the values in a given row are 0, and then use the result of applying that lambda to keep only the rows that do not match that condition:

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame(np.random.randn(5,3), 
                  index=['one', 'two', 'three', 'four', 'five'],
                  columns=list('abc'))

df.loc[['one', 'three']] = 0

print(df)
print(df.loc[~df.apply(lambda row: (row == 0).all(), axis=1)])

Yields:

              a         b         c
one    0.000000  0.000000  0.000000
two    2.240893  1.867558 -0.977278
three  0.000000  0.000000  0.000000
four   0.410599  0.144044  1.454274
five   0.761038  0.121675  0.443863

[5 rows x 3 columns]
             a         b         c
two   2.240893  1.867558 -0.977278
four  0.410599  0.144044  1.454274
five  0.761038  0.121675  0.443863

[3 rows x 3 columns]
8one6
5
import pandas as pd

df = pd.DataFrame({'a' : [0,0,1], 'b' : [0,0,-1]})

temp = df.abs().sum(axis=1) == 0   # True for rows whose values are all zero
df = df.drop(df[temp].index)

Result:

>>> df
   a  b
2  1 -1
Akavall
4

Following the example in the accepted answer, a more elegant solution:

df = pd.DataFrame({'a':[0,0,1,1], 'b':[0,1,0,1]})
df = df[df.any(axis=1)]
print(df)

   a  b
1  0  1
2  1  0
3  1  1
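
One thing to keep in mind with this truthiness approach: any() skips NaN by default, so rows that are entirely NaN are dropped along with the all-zero rows, which may or may not be what you want. A small sketch:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, np.nan, 1], 'b': [0, np.nan, 2]})

print(df[df.any(axis=1)])
#      a    b
# 2  1.0  2.0   <- both the all-zero row and the all-NaN row are gone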
Gideon Kogan
3

Another alternative:

# df != 0                --> which entries are non-zero? (True/False)
# (df != 0).any(axis=1)  --> does this row contain any non-zero entry? (True/False per row)
# df.loc[mask, :]        --> keep only the rows that contain a non-zero entry
# df.shape               --> compare shapes to confirm a subset was taken

non_zero_mask = (df != 0).any(axis=1)  # is there anything non-zero in this row?
df.loc[non_zero_mask, :].shape
bmc
2

This works for me: new_df = df[df.loc[:] != 0].dropna()

pyeR_biz
1

For me, df.loc[(df!=0).any(axis=0)] did not work; it returned the exact same dataset.

Instead, I used df.loc[:, (df!=0).any(axis=0)], which dropped every column whose values were all 0.

Using .all() instead dropped every column that contained any zero value in my dataset.

Denisa
0
df = df[~(df[['kt', 'b', 'tt', 'mky', 'depth']] == 0).all(axis=1)]

Try this command; it works perfectly.

-2

To drop all rows that have a 0 in any column:

new_df = df[df.loc[:]!=0].dropna()
Yapi