Drop rows with all zeroes in a pandas dataframe

Question

I'm trying to drop the rows of a dataframe where the value of one column is 0, but I don't know how

The table I start with looks like this:

a	b	c	d
99.08	0.0	0.0	0.0
0.0	95.8	0.0	0.0
0.0	0.0	97.8	0.0
0.0	0.0	96.7	0.0
0.0	0.0	0.0	98.9

I'm using pandas to deal with some categorical data. I've used pd.melt() to reshape it so the columns that were encoded in a similar fashion to one hot encoding are no problem, that leaves me with a dataframe with only 2 columns, as I want it, but with a lot of 0s in one of the columns.

The table after the melt looks like this:

col1	col2
a	99.8
a	0.0
a	0.0
a	0.0
a	0.0
b	95.8
b	0.0
b	0.0
c	97.8
c	0.0
c	0.0
c	0.0
c	96.5
c	0.0
d	98.9

I want to drop those values because they give no information and take space and resources

I've already tried what was suggested here but it gives an indexing error, because the length of the data returned by the any() function is not the same as the length of the original dataframe, as far as I've been able to understand.

I've also tried the suggestion here but it gives back a ValueError "cannot index with multidimensional key" because my df is 2 dimensional

Dataframe

df = pd.DataFrame({'a': [99.08, 0.0, 0.0, 0.0, 0.0],
                   'b': [0.0, 95.8, 0.0, 0.0, 0.0],
                   'c': [0.0, 0.0, 97.8, 96.7, 0.0],
                   'd': [0.0, 0.0, 0.0, 0.0, 98.9],
                   'e': [0.0, 0.0, 0.0, 0.0, 0.0],
                   })

`I've already tried what was suggested here but it gives an indexing error, because the length of the data returned by the any() function is not the same as the length of the original dataframe, as far as I've been able to understand.` what is your solution? How looks expected ouput from sample data? — jezrael, Apr 18 '23 at 11:32
I finally solved it by doing creating a new df like this: df=old_df.loc[old_df[col2!=0]] — David Siret Marqués, Apr 19 '23 at 07:23

jezrael · Accepted Answer · 2023-04-19T08:59:01.943

3

If need remove 0 values after melting is possible use DataFrame.query:

df1 = df.melt().query('value != 0')

Or select with callable:

df1 = df.melt().loc[lambda x: x['value'] != 0]

Or select by boolean indexing:

df1 = df.melt()
df1 = df1[df1['value'] != 0]

print (df1)
   variable  value
0         a  99.08
6         b  95.80
12        c  97.80
13        c  96.70
19        d  98.90

edited Apr 19 '23 at 08:59

answered Apr 18 '23 at 11:34

jezrael

822,522
95
1,334
1,252

This works, although I have another solution that works better((see my answer), the query method has some quirks when naming, it doesn't allow symbols, so it's a bit more annoying. But definitely works! – David Siret Marqués Apr 19 '23 at 08:18
@DavidSiretMarqués - Added another solutions to answer ;) – jezrael Apr 19 '23 at 08:59
The solution I have is the boolean indexing one, basically – David Siret Marqués Apr 19 '23 at 15:00
@DavidSiretMarqués Do you think `df=old_df.loc[old_df[col2!=0]]` from your comment? For me it not working, seem need my last solution `df1 = df.melt() df1 = df1[df1['value'] != 0]`, right? – jezrael Apr 19 '23 at 17:59
I might have had a typo when copying it from my code, I did not copypaste it, I retyped it, but yeah, the correct solution is selecting the column with the []. I currently have this line in order to eliminate the 0s after the melt: `df = df.loc[df['value'] != 0]` – David Siret Marqués Apr 20 '23 at 05:25
@DavidSiretMarqués - Thank you, you are right. I see my answer was unaccepted, there was some reason? – jezrael Apr 20 '23 at 05:26
The coment has the `[ ]` wrongly placed, the closing one whould go before the `!=`operator, right after the`'value` – David Siret Marqués Apr 20 '23 at 05:27
1

There is no reason for the answer to be unaccepted, I was just trying things out, I'm kinda new to actually asking myself in stackoverflow, Usually the question has already been answered, so no need to ask it myself, I've accepted it again. – David Siret Marqués Apr 20 '23 at 05:28

score 2 · Answer 2 · answered Apr 18 '23 at 11:47

2

If you want to remove unwanted values while reshaping, stack is a good option as it removes NaNs by default. You just need to mask the 0s before stacking:

out = (df
 .where(df.gt(0)).rename_axis(columns='col1')
 .stack().reset_index(level=1, name='col2')
)

Output:

  col1   col2
0    a  99.08
1    b  95.80
2    c  97.80
3    c  96.70
4    d  98.90

answered Apr 18 '23 at 11:47

mozway

194,879
13
39
75

I tried that and it did not work, returned ValueError (putmask: mask and data must be the same size) – David Siret Marqués Apr 19 '23 at 08:13
You might have non numeric data. Try `df.ne(0)` in place of `df.gt(0)` – mozway Apr 19 '23 at 08:15
I already have a solution with .loc[], but I wanted to try them just to have them in the back of my mind for future reference, I'll try that and keep you informed – David Siret Marqués Apr 19 '23 at 08:44

Laurent B. · Answer 3 · 2023-04-19T05:56:35.933

0

Proposed

r = df.melt().replace(0, pd.NA).dropna()
r.columns=['col1','col2']

print(r)

Result

   col1   col2
0     a  99.08
6     b  95.80
12    c  97.80
13    c  96.70
19    d  98.90

edited Apr 19 '23 at 05:56

answered Apr 19 '23 at 05:49

Laurent B.

1,653
1
7
16

Does not work, returns a RecursionError – David Siret Marqués Apr 19 '23 at 08:20

score 0 · Answer 4 · answered Apr 19 '23 at 07:27

0

Finally solved it by doing:

melted_df = pd.melt(old_df,ignore_index=False) #I want to preserve the indexes for later
final_df = melted_df.loc[melted_df[col2]!=0]

That gets rid of the 0 values after the melt

answered Apr 19 '23 at 07:27

David Siret Marqués

368
13

Drop rows with all zeroes in a pandas dataframe

4 Answers4