1

I'm trying to drop the rows of a dataframe where the value of one column is 0, but I don't know how

The table I start with looks like this:

a b c d e
99.08 0.0 0.0 0.0 0.0
0.0 95.8 0.0 0.0 0.0
0.0 0.0 97.8 0.0 0.0
0.0 0.0 96.7 0.0 0.0
0.0 0.0 0.0 98.9 0.0

I'm using pandas to deal with some categorical data. I've used pd.melt() to reshape it so the columns that were encoded in a similar fashion to one hot encoding are no problem, that leaves me with a dataframe with only 2 columns, as I want it, but with a lot of 0s in one of the columns.

The table after the melt looks like this:

col1 col2
a 99.8
a 0.0
a 0.0
a 0.0
a 0.0
b 95.8
b 0.0
b 0.0
c 97.8
c 0.0
c 0.0
c 0.0
c 96.5
c 0.0
d 98.9

I want to drop those values because they give no information and take space and resources

I've already tried what was suggested here but it gives an indexing error, because the length of the data returned by the any() function is not the same as the length of the original dataframe, as far as I've been able to understand.

I've also tried the suggestion here but it gives back a ValueError "cannot index with multidimensional key" because my df is 2 dimensional

Dataframe

df = pd.DataFrame({'a': [99.08, 0.0, 0.0, 0.0, 0.0],
                   'b': [0.0, 95.8, 0.0, 0.0, 0.0],
                   'c': [0.0, 0.0, 97.8, 96.7, 0.0],
                   'd': [0.0, 0.0, 0.0, 0.0, 98.9],
                   'e': [0.0, 0.0, 0.0, 0.0, 0.0],
                   })
Laurent B.
  • 1,653
  • 1
  • 7
  • 16
  • `I've already tried what was suggested here but it gives an indexing error, because the length of the data returned by the any() function is not the same as the length of the original dataframe, as far as I've been able to understand.` what is your solution? How looks expected ouput from sample data? – jezrael Apr 18 '23 at 11:32
  • I finally solved it by doing creating a new df like this: df=old_df.loc[old_df[col2!=0]] – David Siret Marqués Apr 19 '23 at 07:23

4 Answers4

3

If need remove 0 values after melting is possible use DataFrame.query:

df1 = df.melt().query('value != 0')

Or select with callable:

df1 = df.melt().loc[lambda x: x['value'] != 0]

Or select by boolean indexing:

df1 = df.melt()
df1 = df1[df1['value'] != 0]

print (df1)
   variable  value
0         a  99.08
6         b  95.80
12        c  97.80
13        c  96.70
19        d  98.90
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • This works, although I have another solution that works better((see my answer), the query method has some quirks when naming, it doesn't allow symbols, so it's a bit more annoying. But definitely works! – David Siret Marqués Apr 19 '23 at 08:18
  • @DavidSiretMarqués - Added another solutions to answer ;) – jezrael Apr 19 '23 at 08:59
  • The solution I have is the boolean indexing one, basically – David Siret Marqués Apr 19 '23 at 15:00
  • @DavidSiretMarqués Do you think `df=old_df.loc[old_df[col2!=0]]` from your comment? For me it not working, seem need my last solution `df1 = df.melt() df1 = df1[df1['value'] != 0]`, right? – jezrael Apr 19 '23 at 17:59
  • I might have had a typo when copying it from my code, I did not copypaste it, I retyped it, but yeah, the correct solution is selecting the column with the []. I currently have this line in order to eliminate the 0s after the melt: `df = df.loc[df['value'] != 0]` – David Siret Marqués Apr 20 '23 at 05:25
  • @DavidSiretMarqués - Thank you, you are right. I see my answer was unaccepted, there was some reason? – jezrael Apr 20 '23 at 05:26
  • The coment has the `[ ]` wrongly placed, the closing one whould go before the `!=`operator, right after the`'value` – David Siret Marqués Apr 20 '23 at 05:27
  • 1
    There is no reason for the answer to be unaccepted, I was just trying things out, I'm kinda new to actually asking myself in stackoverflow, Usually the question has already been answered, so no need to ask it myself, I've accepted it again. – David Siret Marqués Apr 20 '23 at 05:28
2

If you want to remove unwanted values while reshaping, stack is a good option as it removes NaNs by default. You just need to mask the 0s before stacking:

out = (df
 .where(df.gt(0)).rename_axis(columns='col1')
 .stack().reset_index(level=1, name='col2')
)

Output:

  col1   col2
0    a  99.08
1    b  95.80
2    c  97.80
3    c  96.70
4    d  98.90
mozway
  • 194,879
  • 13
  • 39
  • 75
0

Proposed

r = df.melt().replace(0, pd.NA).dropna()
r.columns=['col1','col2']

print(r)

Result

   col1   col2
0     a  99.08
6     b  95.80
12    c  97.80
13    c  96.70
19    d  98.90
Laurent B.
  • 1,653
  • 1
  • 7
  • 16
0

Finally solved it by doing:

melted_df = pd.melt(old_df,ignore_index=False) #I want to preserve the indexes for later
final_df = melted_df.loc[melted_df[col2]!=0]

That gets rid of the 0 values after the melt