I'm trying to drop the rows of a dataframe where the value of one column is 0, but I don't know how.
The table I start with looks like this:
a | b | c | d | e |
---|---|---|---|---|
99.08 | 0.0 | 0.0 | 0.0 | 0.0 |
0.0 | 95.8 | 0.0 | 0.0 | 0.0 |
0.0 | 0.0 | 97.8 | 0.0 | 0.0 |
0.0 | 0.0 | 96.7 | 0.0 | 0.0 |
0.0 | 0.0 | 0.0 | 98.9 | 0.0 |
I'm using pandas to work with some categorical data. The columns were encoded in a fashion similar to one-hot encoding, so I used pd.melt() to reshape them. That leaves me with a dataframe with only 2 columns, which is what I want, but one of those columns contains a lot of 0s.
The table after the melt looks like this:
col1 | col2 |
---|---|
a | 99.08 |
a | 0.0 |
a | 0.0 |
a | 0.0 |
a | 0.0 |
b | 95.8 |
b | 0.0 |
b | 0.0 |
c | 97.8 |
c | 0.0 |
c | 0.0 |
c | 0.0 |
c | 96.7 |
c | 0.0 |
d | 98.9 |
I want to drop the rows with a 0 value because they carry no information and take up space and resources.
I've already tried what was suggested here, but it gives an indexing error. As far as I've been able to understand, that is because the data returned by the any() function does not have the same length as the original dataframe.
I've also tried the suggestion here, but it raises a ValueError, "cannot index with multidimensional key", because my df is 2-dimensional.
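The code that was tried isn't shown, so the following is only a guess at what triggers the second error. Selecting a column with double brackets (`df[['col2']]`) gives a 2-D DataFrame, and `.loc` rejects a 2-D boolean key, whereas single brackets give a 1-D Series that works as a row mask. The small `melted` frame here is a hypothetical stand-in using the `col1`/`col2` names from the table above.

```python
import pandas as pd

# Hypothetical stand-in for the melted frame (names taken from the table above).
melted = pd.DataFrame({'col1': ['a', 'a', 'b', 'c'],
                       'col2': [99.08, 0.0, 95.8, 0.0]})

# Single brackets: a 1-D boolean Series, one value per row. This works as a mask.
mask_1d = melted['col2'] != 0
kept = melted.loc[mask_1d]
print(kept)

# Double brackets: a 2-D boolean DataFrame. .loc refuses this kind of key.
mask_2d = melted[['col2']] != 0
try:
    melted.loc[mask_2d]
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```

If this is indeed the cause, switching to the single-bracket form should be enough to avoid the error.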
Dataframe:

    import pandas as pd

    df = pd.DataFrame({'a': [99.08, 0.0, 0.0, 0.0, 0.0],
                       'b': [0.0, 95.8, 0.0, 0.0, 0.0],
                       'c': [0.0, 0.0, 97.8, 96.7, 0.0],
                       'd': [0.0, 0.0, 0.0, 0.0, 98.9],
                       'e': [0.0, 0.0, 0.0, 0.0, 0.0],
                       })
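For reference, a minimal sketch of one way the zero rows could be dropped after the melt, using a plain boolean mask on the value column. The names `col1` and `col2` are assumptions taken from the melted table above, and `melted` and `nonzero` are illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({'a': [99.08, 0.0, 0.0, 0.0, 0.0],
                   'b': [0.0, 95.8, 0.0, 0.0, 0.0],
                   'c': [0.0, 0.0, 97.8, 96.7, 0.0],
                   'd': [0.0, 0.0, 0.0, 0.0, 98.9],
                   'e': [0.0, 0.0, 0.0, 0.0, 0.0],
                   })

# Reshape wide -> long: one row per (original column, value) pair.
melted = df.melt(var_name='col1', value_name='col2')

# Keep only the rows whose value is not 0, then renumber the index.
nonzero = melted[melted['col2'] != 0].reset_index(drop=True)
print(nonzero)
```

The mask `melted['col2'] != 0` is a 1-D Series with the same length and index as `melted`, so it lines up row by row. Column `e` disappears from the result entirely because all of its values are 0.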