I have a multilabel classification problem.

I would like to delete the rows that have a value of 0 in all of the 35 columns of the dataframe, excluding the ['Doc'] column.

Example of dataframe

Doc   Big    Small    Int    Bor   Drama
j2     0       0        0      0     0
i9     1       0        1      1     0
ui8    0       0        0      1     0
po4    0       1        0      0     0
po9    0       0        0      0     0

Here's the expected outcome

Doc   Big    Small    Int    Bor   Drama
i9     1       0        1      1     0
ui8    0       0        0      1     0
po4    0       1        0      0     0

These are the rows I would like to delete:

 j2     0       0        0      0     0
 po9    0       0        0      0     0

Here's how I count them:

rowSums = df.iloc[:,2:].sum(axis=1)
no_labelled = (rowSums==0).sum(axis=0)
print("no.docs with no label =", no_labelled)

no.docs with no label = 60

How can I delete these 60 rows from the dataframe?

Thanks

  • Please read the guidelines; it's easier for us to answer if you provide example data and example expected output. – Erfan Feb 28 '19 at 16:14
  • Possible duplicate of [Drop rows with all zeros in pandas data frame](https://stackoverflow.com/questions/22649693/drop-rows-with-all-zeros-in-pandas-data-frame) – warwick12 Feb 28 '19 at 16:22

2 Answers

You can just extract the required dataframe and assign it to the old variable, rather than explicitly calling del:

df = df.loc[df.iloc[:, 1:].sum(axis=1) > 0, :]
print(df)
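
To make the filtering approach concrete, here is a minimal sketch on the sample data from the question. It assumes that every column except 'Doc' is a label column; the names `labels`, `has_label` and `filtered` are only illustrative. It tests for any non-zero value instead of summing, which gives the same result for 0/1 labels and would also hold if a column ever contained negative values.

```python
import pandas as pd

# Sample data copied from the question; 'Doc' is the identifier column.
df = pd.DataFrame({
    "Doc":   ["j2", "i9", "ui8", "po4", "po9"],
    "Big":   [0, 1, 0, 0, 0],
    "Small": [0, 0, 0, 1, 0],
    "Int":   [0, 1, 0, 0, 0],
    "Bor":   [0, 1, 1, 0, 0],
    "Drama": [0, 0, 0, 0, 0],
})

# Assumption: every column except 'Doc' is a label column.
labels = df.drop(columns="Doc")

# True for rows that carry at least one non-zero label.
has_label = labels.ne(0).any(axis=1)

# Keep only the labelled rows; resetting the index is optional.
filtered = df.loc[has_label].reset_index(drop=True)
print(filtered)
```

On the sample data this keeps the rows for i9, ui8 and po4. Selecting the label columns by name avoids depending on the position of 'Doc' in the frame.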
hacker315

You can drop the rows whose sum across all columns other than the first is 0. For example:

# .iloc selects columns by position; drop() returns a new frame, so assign it back
df2 = df2.drop(df2[df2.iloc[:, 1:].sum(axis=1) == 0].index)
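
As a rough illustration of the drop-based approach, the sketch below rebuilds the question's sample data and removes the all-zero rows by their index labels. It assumes the first column is the 'Doc' identifier, that the index is unique, and that the labels are non-negative so a row sum of 0 means no label; `row_sums`, `unlabelled_idx` and `cleaned` are hypothetical names.

```python
import pandas as pd

# Sample data from the question, one row per document.
df2 = pd.DataFrame(
    [
        ["j2", 0, 0, 0, 0, 0],
        ["i9", 1, 0, 1, 1, 0],
        ["ui8", 0, 0, 0, 1, 0],
        ["po4", 0, 1, 0, 0, 0],
        ["po9", 0, 0, 0, 0, 0],
    ],
    columns=["Doc", "Big", "Small", "Int", "Bor", "Drama"],
)

# Row sums over every column after the first, selected by position.
row_sums = df2.iloc[:, 1:].sum(axis=1)

# Index labels of the rows whose label columns are all zero.
unlabelled_idx = df2.index[row_sums == 0]

# drop() returns a new DataFrame and leaves df2 unchanged.
cleaned = df2.drop(index=unlabelled_idx)
print(len(unlabelled_idx), "rows dropped")
print(cleaned)
```

Keeping `unlabelled_idx` around also gives the count of unlabelled documents via `len(unlabelled_idx)`, which matches the kind of check done in the question.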
No_body
  • Please provide additional information as to what this answer is doing. Code only answers are flagged as low-quality by Stack Overflow. – JamCon Feb 28 '19 at 20:07