
I have a Pandas DataFrame that includes rows that I want to drop based on values in a column "population":

data['population'].value_counts()

general population                          21
developmental delay                         20
sibling                                      2
general population + developmental delay     1
dtype: int64

Here, I want to drop the two rows that have "sibling" as the value. I believed the following would do the trick:

data = data.drop(data.population=='sibling', axis=0)

It does drop 2 rows, as you can see in the resulting value counts, but they were not the rows with the specified value.

data.population.value_counts()

developmental delay                         20
general population                          19
sibling                                      2
general population + developmental delay     1
dtype: int64

Any idea what is going on here?

Chris Fonnesbeck

1 Answer


DataFrame.drop accepts index labels (a single label or a list of labels) as its parameter, not a boolean mask.
To use drop, select the matching labels from the index first:

data = data.drop(data.index[data.population == 'sibling'])

However, it is much simpler to filter with the boolean mask directly:

data = data[data.population != 'sibling']
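
As a minimal sketch of the two approaches above, the following uses a small made-up DataFrame (the rows and the "score" column are hypothetical, not the asker's data) to check that dropping by index labels and filtering by mask give the same result:

```python
import pandas as pd

# Hypothetical toy data standing in for the asker's DataFrame.
data = pd.DataFrame({
    "population": [
        "general population",
        "sibling",
        "developmental delay",
        "sibling",
        "general population",
    ],
    "score": [1, 2, 3, 4, 5],  # illustrative column, not in the original
})

# Boolean mask: True for the rows we want to remove.
mask = data["population"] == "sibling"

# Option 1: convert the mask to index labels, then drop those labels.
dropped = data.drop(data.index[mask])

# Option 2: keep only the rows where the mask is False.
filtered = data[~mask]

# Both results contain the same rows with the same index labels.
print(dropped.equals(filtered))
print(filtered["population"].value_counts())
```

In this sketch both results keep the original index labels of the surviving rows (0, 2 and 4), so neither option renumbers the index.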
joaquin
  • You can do a similar thing with a condition over multiple columns: `data = data[(data[['col1','col2','col3']] != 0).all(axis=1)]` - to drop all rows with zeros in at least one of those columns. – naught101 Jun 10 '14 at 05:07
  • Caution: method 1 will not re-index the data! – Subspacian Feb 04 '16 at 15:56
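
The two comments above can be illustrated together. This is only a sketch on made-up data: the column names col1, col2 and col3 come from the first comment, while the values are hypothetical. It filters on several columns at once and then uses reset_index(drop=True), which is one way to renumber the rows afterwards if a fresh 0..n-1 index is wanted:

```python
import pandas as pd

# Hypothetical data; only the column names come from the comment above.
df = pd.DataFrame({
    "col1": [1, 0, 3, 4],
    "col2": [5, 6, 0, 8],
    "col3": [9, 10, 11, 12],
})

# Keep rows where every listed column is non-zero, i.e. drop any row
# that has a zero in at least one of those columns.
nonzero = df[(df[["col1", "col2", "col3"]] != 0).all(axis=1)]

# The surviving rows keep their original index labels.
print(list(nonzero.index))

# reset_index(drop=True) renumbers the rows and discards the old labels.
renumbered = nonzero.reset_index(drop=True)
print(list(renumbered.index))
```

Note that boolean filtering keeps the original labels just as drop does, so the reset step applies to either method.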