How do I remove rows from a dataframe?

Question

I'm trying to remove outliers from a dataset. In order to do that, I'm using:

df = df[df.attr < df.attr.mean() + df.attr.std()*3]

That seems to work as expected, but when I do something like:

for i in range(df.shape[0]):
    print(df.attr[i])

Then I get a KeyError. It seems like pandas isn't actually returning a new DataFrame with the rows dropped. How do I actually remove those rows and get a fully functional DataFrame back?
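A minimal reproduction, assuming made-up data in a column named attr as in the question. Boolean filtering keeps the original index labels, so once the outlier stored at label 5 is removed, df.attr[5] is a label lookup for a label that no longer exists, even though the frame still has more than 5 rows.

```python
import pandas as pd

# Hypothetical data: 19 ordinary values and one outlier at label 5
values = [1.0] * 20
values[5] = 100.0
df = pd.DataFrame({"attr": values})

# The filter from the question returns a new frame but keeps the old labels
df = df[df.attr < df.attr.mean() + df.attr.std() * 3]

print(df.shape[0])      # 19 rows remain
print(5 in df.index)    # False: label 5 was filtered out

try:
    for i in range(df.shape[0]):
        df.attr[i]      # label-based lookup on an integer index
except KeyError as exc:
    print("KeyError for label", exc)
```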

score 2 · Answer 1 · answered Nov 12 '16 at 22:16


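I think you need positional selection. Boolean filtering keeps the original index labels, so after rows are removed the labels are no longer 0 to n-1, and df.attr[i] looks up a label that may not exist. DataFrame.iloc selects by position instead:

for i in range(df.shape[0]):
    print(df.iloc[i, df.columns.get_loc('attr')])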

Or Series.iloc:

for i in range(df.shape[0]):
    print(df.attr.iloc[i])
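A minimal sketch of positional access with DataFrame.iloc and Series.iloc, assuming the same kind of made-up data as the question (a column named attr whose outlier at label 5 gets filtered out). Both forms address rows by position, so every i in range(n) is valid regardless of gaps in the labels.

```python
import pandas as pd

# Hypothetical data: the filter removes the outlier stored at label 5
values = [1.0] * 20
values[5] = 100.0
df = pd.DataFrame({"attr": values})
df = df[df.attr < df.attr.mean() + df.attr.std() * 3]

col = df.columns.get_loc("attr")   # integer position of column 'attr'

# Both lookups are purely positional, so positions 0..n-1 always exist
via_frame = [df.iloc[i, col] for i in range(df.shape[0])]
via_series = [df.attr.iloc[i] for i in range(df.shape[0])]

print(via_frame == via_series)   # True
print(df.index[5])               # 6: position 5 now holds label 6
```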

A simpler solution is Series.items (called iteritems in older pandas versions):

for i, val in df.attr.items():
    print(val)
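A small sketch of what Series.items (iteritems in older pandas) yields, again assuming made-up data in a column named attr. Each step produces an (index label, value) pair, so the loop never performs a lookup by label or position and gaps left by filtering do not matter.

```python
import pandas as pd

# Hypothetical data: the filter removes the outlier stored at label 5
values = [1.0] * 20
values[5] = 100.0
df = pd.DataFrame({"attr": values})
df = df[df.attr < df.attr.mean() + df.attr.std() * 3]

# items() yields (label, value) pairs straight from the Series
pairs = list(df.attr.items())
labels = [label for label, _ in pairs]

print(len(pairs))    # 19
print(labels[:6])    # [0, 1, 2, 3, 4, 6]: label 5 is simply skipped
```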

jezrael

I'm tempted to accept your answer since it is actually the best solution in my case, but someone Googling those keywords might actually need to drop the rows (for different reasons), so I'll accept the other one. – MaiaVictor Nov 12 '16 at 23:17
I am a bit surprised; I think [`boolean indexing`](http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing) is better than drop, but it is up to you. Good luck :) – jezrael Nov 13 '16 at 08:32

score 2 · Accepted Answer · edited May 23 '17 at 12:30


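First, build a boolean mask that flags the rows to remove. In your case those are the outliers, i.e. the rows where df.attr >= df.attr.mean() + df.attr.std()*3.

x = df.loc[:, 'attr'] >= df.attr.mean() + df.attr.std()*3

Next, pass the index labels of the flagged rows to DataFrame.drop. It returns a new DataFrame, so assign the result back:

df = df.drop(x[x].index)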
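A minimal sketch, assuming made-up data in a column named attr, that checks the mask-and-drop approach against plain boolean indexing and then shows one extra step the loop in the question needs: neither approach renumbers the index, so reset_index(drop=True) is used here to relabel the rows 0 to n-1 and discard the old labels.

```python
import pandas as pd

# Hypothetical data: 19 ordinary values and one outlier at label 5
values = [1.0] * 20
values[5] = 100.0
df = pd.DataFrame({"attr": values})

threshold = df.attr.mean() + df.attr.std() * 3
outliers = df.attr >= threshold               # True for rows to remove

dropped = df.drop(outliers[outliers].index)   # drop by index label
kept = df[~outliers]                          # boolean indexing equivalent
print(dropped.equals(kept))                   # True

# Both results still skip label 5; relabel rows as 0..n-1
dropped = dropped.reset_index(drop=True)

# Label lookups by 0..n-1 now succeed for every i
collected = [dropped.attr[i] for i in range(dropped.shape[0])]
print(len(collected))                         # 19
```

Whether to reset the index is a design choice: keeping the original labels preserves the link back to the source rows, while resetting makes label and position coincide.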

See answers such as "How to drop a list of rows from Pandas dataframe?" for more information.

answered Nov 12 '16 at 22:23

wwl
