Why doesn't pandas 'dropping by condition' work properly?

Asked Jan 22 '20 at 22:55

Active Jan 24 '20 at 20:24

Viewed 105 times

I like the 'inplace' methods, so I tried to use one to remove particular rows from my time-series dataframe:

df.drop(df[(df['location'] == 'City17') | (df['location'] == 'City17 ') | (df['location'] == 'CITY17')].index, inplace=True)

But surprisingly, it removed far more data than expected, and I was left with only one date. That date was somewhere in the middle of my DateTime interval, neither the first nor the last. I found a solution using an assignment statement like this:

df = df[(df['location'] != 'City17') & (df['location'] != 'City17 ') & (df['location'] != 'CITY17')]
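For reference, the same filter can be written more compactly with `Series.isin`. This is only a sketch: the sample frame and the `value` column are made up for illustration, since the real data is not shown in the question.

```python
import pandas as pd

# Hypothetical sample data; the real frame is not shown in the question.
df = pd.DataFrame({
    "location": ["City17", "City17 ", "CITY17", "City18", "City19"],
    "value": [1, 2, 3, 4, 5],
})

# The three spellings excluded by the original condition.
unwanted = ["City17", "City17 ", "CITY17"]

# Keep only the rows whose location is NOT one of the unwanted spellings.
filtered = df[~df["location"].isin(unwanted)]
print(filtered)
```

Like the assignment above, this selects rows by a boolean mask rather than by index label.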

Now I know that assignments are shorter and work faster than inplace methods, but I still wonder why the first .drop behaved like that.

Update: Thanks to hongsy's comment and the best answer from his link, I've solved the issue. The problem was the format of the date column. It had the object dtype, and after I converted it to DateTime, my .drop call worked correctly. I still have no clue why it behaves this way, but this is further evidence that all date columns should be of the DateTime type.
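A minimal sketch of the kind of conversion described in the update, assuming the dates live in a column named `date` (a hypothetical name; the actual column is not shown) and are stored as strings:

```python
import pandas as pd

# Hypothetical frame: the column name "date" and the values are assumptions.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02"],
    "location": ["City17", "City18"],
})

# Dates read from text are typically not a datetime dtype yet.
print(df["date"].dtype)

# Convert the column to a proper datetime dtype.
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)
```

This only illustrates the dtype change itself; it does not explain why the conversion affected .drop in the original data.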

Jack Catnip

it's very likely that `df` has duplicated index values – Andy L. Jan 22 '20 at 23:27
were you able to verify whether @AndyL.'s comment is false? – Kenan Jan 23 '20 at 00:48
please see [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/6619250) – hongsy Jan 23 '20 at 05:50
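The duplicated-index scenario from the first comment can be reproduced with a small sketch. The data below is invented for illustration, and whether this is what happened in the original frame is not confirmed. `DataFrame.drop` removes rows by index label, so when a label is shared by several rows, all of them are dropped:

```python
import pandas as pd

# Hypothetical time series with several locations per date,
# so the date labels in the index are duplicated.
dates = ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"]
df = pd.DataFrame(
    {"location": ["City17", "City18", "CITY17", "City19", "City18"]},
    index=dates,
)

mask = df["location"].isin(["City17", "City17 ", "CITY17"])

# drop() works on labels: every row sharing a label with a matching row goes too.
dropped = df.drop(df[mask].index)

# Boolean filtering works row by row and removes only the matching rows.
kept = df[~mask]

print(df.index.is_unique)        # False: labels repeat
print(len(dropped), len(kept))   # 1 3
```

Here only the date on which no matching location appears survives the `drop`, while the boolean filter keeps every non-matching row. Checking `df.index.is_unique` on the real frame would show whether this applies.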