Python Pandas: Drop row conditional on column value

Question

I'm scraping property ads with BS4 and using pandas to analyse the data.

In my DataFrame, rows represent property ads and columns represent property characteristics like rent, size, district, etc.

In a few property ads, the district names are incorrectly spelled, or even missing entirely. I would like to drop those property ads, i.e. I would like to drop the rows for which the district name is misspelled or missing.

I have a list containing the correct district names, e.g.

correct_districts=['North', 'South', 'West', 'East']

and I have a DataFrame city_df with, among others, a District column, e.g.

|  District | ....
 -----------------
|   North   | ....
|   South   | ....
|   Nort    | ....
|           | ....
|   West    | ....
|   ....    | ....

Following this answer on conditional row selection, I tried this:

city_df=city_df.loc[~city_df['District'].isin(correct_districts)]

However, this does not change anything in the District column. If I remove ~ and execute the command, I am left with only the rows for which the district name is missing.

What should I change to remove the rows for which the district names are either missing or misspelled?

city_df = city_df.loc[city_df['District'].isin(correct_districts)] works fine. Maybe you executed the code with ~ first, which removed all the rows with correct districts from city_df. Try reloading city_df, then it should work. – Vaishali, Mar 19 '17 at 19:24
Thank you for the confirmation! I checked my correct_districts list and I had overlooked some trailing whitespace... Hence the strange dropping behaviour. It works now :) – LucSpan, Mar 19 '17 at 19:36
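
For anyone landing here with the same symptom, here is a minimal sketch of the fix discussed in the comments above: strip whitespace on both sides, then keep only the rows whose district is in the list (no ~). The sample rows, the Rent column, and the names clean_districts and mask are made up for illustration and are not from the original data.

```python
import pandas as pd

# Assumed example data: the list contains stray whitespace, as described above.
correct_districts = ['North ', 'South', ' West', 'East']
city_df = pd.DataFrame({
    'District': ['North', 'South', 'Nort', None, 'West', ''],
    'Rent': [1200, 950, 1100, 800, 1300, 700],  # made-up values
})

# Normalise both sides so 'North ' and 'North' compare equal.
clean_districts = [d.strip() for d in correct_districts]
city_df['District'] = city_df['District'].str.strip()

# isin() is False for missing values, empty strings and misspellings,
# so keeping the True rows drops exactly the unwanted ads.
mask = city_df['District'].isin(clean_districts)
city_df = city_df.loc[mask].reset_index(drop=True)

print(city_df)
```

With ~ in front of the mask, the selection is inverted and only the misspelled or missing rows are kept, which matches the behaviour described in the question.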
