I'm scraping property ads with BS4 and using pandas to analyse the data.
In my DataFrame, rows represent property ads and columns represent property characteristics like rent, size, district, etc.
In a few property ads, the district name is misspelled or missing entirely. I would like to drop those ads, i.e. drop the rows for which the district name is misspelled or missing.
I have a list containing the correct district names, e.g.
correct_districts=['North', 'South', 'West', 'East']
and I have a DataFrame city_df with, among other columns, a District column, e.g.
| District | ....
-----------------
| North | ....
| South | ....
| Nort | ....
| | ....
| West | ....
| .... | ....
Following this answer on conditional row selection, I tried this:
city_df=city_df.loc[~city_df['District'].isin(correct_districts)]
However, this does not change anything in the District column. If I remove the ~ and execute the command, I am left with only the rows for which the district name is missing.
What should I change to remove the rows for which the district names are either missing or misspelled?
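For reference, here is a minimal sketch of the filter described above. It rests on assumptions not confirmed by the post: the sample values, the Rent column, and the name filtered_df are made up for illustration, and the stray whitespace in ' South ' is only a guess at why scraped values might fail to match the list. Note that ~ negates the boolean mask, so keeping the valid rows means using isin without it.

```python
import pandas as pd

correct_districts = ['North', 'South', 'West', 'East']

# Illustrative data only: one value with stray whitespace, one typo,
# one empty string and one missing value (None).
city_df = pd.DataFrame({
    'District': ['North', ' South ', 'Nort', '', None, 'West'],
    'Rent': [1000, 1200, 900, 800, 950, 1100],  # hypothetical column
})

# repr() makes hidden whitespace in the raw values visible.
print(city_df['District'].map(repr).unique())

# Strip surrounding whitespace before comparing (assumption: the scraped
# strings may carry it). Missing values stay missing and never match.
cleaned = city_df['District'].str.strip()

# True where the district is valid; ~mask would select the invalid rows.
mask = cleaned.isin(correct_districts)

# Keep only the rows with a valid district name (no ~ here).
filtered_df = city_df.loc[mask]
print(filtered_df)
```

If the repr() output shows values that differ from the list in some other way (case, non-breaking spaces, and so on), the normalisation step would need to be adapted accordingly.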