Python strip method doesn't work in a dataframe column

Question

I have a dataframe and one of the columns is a city name. To check for duplicate values I run df_hotels['city_name'].value_counts().sort_values(). When I display the results I can see that I have duplicate values because of an empty character on the left of some city names (normally I have a count of 25 for each line).
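Here is a minimal sketch of the check described above. It uses made-up values in place of the real df_hotels. Besides value_counts(), you can compare each value with its stripped version, which flags exactly the rows that carry leading or trailing whitespace.

```python
import pandas as pd

# Hypothetical sample data standing in for df_hotels (the real data is not shown).
df_hotels = pd.DataFrame({"city_name": ["Paris", " Paris", "Lyon", "Lyon", " Nice"]})

# Counts per raw value: "Paris" and " Paris" are counted as different cities.
counts = df_hotels["city_name"].value_counts().sort_values()
print(counts)

# True for rows whose value changes once surrounding whitespace is removed.
has_padding = df_hotels["city_name"] != df_hotels["city_name"].str.strip()
print(df_hotels.loc[has_padding, "city_name"].map(repr).tolist())
```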

The problem is that when I run df_hotels['city_name'] = df_hotels['city_name'].str.strip() (or .str.lstrip()), it doesn't work: the empty character on the left is still there.
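For an ordinary leading space, the vectorized call in the question should normally work on its own. The sketch below uses hypothetical data: 10 "Paris" and 15 " Paris", the counts mentioned in the comments. Once the stripped Series is assigned back, the two spellings merge into a single count of 25.

```python
import pandas as pd

# Hypothetical data: 10 "Paris" and 15 " Paris" (leading ordinary space).
df_hotels = pd.DataFrame({"city_name": ["Paris"] * 10 + [" Paris"] * 15})

# .str.strip() returns a new Series, so it has to be assigned back to the column.
df_hotels["city_name"] = df_hotels["city_name"].str.strip()

# Both spellings now collapse into a single "Paris" entry.
print(df_hotels["city_name"].value_counts())
```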

FYI, to give some context: the column type is object, and I created the dataframe from a JSON file with a simple pd.read_json.
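The assignment may already be in place and the character may still survive. One possible cause is that the "empty character" is not an ordinary space, though this is an assumption that the thread never confirms. Python's str.strip() only removes characters that str.isspace() treats as whitespace. For an object column like this one, pandas' .str.strip() relies on the same method. A zero-width space (U+200B) is not treated as whitespace. The sketch below uses hypothetical values to inspect the first character and then strip an explicit set of characters.

```python
import unicodedata

import pandas as pd

# Hypothetical values: one with an ordinary leading space, one with a zero-width space (U+200B).
s = pd.Series([" Paris", "\u200bParis"])

# Show what the leading character really is.
for value in s:
    first = value[0]
    print(repr(value), hex(ord(first)), unicodedata.name(first))

# U+200B is not whitespace for Python, so a plain strip() leaves it in place.
print(s.str.strip().tolist())

# Listing the characters to remove explicitly handles both cases.
cleaned = s.str.strip(" \u200b")
print(cleaned.tolist())
```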

Thanks for your help.

Does this answer your question? [Pandas - Strip white space](https://stackoverflow.com/questions/43332057/pandas-strip-white-space) — Ture Pålsson, Sep 24 '21 at 07:29

score 1 · Answer 1 · answered Sep 24 '21 at 07:53


You can use the drop_duplicates method to remove duplicates, as explained in the documentation (link).

If you want to apply a function to a column in pandas, you can use the apply method, in some cases together with a lambda function. Here is an example:

df_hotels['city_name'] = df_hotels['city_name'].apply(lambda x: x.strip())
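Inside apply, the lambda receives each cell as a plain Python value, so the string method is x.strip(), not x.str.strip(). The sketch below compares the apply version with the vectorized .str accessor, using hypothetical values. The isinstance guard is an addition here, so that missing values (NaN, which is a float) do not raise AttributeError.

```python
import numpy as np
import pandas as pd

# Hypothetical values, including a missing entry.
s = pd.Series([" Paris", "Lyon ", np.nan])

# apply: x is a str (or NaN), so call str.strip() directly and skip non-strings.
via_apply = s.apply(lambda x: x.strip() if isinstance(x, str) else x)

# Vectorized accessor: same result, and missing values pass through untouched.
via_str = s.str.strip()

print(via_apply.tolist())
print(via_str.tolist())
```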


Guyblublu


Hi! Thanks, but I don't want to drop the duplicates, I need to keep them. If I have 10 "Paris" and 15 " Paris", I want 25 "Paris". I tried the apply method but the empty space is still there... – valskyyy Sep 24 '21 at 08:23
I don't know why it didn't work for you; you are welcome to send the dataset and I'll try to help. – Guyblublu Sep 24 '21 at 08:42
Here is the link to the CSV, exported from the dataframe: https://drive.google.com/file/d/1cLDetz00W4JKykPCCWvbJpSbnkVsGLI6/view?usp=sharing – valskyyy Sep 24 '21 at 09:59
This will work for you:

import pandas as pd

df_hotels = pd.read_csv('data.csv')
df_hotels['city_name'] = df_hotels['city_name'].astype('string')  # change the type of the column
print(df_hotels.loc[df_hotels['city_name'].apply(lambda x: x.startswith(' '))])  # print where city_name starts with ' '
df_hotels['city_name'] = df_hotels['city_name'].apply(lambda x: x.strip())  # remove the ' '
print(df_hotels.loc[df_hotels['city_name'].apply(lambda x: x.startswith(' '))])  # print where city_name starts with ' '

– Guyblublu Sep 24 '21 at 10:16
Works perfectly! Thanks a lot @Guyblublu! – valskyyy Sep 24 '21 at 20:53
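The snippet in the comments above converts the column to pandas' string dtype and then runs apply with a lambda. The same steps can be sketched with the vectorized .str methods. The data below is a hypothetical stand-in for the shared CSV, and the thread does not establish why the original object-dtype .str.strip() call failed.

```python
import pandas as pd

# Hypothetical stand-in for the shared CSV (the real file is not reproduced here).
df_hotels = pd.DataFrame({"city_name": ["Paris", " Paris", "Lyon", " Lyon"]})

# Same steps as the comment, using the .str accessor instead of apply + lambda.
df_hotels["city_name"] = df_hotels["city_name"].astype("string")

# Rows where city_name starts with a space, before cleaning.
print(df_hotels.loc[df_hotels["city_name"].str.startswith(" ")])

df_hotels["city_name"] = df_hotels["city_name"].str.strip()

# After stripping, this selection should be empty.
print(df_hotels.loc[df_hotels["city_name"].str.startswith(" ")])
```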
