strip() on multiple columns not working - pandas

Question

I am writing a generic dataframe cleansing function as follows

def cleanse_data(df,cols_to_strip):
    df.replace({'(?=.*)(\s*\[.*\]\s*)':'','\*':'','\+':'',',.*':'','—':''},inplace=True, regex=True)
    df.columns.str.strip()
    df[cols_to_strip] = df[cols_to_strip].applymap(lambda x: x.strip())
    return df

The second argument takes the list of columns in the dataframe to strip() (i.e. remove leading and trailing whitespace from). Calling this function:

nhl_df = cleanse_data(nhl_df,['team'])
print(nhl_df[nhl_df['team']=='Jose Sharks'])  # doesn't work
print(nhl_df[nhl_df['team'].str.strip()=='Jose Sharks'])  # works

So it seems that for some reason the stripping inside the cleansing function didn't work (though the regex replacement worked fine). Any reason for this?

It should work. If you change `df[cols_to_strip] = df[cols_to_strip].applymap(lambda x: x.strip())` to `df[cols_to_strip] = df[cols_to_strip].apply(lambda x: x.str.strip())`, does `print(nhl_df[nhl_df['team']=='Jose Sharks'])` still not work? — jezrael, Jul 12 '23 at 05:48
If you check the `EDIT` in my answer, what does `print(nhl_df.loc[nhl_df['team'].str.contains('Jose Sharks'), 'team'].tolist())` return? — jezrael, Jul 12 '23 at 07:32

score 0 · Answer 1 · answered Jul 12 '23 at 05:54

One idea is to use DataFrame.apply with Series.str.strip, but your solution should work as well:

df[cols_to_strip] = df[cols_to_strip].apply(lambda x: x.str.strip())
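For illustration, here is a minimal sketch of that line on a small made-up frame. The data and the `W` column are assumptions for the example, not the asker's actual `nhl_df`:

```python
import pandas as pd

# Hypothetical sample data with stray leading/trailing whitespace.
df = pd.DataFrame({
    "team": ["  Jose Sharks ", "Boston Bruins  "],
    "W": ["45", "50"],
})
cols_to_strip = ["team"]

# apply() passes each selected column to the lambda as a Series,
# so the vectorised Series.str.strip can be used on it.
df[cols_to_strip] = df[cols_to_strip].apply(lambda s: s.str.strip())

# The plain equality comparison now matches the stripped value.
matches = df[df["team"] == "Jose Sharks"]
print(matches)
```

One practical difference from the `applymap` version: `Series.str.strip` leaves missing values (NaN) as they are, whereas `x.strip()` raises an AttributeError if a cell holds a float NaN.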

EDIT: Is it possible to test the output of your function?

nhl_df = cleanse_data(nhl_df,['team'])
print(nhl_df.loc[nhl_df['team'].str.contains('Jose Sharks'), 'team'].tolist())
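The point of printing a list here is that Python shows each string with quotes, so any whitespace left around the value becomes visible. A small sketch with made-up data (this `nhl_df` is a hypothetical stand-in, not the real frame):

```python
import pandas as pd

# Hypothetical frame where one value still carries surrounding spaces.
nhl_df = pd.DataFrame({"team": [" Jose Sharks ", "Boston Bruins"]})

# str.contains matches regardless of surrounding whitespace, and
# tolist() gives plain Python strings whose repr shows the quotes.
matches = nhl_df.loc[nhl_df["team"].str.contains("Jose Sharks"), "team"].tolist()
print(matches)  # [' Jose Sharks ']
```

If the printed value shows no extra characters inside the quotes, the stripping worked and the mismatch probably comes from somewhere else.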
