I am writing a generic DataFrame-cleansing function as follows:

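def cleanse_data(df, cols_to_strip):
    # Regex substitutions: drop bracketed notes such as "[a]", asterisks, plus signs,
    # everything from the first comma onwards, and em dashes
    df.replace({r'(?=.*)(\s*\[.*\]\s*)': '', r'\*': '', r'\+': '', r',.*': '', '—': ''}, inplace=True, regex=True)
    # Index.str.strip() returns a new Index, so the result must be assigned back
    df.columns = df.columns.str.strip()
    # Strip leading/trailing whitespace from the values of the given columns
    df[cols_to_strip] = df[cols_to_strip].applymap(lambda x: x.strip())
    return df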
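A side note on the column-name line: `Index.str.strip()` returns a new Index rather than modifying `df.columns` in place, so calling it without assigning the result has no effect. A minimal sketch with made-up column names:

```python
import pandas as pd

# Hypothetical frame whose column names carry stray spaces
df = pd.DataFrame({" team ": ["San Jose Sharks"], "wins ": [1]})

stripped = df.columns.str.strip()  # a new Index; df.columns itself is untouched
print(list(df.columns))  # [' team ', 'wins ']

df.columns = stripped  # assign it back to actually rename the columns
print(list(df.columns))  # ['team', 'wins']
```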

The second argument is the list of columns whose values should be passed through strip() (i.e. have their leading and trailing whitespace removed). Calling the function:

nhl_df = cleanse_data(nhl_df, ['team'])
print(nhl_df[nhl_df['team'] == 'Jose Sharks'])  # doesn't work
print(nhl_df[nhl_df['team'].str.strip() == 'Jose Sharks'])  # works

So it seems that, for some reason, the stripping inside the cleansing function didn't work (though the regex replacement worked fine). Is there any reason for this?

osama yaccoub
  • It should work. If you change `df[cols_to_strip] = df[cols_to_strip].applymap(lambda x: x.strip())` to `df[cols_to_strip] = df[cols_to_strip].apply(lambda x: x.str.strip())`, does `print(nhl_df[nhl_df['team']=='Jose Sharks'])` still not work? – jezrael Jul 12 '23 at 05:48
  • Yes, it still doesn't work, @jezrael – osama yaccoub Jul 12 '23 at 07:26
  • Check the `EDIT` in my answer: what does `print(nhl_df.loc[nhl_df['team'].str.contains('Jose Sharks'), 'team'].tolist())` return? – jezrael Jul 12 '23 at 07:32

1 Answer

One idea is to use DataFrame.apply with Series.str.strip, but your solution should work as well:

df[cols_to_strip] = df[cols_to_strip].apply(lambda x: x.str.strip())
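One practical difference between the two approaches: `Series.str.strip` skips missing values and returns NaN for them, whereas an element-wise `lambda x: x.strip()` is called on the float NaN too and raises `AttributeError`. A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data with padding and a missing value
df = pd.DataFrame({"team": ["  San Jose Sharks ", np.nan, "Boston Bruins\t"]})

# Series.str.strip: vectorised, leaves NaN as NaN
stripped = df[["team"]].apply(lambda col: col.str.strip())
print(stripped["team"].tolist())  # ['San Jose Sharks', nan, 'Boston Bruins']

# Calling x.strip() element by element fails on the float NaN
try:
    df["team"].map(lambda x: x.strip())
except AttributeError as err:
    print("element-wise strip failed:", err)
```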

EDIT: Can you test the output of your function?

nhl_df = cleanse_data(nhl_df,['team'])
print(nhl_df.loc[nhl_df['team'].str.contains('Jose Sharks'), 'team'].tolist()) 
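Printing a list (as `tolist()` does above) shows each string's `repr()`, so characters that a plain `print` hides become visible. A sketch with made-up values, assuming a non-breaking space and a zero-width space as examples of stray characters; note that Python's `str.strip()` removes the former but not the latter, since U+200B is not classified as whitespace:

```python
import pandas as pd

# Hypothetical values: a zero-width space (\u200b) and a non-breaking space (\xa0)
nhl_df = pd.DataFrame({"team": ["San Jose Sharks\u200b", "\xa0Boston Bruins"]})

# repr() escapes characters that print() would render invisibly
for value in nhl_df["team"]:
    print(repr(value), "->", repr(value.strip()))

# The non-breaking space is stripped; the zero-width space survives
cleaned = nhl_df["team"].str.strip().tolist()
```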
jezrael