I have a very large pandas data frame containing both string and integer columns. I'd like to search the whole data frame for a specific substring, and if found, replace the full string with something else.

I've found some examples that do this by specifying the column(s) to search, like this:

import pandas as pd

df = pd.DataFrame([[1, 'A'], [2, '(B,D,E)'], [3, 'C']], columns=['Question', 'Answer'])
df.loc[df['Answer'].str.contains(','), 'Answer'] = 'X'

But because my data frame has dozens of string columns in no particular order, I don't want to specify them all. As far as I can tell, a plain df.replace will not work, since I'm only searching for a substring. Thanks for your help!

Aero

1 Answer


You can use the DataFrame replace method with regex=True and the pattern .*,.* to match strings that contain a comma (swap the comma for any other substring you want to detect):

str_cols = ['Answer']    # specify columns you want to replace
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
df
#    Question Answer
# 0         1      A
# 1         2      X
# 2         3      C

or if you want to replace all string columns:

str_cols = df.select_dtypes(['object']).columns
df[str_cols] = df[str_cols].replace('.*,.*', 'X', regex=True)
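
If the substring you are looking for may contain regex metacharacters (parentheses, dots, and so on), it is probably safer to escape it before building the pattern. Below is a minimal sketch of that idea, not a definitive implementation. The helper name replace_containing and the extra Notes column are made up for illustration. It also calls replace on the whole frame instead of selecting columns first, on the assumption that regex replacement only acts on string values and leaves integer columns as they are.

```python
import re

import pandas as pd


def replace_containing(df, substring, replacement):
    """Return a new frame where every string cell containing `substring`
    is replaced entirely by `replacement` (hypothetical helper)."""
    # re.escape makes the substring match literally, even if it contains
    # characters such as "(" or "." that are special in a regex.
    pattern = ".*" + re.escape(substring) + ".*"
    # Regex replacement only applies to string values; numeric columns
    # are expected to pass through unchanged.
    return df.replace(pattern, replacement, regex=True)


df = pd.DataFrame(
    [[1, "A", "x,y"], [2, "(B,D,E)", "z"], [3, "C", "w"]],
    columns=["Question", "Answer", "Notes"],
)

result = replace_containing(df, ",", "X")
print(result)

# A substring with a regex metacharacter also works thanks to re.escape.
result_paren = replace_containing(df, "(B", "X")
print(result_paren)
```

Because replace returns a new frame by default, the original df is left untouched. Assign the result back to df if you want to overwrite it.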
Psidom