Similar question to: Replacing part of string in python pandas dataframe

However, it won't work!?

Pandas 23.4

Given the following df column:

    Expression
    XYZ&(ABC|DEF)
    (HIJ&FTL&JKK)&(ABC|DEF)
    (FML|AXY|AND)&(ABC|DEF)

I want to strip a substring that may be present in each row of the column.

flag = '(ABC|DEF)'
andFlag = '&' + flag #the reasoning for doing this is that 'flag' may change


#Below are all different ways I have tried to achieve this, none have worked. 
df['Expression'] = df['Expression'].replace(andFlag, '', regex=True)
df['Expression'] = df['Expression'].apply(lambda x: re.sub(andFlag, '', x))
df['Expression'] = df['Expression'].replace(to_replace=andFlag, value= '', regex=True)
df['Expression'] = df['Expression'].str.replace(andFlag, '')
df['Expression'] = df['Expression'].str.replace(andFlag, '', regex=True)

I have tried all of these calls with and without regex=True, to no avail.

Expected output:

    Expression
    XYZ
    (HIJ&FTL&JKK)
    (FML|AXY|AND)

I'm going slightly crazy trying to figure this out; it seems so simple and straightforward.

MaxB

2 Answers

Parentheses and the vertical bar are special characters in regex, so to match them literally you need to put a backslash '\' before each one, for example:

flag = r'\(ABC\|DEF\)'  # special characters escaped; raw string keeps the backslashes as-is
andFlag = '&' + flag
print(df['Expression'].replace(andFlag, '', regex=True))

0              XYZ
1    (HIJ&FTL&JKK)
2    (FML|AXY|AND)
Name: Expression, dtype: object
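
Since 'flag' may change, escaping it by hand each time is easy to get wrong. A minimal sketch, assuming the flag is always meant as literal text: Python's re.escape can build the escaped pattern for you. The sketch uses plain strings and re.sub so it runs without a DataFrame; the names and_flag, rows, unescaped and escaped are only for illustration. The resulting pattern can equally be passed to df['Expression'].replace(pattern, '', regex=True).

```python
import re

flag = '(ABC|DEF)'        # literal text to remove; may change
and_flag = '&' + flag

rows = [
    'XYZ&(ABC|DEF)',
    '(HIJ&FTL&JKK)&(ABC|DEF)',
    '(FML|AXY|AND)&(ABC|DEF)',
]

# Unescaped, the pattern means "'&' followed by ABC or DEF",
# which never occurs in these strings, so nothing is replaced.
unescaped = [re.sub(and_flag, '', s) for s in rows]

# re.escape backslash-escapes the regex special characters,
# so the pattern matches the flag as literal text.
pattern = re.escape(and_flag)
escaped = [re.sub(pattern, '', s) for s in rows]

print(unescaped)
print(escaped)
```

This also shows why the original attempts appeared to do nothing: the unescaped pattern is valid regex, it just describes different text.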
Ben.T

Use str.replace and, more importantly, set regex=False for literal matching:

df['Expression'] = df['Expression'].str.replace(andFlag, '', regex=False)

      Expression
0            XYZ
1  (HIJ&FTL&JKK)
2  (FML|AXY|AND)
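
For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample column from the question (the DataFrame construction and the names and_flag, result and unchanged are my own additions). It also contrasts .str.replace with Series.replace: without regex=True, Series.replace compares against the whole cell value rather than a substring, which is probably why that variant left the column untouched.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'Expression': [
        'XYZ&(ABC|DEF)',
        '(HIJ&FTL&JKK)&(ABC|DEF)',
        '(FML|AXY|AND)&(ABC|DEF)',
    ]
})

flag = '(ABC|DEF)'
and_flag = '&' + flag

# Literal substring replacement: no regex interpretation of ( | )
result = df['Expression'].str.replace(and_flag, '', regex=False)

# Series.replace without regex only replaces cells that EQUAL and_flag,
# so none of these rows change.
unchanged = df['Expression'].replace(and_flag, '')

print(result)
print(unchanged)
```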
Erfan