Removing Partial Strings from Dataframe Column

Question

This is similar to: Replacing part of string in python pandas dataframe

However, none of the approaches there work for me.

pandas 0.23.4

Given the following df column:

    Expression
    XYZ&(ABC|DEF)
    (HIJ&FTL&JKK)&(ABC|DEF)
    (FML|AXY|AND)&(ABC|DEF)

I want to strip a substring that may appear in each row of the column.

import re

flag = '(ABC|DEF)'
andFlag = '&' + flag  # built from 'flag' because 'flag' may change

# All of the following attempts failed to remove the substring:
df['Expression'] = df['Expression'].replace(andFlag, '', regex=True)
df['Expression'] = df['Expression'].apply(lambda x: re.sub(andFlag, '', x))
df['Expression'] = df['Expression'].replace(to_replace=andFlag, value='', regex=True)
df['Expression'] = df['Expression'].str.replace(andFlag, '')
df['Expression'] = df['Expression'].str.replace(andFlag, '', regex=True)

I have tried all of these functions with and without regex=True to no avail.

Expected output:

    Expression
    XYZ
    (HIJ&FTL&JKK)
    (FML|AXY|AND)

I'm going slightly crazy trying to figure this out; it seems so simple and straightforward.

score 2 · Answer 1 · answered Jul 16 '19 at 17:12


Parentheses and the vertical bar are special characters in regex, so to match them literally you can put a backslash '\' before each one:

flag = r'\(ABC\|DEF\)'  # metacharacters escaped; the raw string keeps the backslashes intact
andFlag = '&' + flag
print(df['Expression'].replace(andFlag, '', regex=True))

0              XYZ
1    (HIJ&FTL&JKK)
2    (FML|AXY|AND)
Name: Expression, dtype: object
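Since `flag` may change, escaping it by hand each time is easy to get wrong. The following is a minimal sketch, not part of the original answer, that assumes the standard library's `re.escape` is acceptable for building the pattern. The `rows` list is a hypothetical stand-in for the column values, so the snippet runs without pandas.

```python
import re

# Hypothetical stand-in for the values of df['Expression']
rows = [
    "XYZ&(ABC|DEF)",
    "(HIJ&FTL&JKK)&(ABC|DEF)",
    "(FML|AXY|AND)&(ABC|DEF)",
]

flag = "(ABC|DEF)"      # literal text to remove; may change
and_flag = "&" + flag

# Unescaped, the pattern means "& followed by ABC or DEF".
# In these rows every & is followed by "(", "FTL" or "JKK", so nothing matches.
unescaped = [re.sub(and_flag, "", s) for s in rows]

# re.escape backslash-escapes the regex metacharacters in the literal,
# so the pattern matches the exact text "&(ABC|DEF)".
pattern = re.escape(and_flag)
escaped = [re.sub(pattern, "", s) for s in rows]

print(unescaped)
print(escaped)
```

The same `pattern` should also work with the call from this answer, as in `df['Expression'].replace(pattern, '', regex=True)`.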

Ben.T


Regardless of that, shouldn't it have worked when they tried `df['Expression'] = df['Expression'].str.replace(andFlag, '')`? – Zachary Oldham Jul 16 '19 at 17:14

@ZacharyOldham no, but you are right: setting the parameter regex to False (it is True by default) gives the right answer: `df['Expression'].str.replace(andFlag, '', regex=False)` – Ben.T Jul 16 '19 at 17:16

Ah, so is `regex=True` the default? – Zachary Oldham Jul 16 '19 at 17:17
@ZacharyOldham apparently :) – Ben.T Jul 16 '19 at 17:17

score 2 · Answer 2 · answered Jul 16 '19 at 17:16


Use str.replace and, more importantly, set regex=False for literal matching:

df['Expression'] = df['Expression'].str.replace(andFlag, '', regex=False)

      Expression
0            XYZ
1  (HIJ&FTL&JKK)
2  (FML|AXY|AND)
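For completeness, here is a self-contained sketch of the same call, with the sample frame rebuilt from the question. Passing `regex=False` explicitly seems the safer habit, because the default for `Series.str.replace` depends on the pandas version: it was True in the 0.23.x release used in the question and was changed to False in pandas 2.0.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "Expression": [
        "XYZ&(ABC|DEF)",
        "(HIJ&FTL&JKK)&(ABC|DEF)",
        "(FML|AXY|AND)&(ABC|DEF)",
    ]
})

flag = "(ABC|DEF)"      # literal text to remove; may change
and_flag = "&" + flag

# regex=False treats and_flag as plain text, so no escaping is needed
df["Expression"] = df["Expression"].str.replace(and_flag, "", regex=False)

print(df)
```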

Erfan
