
I am trying to remove certain words from a dataframe column and failing miserably...

Some of my sample data:

    Stock_Name
    Vanguard US Government Bond Index GBP Inc (Hedged)
    Vanguard US Government Bond Index GBP Acc (Hedged)
    Vanguard US Government Bond Index GBP Inc
    Vanguard US Government Bond Index USD Acc

The dictionary:

    replace_values = {
        r'\bAcc\b': "",
        r'\bInc\b': "",
        r'\b(Hedged)\b': "",
        r'\bGBP\b': "",
        r'\bUSD\b': ""
    }
    df["Stock_Name"] = df["Stock_Name"].replace(replace_values, regex=True)

The output I am getting:

    Vanguard US Government Bond Index   ()
    Vanguard US Government Bond Index   ()
    Vanguard US Government Bond Index  
    Vanguard US Government Bond Index  

For some reason the parentheses are not being removed. I have tried adding '()' to my replace_values dict, but it doesn't seem to do anything.
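A quick check with Python's built-in `re` module shows what is going on (a small sketch using plain `re.sub` instead of pandas; `Series.replace(..., regex=True)` should substitute these patterns the same way). Unescaped parentheses form a capture group, so `\b(Hedged)\b` matches only the word `Hedged` and leaves the literal `(` and `)` behind. A pattern of `()` is an empty group that matches the empty string, so replacing it with `""` changes nothing:

```python
import re

sample = "Vanguard US Government Bond Index GBP Inc (Hedged)"

# (Hedged) is a capture group, not literal parentheses: only the word
# "Hedged" is matched, so "(" and ")" remain in the result.
group_result = re.sub(r'\b(Hedged)\b', "", sample)
print(group_result)  # Vanguard US Government Bond Index GBP Inc ()

# "()" is an empty group that matches the empty string at every position;
# replacing empty matches with "" leaves the text unchanged.
empty_result = re.sub(r'()', "", sample)
print(empty_result == sample)  # True
```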

mrtn
  • To match a `(`, use `\(` in the pattern. Same with `)` and other [special regex metacharacters](https://stackoverflow.com/questions/399078/what-special-characters-must-be-escaped-in-regular-expressions). – Wiktor Stribiżew Jun 15 '20 at 12:25
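As an alternative to escaping by hand, the standard library's `re.escape` backslash-escapes every regex metacharacter in a literal string. A minimal sketch, assuming the terms to strip are plain literals (the `terms` list is only illustrative, built from the question's dictionary):

```python
import re

# Literal terms to strip (illustrative list taken from the question's dict).
terms = ["Acc", "Inc", "(Hedged)", "GBP", "USD"]

# re.escape backslash-escapes regex metacharacters such as "(" and ")",
# so "(Hedged)" becomes the pattern \(Hedged\) and matches literally.
escaped = [re.escape(t) for t in terms]
print(escaped[2])  # \(Hedged\)

# The escaped pattern now removes the parentheses together with the word.
stripped = re.sub(escaped[2], "", "GBP Inc (Hedged)")
print(repr(stripped))  # 'GBP Inc '
```

Note that `re.escape` adds no word boundaries, so a bare term like `Inc` would also match inside a longer word such as `Income`.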


You should escape the parentheses:

    r"\(\bHedged\b\)": "",

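Since `\b` means a word boundary (a position between a word character and a non-word character), it has to go inside the escaped parentheses. With `\b\(Hedged\)\b`, the position between the preceding space and `(` lies between two non-word characters, so it is not a boundary and the pattern never matches your text.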
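Putting that pattern into the question's dictionary, a minimal end-to-end sketch could look like this. The DataFrame is rebuilt from the question's sample rows, and the final whitespace cleanup is an extra step that neither post includes; it just collapses the leftover runs of spaces:

```python
import pandas as pd

df = pd.DataFrame({"Stock_Name": [
    "Vanguard US Government Bond Index GBP Inc (Hedged)",
    "Vanguard US Government Bond Index GBP Acc (Hedged)",
    "Vanguard US Government Bond Index GBP Inc",
    "Vanguard US Government Bond Index USD Acc",
]})

replace_values = {
    r'\bAcc\b': "",
    r'\bInc\b': "",
    r'\(\bHedged\b\)': "",  # escaped parentheses, \b kept inside them
    r'\bGBP\b': "",
    r'\bUSD\b': "",
}

df["Stock_Name"] = df["Stock_Name"].replace(replace_values, regex=True)

# Extra cleanup (not part of the answer): collapse repeated whitespace
# left by the removals and trim both ends.
df["Stock_Name"] = df["Stock_Name"].str.replace(r"\s+", " ", regex=True).str.strip()
print(df["Stock_Name"].tolist())
```

Every row should come out as `Vanguard US Government Bond Index`.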

Baris