
I need to remove a parenthesized pattern (e.g. `(x)`) from a single string with commas in it.

Here is an example of the objective:

col1: Brash (7), Confident (7), Street-Smart (6), Calm/Peaceful(5)
col2: Brash, Confident, Street-Smart, Calm/Peaceful

I've tried the following code:

df['col'] = df['col'].fillna('').astype(str).str.replace(r'[^A-Za-z ]', '', regex=True)
df['col'] = df['col'].str.replace(r" \(.*\)","")

But with these I either keep only the first element or remove all the special characters, and I just need to remove the pattern `(x)`.

PSCM
  • Your code `df['col'] = df['col2'].str.replace(r" \(.*\)","")` is working for me just fine. – sophocles Sep 13 '21 at 15:41
  • What do you mean by "element of a row"? Do you have `Brash (7), Confident (7), Street-Smart (6), Calm/Peaceful(5)` in a *single cell*? Is that a single string with commas in it, or a list of strings, or what? – Karl Knechtel Sep 13 '21 at 15:41
  • Use `r" \(.*?\)"` as regex – mozway Sep 13 '21 at 15:42
  • @KarlKnechtel, it is a single string with commas in it. Sorry for the unclear wording – PSCM Sep 13 '21 at 15:48
  • Aha. Then the problem is simply that e.g. `\(.*\)` matches `(7), Confident (7), Street-Smart (6), Calm/Peaceful(5)` - *as much as possible* that satisfies the condition. See the linked duplicate for details. The answer you accepted works around the problem by only accepting digits within the parentheses. – Karl Knechtel Sep 13 '21 at 15:57
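
The greedy-versus-lazy behaviour described in the comments can be checked with a minimal sketch using Python's standard `re` module on the example string from the question. The variable names and the third pattern (`\s*\(\d+\)`, which makes the leading whitespace optional and accepts only digits inside the parentheses) are illustrative assumptions, not code from the question or the answer.

```python
import re

s = "Brash (7), Confident (7), Street-Smart (6), Calm/Peaceful(5)"

# Greedy: .* extends from the first "(" to the LAST ")" in the string,
# so everything after "Brash" is removed.
greedy = re.sub(r" \(.*\)", "", s)

# Lazy: .*? stops at the nearest ")". The pattern requires a leading space,
# so "Calm/Peaceful(5)" (no space before the parenthesis) is left untouched.
lazy = re.sub(r" \(.*?\)", "", s)

# Assumed variant: optional whitespace, digits only inside the parentheses.
digits_only = re.sub(r"\s*\(\d+\)", "", s)

print(greedy)
print(lazy)
print(digits_only)
```

The same patterns should carry over to `Series.str.replace(..., regex=True)`, since pandas applies them as regular expressions when `regex=True` is passed.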

1 Answer

df['col'].str.replace(r'\((\d+).*?\)', '', regex=True)
Muhammad Hassan