I have a dataframe whose values are either a string or a tuple containing multiple strings, like the one below:

           Country                                              Roles  \
0  Shell Record  (DSC Payroll Administrator Reporting, DSC HR S...   
1            PL  (DSC Payroll Administrator Reporting, DSC Payr...   
2            ES  (DSC HR Business Partner Reporting, DSC HR Bus...   
3  Shell Record  (DSC HR Business Partner Reporting, DSC HR Bus...   
4  Shell Record                     DSC BPM Worklist Administrator   

          Role vs Family  
0           Do not match  
1  (Match, Do not match)  
2                  Match  
3           Do not match  
4           Do not match  

Is there a way to remove the tuple wrapping from these values (for example, turn (Match, Do not match) into Match, Do not match, so the column keeps the same text without the parentheses)? I don't want to use "replace" for that (and I don't even know whether it would work).

Thank you!

Paulo Cortez
  • It can be done as shown [here](https://stackoverflow.com/questions/20894525/how-to-remove-parentheses-and-all-data-within-using-pandas-python); however, that approach does use a regex `replace` – tidakdiinginkan Apr 18 '20 at 03:07
  • That does not work; it actually returns NaN values instead of removing just the parentheses – Paulo Cortez Apr 19 '20 at 17:32

1 Answer

Sample dataframe:

import pandas as pd
import re
df = pd.DataFrame({'col': ['(Match, Do not match)', 'Match', 'Do not match']})
print(df)

Before:

                     col
0  (Match, Do not match)
1                  Match
2           Do not match

This regular expression should remove all parentheses from the string values in the column.

# Inside a character class, | is a literal pipe, so [()] is enough to match both parentheses
df['col'] = df['col'].apply(lambda x: re.sub(r'[()]', '', x))
print(df)

After:

                   col
0  Match, Do not match
1                Match
2         Do not match
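
The sample above assumes every cell is a string. If the cells actually hold Python tuple objects, as the question describes, `re.sub` would raise a `TypeError` on them, and the `.str` accessor returns NaN for non-string cells, which may be the NaN reported in the comments. A minimal sketch for that case, assuming the tuples contain only strings (the sample data and the helper name `flatten_cell` are hypothetical, made up for illustration):

```python
import pandas as pd

# Hypothetical sample: each cell is either a plain string or a tuple of strings
df = pd.DataFrame({
    'Role vs Family': ['Do not match', ('Match', 'Do not match'), 'Match'],
})

def flatten_cell(value):
    """Join tuple elements with ', '; return any other value unchanged."""
    if isinstance(value, tuple):
        return ', '.join(value)
    return value

# No regex and no replace: the tuple is converted to a plain string directly
df['Role vs Family'] = df['Role vs Family'].apply(flatten_cell)
print(df)
```

This does not use `replace` at all, and it leaves any parentheses that appear inside the strings themselves untouched.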
tidakdiinginkan