I would like to clean up the phone number column in my pandas DataFrame. I'm using the code below, but it leaves an opening parenthesis at the end of the last value. What regex would exclude any trailing characters, such as "(", that are not part of the phone number? I've looked through old posts but can't find an exact solution. Sample code below:
import pandas as pd
df1 = pd.DataFrame({'x': ['1234567890', '202-456-3456', '(202)-456-3456adsd', '(202)-456- 4567', '1234564567(dads)']})
df1['x1'] = df1['x'].str.extract(r'([\(\)\s\d\-]+)', expand=True)
Expected output:
                    x               x1
0          1234567890       1234567890
1        202-456-3456     202-456-3456
2  (202)-456-3456adsd   (202)-456-3456
3     (202)-456- 4567  (202)-456- 4567
4    1234564567(dads)       1234564567
Current output:
                    x               x1
0          1234567890       1234567890
1        202-456-3456     202-456-3456
2  (202)-456-3456adsd   (202)-456-3456
3     (202)-456- 4567  (202)-456- 4567
4    1234564567(dads)      1234564567(
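One possible direction, offered as a sketch rather than a definitive fix: the stray "(" appears because the character class accepts parentheses anywhere, including at the very end of the match. Requiring the match to end on a digit forces the regex to backtrack past any trailing bracket. The snippet below checks this with the standard `re` module on the sample values; `PHONE_PATTERN` and `extract_phone` are hypothetical names introduced only for illustration, and the pattern assumes a phone number always ends with a digit.

```python
import re

# Same character class as in the question, but the match must END on a digit,
# so a trailing "(" (or space, or "-") is given back by backtracking.
# Assumption: a valid phone number always ends with a digit.
PHONE_PATTERN = r'([()\s\d-]*\d)'

samples = ['1234567890', '202-456-3456', '(202)-456-3456adsd',
           '(202)-456- 4567', '1234564567(dads)']


def extract_phone(text):
    """Return the first run of phone-like characters ending in a digit, or None."""
    match = re.search(PHONE_PATTERN, text)
    return match.group(1) if match else None


cleaned = [extract_phone(s) for s in samples]
print(cleaned)
```

The same pattern should be usable in the pandas call from the question, for example `df1['x'].str.extract(PHONE_PATTERN, expand=False)`, since `str.extract` returns the first match of the capture group for each value.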