I have a column in a df contains the following values:
>>> import pandas as pd
>>> df = pd.DataFrame({'Sentence':['his is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm', 'I have researched the product KEY_abc_def, and KEY_blt_chm as requested', 'He got the idea from your message KEY_mno_pqr']})
>>> df
Sentence
0 This is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm
1 I have researched the product KEY_abc_def, and KEY_blt_chm as requested
2 He got the idea from your message KEY_mno_pqr
I would like to use regex to extract the KEY into a new column without the actual "KEY_". For those sentences have more than 1 KEY, they should be joined with a comma. The output should be as below:
>>> df
Sentence KEY
0 This is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm abc_def, mno_pqr, blt_chm
1 I have researched the product KEY_abc_def, and KEY_blt_chm as requested abc_def, blt_chm
2 He got the idea from your message KEY_mno_pqr mno_pqr
I tried with this code but it is not working. Any suggestions would greatly be appreciated.
The code that I currently have only worked with the first KEY, and ignored the rest. I'm new with regex so any suggestions would be highly appreciated.
df['KEY']= df.sentence.str.extract("KEY_(\w+)", expand=True)