I have a df and I want to add a column that extracts the data I need from another column. The column I am extracting from contains strings, so I am guessing I need a regex (Python's re module) for this.

A simplified example of my df:

Column A    Column B                                
1           I want (this text) only                    
2           I only want (this) text          
3           that appears (in) the parentheses
4           but not every line has
5           (parentheses) in it

I want my df to end up looking something like this:

Column A    Column B                            Column C                           
1           I want (this text) only              this
2           I only want (this) text              this
3           that appears (in) the parentheses    in
4           but not every line has
5           (parentheses) in it                  parentheses
Wiktor Stribiżew
Ed Jefferies

1 Answer

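If you want just the first word inside the parentheses, use str.extract with a pattern that stops at whitespace or the closing parenthesis (a plain \S+ would also capture the ")" when the parentheses hold a single word):

df["C"] = df["B"].str.extract(r'\(([^\s)]+)')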

If you want the full contents of the parentheses, then use:

df["C"] = df["B"].str.extract(r'\((.*?)\)')
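
To make the difference between the two options concrete, here is a minimal sketch that rebuilds the sample frame from the question and applies both. The column names first_word and contents are made up for illustration, and the short names "A" and "B" stand in for "Column A" and "Column B". The first-word pattern uses [^\s)]+ rather than \S+, since \S+ would also swallow the closing parenthesis when the parentheses hold a single word. Passing expand=False is an optional choice here so that str.extract returns a Series instead of a one-column DataFrame.

```python
import pandas as pd

# Sample data rebuilt from the question; "A"/"B" stand in for "Column A"/"Column B"
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [
        "I want (this text) only",
        "I only want (this) text",
        "that appears (in) the parentheses",
        "but not every line has",
        "(parentheses) in it",
    ],
})

# First word inside the parentheses: stop at whitespace or at ")"
df["first_word"] = df["B"].str.extract(r"\(([^\s)]+)", expand=False)

# Full contents of the first pair of parentheses (non-greedy match)
df["contents"] = df["B"].str.extract(r"\((.*?)\)", expand=False)

print(df[["B", "first_word", "contents"]])
```

Rows with no parentheses (row 4 in the sample) come out as NaN in both new columns. If you would rather have an empty string there, chaining .fillna("") onto the result should do it.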
Tim Biegeleisen