I have some rows; some contain parentheses and some don't, like ABC(DEF) and ABC. I want to extract the text inside the parentheses:

  • ABC(DEF) -> DEF
  • ABC -> NA

I wrote:

gsub(".*\\((.*)\\).*", "\\1", X)

It works well for ABC(DEF), but it outputs "ABC" when there are no parentheses.

Zheyuan Li
Peiwen Yu

2 Answers

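If you do not want to get ABC back when using sub with your regex, you need to add an alternative that matches the whole string when there are no parentheses, so that it is removed too. Because that alternative contains no capturing group, \\1 is empty and the result is an empty string.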

X <- c("ABC(DEF)", "ABC")
sub(".*(?:\\((.*)\\)).*|.*", "\\1",X)
                       ^^^

See the IDEONE demo.

Note that you do not need gsub here: only one replacement has to be performed, so sub will do.
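
If NA (rather than an empty string) is wanted for rows without parentheses, and only base R is available, one possible sketch is to test for a match first and substitute only where one exists. The names has_parens and inside are made up for this illustration and are not part of the answer above.

```r
X <- c("ABC(DEF)", "ABC")

# TRUE for elements that contain a "(...)" pair
has_parens <- grepl("\\(.*\\)", X)

# Start from all NA, then fill in only the elements that matched
inside <- rep(NA_character_, length(X))
inside[has_parens] <- sub(".*\\((.*)\\).*", "\\1", X[has_parens])

inside
# "DEF" NA
```

This keeps the original pattern unchanged and moves the "no match" decision out of the regex.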

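A stringr str_match would also be handy for this task. The captured text is in the second column of the returned matrix, and rows with no match are NA:

library(stringr)
str_match(X, "\\((.*)\\)")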

or

str_match(X, "\\(([^()]*)\\)")
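
As a minimal sketch of how the str_match result might be used, the snippet below pulls the capture group out of the returned matrix. The variable name m is an assumption for illustration only.

```r
library(stringr)

X <- c("ABC(DEF)", "ABC")

# str_match returns a character matrix:
# column 1 is the full match, column 2 is the first capture group
m <- str_match(X, "\\(([^()]*)\\)")

# Keep only the text captured inside the parentheses
m[, 2]
# "DEF" NA
```

Elements with no parentheses yield NA in every column, which is the behaviour asked for in the question.
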
Wiktor Stribiżew

Using str_extract() will work.

library(stringr) 

df$`new column` <- str_extract(df$`existing column`,  "(?<=\\().+?(?=\\))")

This creates a new column containing the text found inside parentheses in the existing column. If a value contains no parentheses, the result is NA.
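
A small self-contained sketch of the same call, using a made-up two-row data frame built from the values in the question (the data frame itself is an assumption, not taken from the original post):

```r
library(stringr)

# Hypothetical example data using the values from the question
df <- data.frame(
  `existing column` = c("ABC(DEF)", "ABC"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Lookbehind for "(" and lookahead for ")" so that only the text between them is returned
df$`new column` <- str_extract(df$`existing column`, "(?<=\\().+?(?=\\))")

df$`new column`
# "DEF" NA
```

Because the pattern uses .+? it needs at least one character between the parentheses, so a value such as "ABC()" would also give NA.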

The inspiration for my answer comes from this answer on the original question about this topic