I'm using regular expressions in R. I am trying to pick out parenthesized content that is at the end of some strings in a character vector. I'm able to find parenthesized content when it exists, but I'm failing to exclude non-parenthesized content in inputs that don't have parens.

Example:

> x <- c("DECIMAL", "DECIMAL(14,5)", "RAND(1)")
> gsub("(.*?)(\\(.*\\))", "\\2", x)
[1] "DECIMAL" "(14,5)"  "(1)"

The last two elements of the output are correct; the first one is not. I want

c("", "(14,5)", "(1)")

The input can have anything, literally any word or number characters, before the parenthesized content.

pauljohn32

2 Answers

We can use str_extract or regmatches

library(stringr)
library(tidyr)
replace_na(str_extract(x, "\\([^)]+\\)"), "")
#[1] ""       "(14,5)" "(1)"  

With sub/gsub, an input that does not match the pattern is returned unchanged, which is why "DECIMAL" comes back as the whole string.
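
For a base R route without stringr or tidyr, the regmatches option mentioned above could be sketched as follows. This is a minimal sketch, assuming the same pattern as in the str_extract call; the variable names m and out are illustrative only.

```r
x <- c("DECIMAL", "DECIMAL(14,5)", "RAND(1)")

# regexpr() gives the match position per element, or -1 where nothing matched
m <- regexpr("\\([^)]+\\)", x)

# Start from empty strings so unmatched elements stay ""
out <- character(length(x))

# regmatches() returns only the substrings that matched, in input order
out[m != -1] <- regmatches(x, m)
out
#[1] ""       "(14,5)" "(1)"
```

Pre-filling out with "" plays the role that replace_na has in the stringr version.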

akrun
  • I trust it works. I don't want to install tidyverse just for this purpose; I'm avoiding it wherever possible. – pauljohn32 Jan 18 '21 at 18:40

You can use

sub("^.*?(\\(.*\\))?$", "\\1", x, perl=TRUE)

See the regex demo. Details:

  • ^ - start of string
  • .*? - any zero or more chars other than line break chars (since it is a PCRE regex, see perl=TRUE) as few as possible
  • (\\(.*\\))? - an optional Group 1: a (, then any zero or more chars other than line break chars, as many as possible, and then a )
  • $ - end of string.

See the R demo:

x <- c("DECIMAL", "DECIMAL(14,5)", "RAND(1)")
sub("^.*?(\\(.*\\))?$", "\\1", x, perl=TRUE)
## => [1] ""       "(14,5)" "(1)" 

NOTE: perl=TRUE is essential here because the pattern mixes a lazy quantifier (.*?) with a greedy one (.*), and R's default TRE engine does not handle quantifiers of different greediness in one pattern the way PCRE does.
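
If this is needed in several places, the call could be wrapped so that perl=TRUE cannot be forgotten. The following is only a sketch: extract_trailing_parens is a hypothetical helper name, not something from base R, and the extra inputs are made up for illustration.

```r
# Hypothetical helper; the pattern is the one from the answer above
extract_trailing_parens <- function(x) {
  # perl = TRUE selects the PCRE engine, which the lazy/greedy mix relies on
  sub("^.*?(\\(.*\\))?$", "\\1", x, perl = TRUE)
}

extract_trailing_parens(c("DECIMAL", "VARCHAR(255)", "NUMERIC(10,2)"))
## => [1] ""       "(255)"  "(10,2)"

# For comparison, the same pattern under the default TRE engine
# (result deliberately not shown here; run it to see the difference)
sub("^.*?(\\(.*\\))?$", "\\1", c("DECIMAL", "VARCHAR(255)", "NUMERIC(10,2)"))
```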

Wiktor Stribiżew
  • I missed the `perl=TRUE` part while I was trying to solve on my own. That is the critical part! Thanks for the explanation. – pauljohn32 Jan 18 '21 at 18:42
  • @pauljohn32 Yes, that is really crazy about the TRE regex, you may read more on that in [this answer of mine](https://stackoverflow.com/a/62021015/3832970). – Wiktor Stribiżew Jan 18 '21 at 18:47