I am confused about how to extract the number within the parentheses in a “character” vector in R

This is my data:

Data <- c("182 (450)", "7,736,000 (19,120,000)", "350 (860)")

And this is the output I am looking for:

>   450,     19120000,   860

I am still searching for a suitable function and approach for this type of problem. Any help or suggestions would be appreciated.

Thank you

1 Answer

We can use sub to extract everything between the parentheses.

result <- sub('.*\\((.*)\\).*', '\\1', Data)
result
#[1] "450"        "19,120,000" "860"

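Characters like ( ) * . $ have special meaning in regex, so when we want to match them literally we need to escape them. In regex, escaping is done with a backslash, and because the backslash itself must be escaped inside an R string, we write \\ in R code. So here we first escape the opening parenthesis (\\(), then create a capture group ((.*)) to capture everything between the opening parenthesis and the closing one (\\)), which we escape again. The .* at the start and end match the rest of the string, so the whole string is replaced, and the backreference \\1 returns only what was captured in the capture group.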
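To see what the pattern actually does, here is a quick sketch using base R's regexec() and regmatches() (an addition, not part of the original answer). It prints the pattern as the regex engine receives it and shows the full match next to the captured group:

```r
Data <- c("182 (450)", "7,736,000 (19,120,000)", "350 (860)")

pattern <- '.*\\((.*)\\).*'

# In R source each regex backslash is itself escaped, so the string
# literal '\\(' holds the two characters \( that the regex engine sees.
writeLines(pattern)
#> .*\((.*)\).*

# regexec() reports the full match plus each capture group; regmatches()
# turns those positions into text. The full match is the entire string,
# which is why sub() leaves only the captured group behind.
parts <- regmatches(Data, regexec(pattern, Data))
parts[[2]]
#> [1] "7,736,000 (19,120,000)" "19,120,000"

# Pick out the capture group (second element) from every match
vapply(parts, `[`, character(1), 2)
#> [1] "450"        "19,120,000" "860"
```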

And the same using str_extract from stringr:

result <- stringr::str_extract(Data, "(?<=\\().*(?=\\))")
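If you would rather avoid the stringr dependency, the same lookaround pattern can be sketched in base R with regexpr() and regmatches(); this is an alternative, not what the answer above uses. (?<=\\() is a lookbehind and (?=\\)) a lookahead, so the parentheses must be present but are not part of the match:

```r
Data <- c("182 (450)", "7,736,000 (19,120,000)", "350 (860)")

# Lookarounds are not supported by base R's default regex engine,
# so perl = TRUE switches to PCRE.
m <- regexpr("(?<=\\().*(?=\\))", Data, perl = TRUE)

# regmatches() returns only the elements that matched; every element
# of Data contains parentheses, so nothing is dropped here.
result <- regmatches(Data, m)
result
#> [1] "450"        "19,120,000" "860"
```

One difference to keep in mind: stringr::str_extract() returns NA for elements without a match, whereas regmatches() silently drops them, so its result can be shorter than the input.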

If you want to convert the result to numeric, use:

as.numeric(gsub(',', '', result))
#OR
#readr::parse_number(result)
#[1]      450 19120000      860
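One caveat when converting: sub() returns a string unchanged when the pattern does not match, so an element without parentheses such as "999" would silently become 999. A minimal sketch of a guarded version follows; extract_paren_number is a hypothetical helper name, and its [^(]* and [^)]* character classes make it pick the first parenthesised value, whereas the greedy .* in the original pattern picks the last one if a string contains several.

```r
Data <- c("182 (450)", "7,736,000 (19,120,000)", "350 (860)")

# Hypothetical helper (not from base R or stringr): return the first
# parenthesised value of each string as a number, or NA if there is none.
extract_paren_number <- function(x) {
  # Capture everything inside the first "(...)" pair
  pattern <- "^[^(]*\\(([^)]*)\\).*$"
  has_match <- grepl(pattern, x)
  out <- rep(NA_real_, length(x))
  inside <- sub(pattern, "\\1", x[has_match])
  # Strip thousands separators before converting
  out[has_match] <- as.numeric(gsub(",", "", inside, fixed = TRUE))
  out
}

extract_paren_number(c(Data, "999"))
#> [1]      450 19120000      860       NA
```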
Ronak Shah
  • Thank you very much. It seems I still do not know what these symbols mean: '.*\\((.*)\\).*', '\\1'. Do you have any reference or topic where I could learn what these symbols mean? – Fadel Erwin May 06 '20 at 01:20
  • @FadelErwin I added some explanation to the answer. You can start learning more about regular expressions here - https://www.regular-expressions.info/ – Ronak Shah May 06 '20 at 01:27