I have a text with several pairs of parentheses and I would like to extract the text from the first pair. For example, in the string below I would like to get "int1":
string <- "string1(int1)string2(int2)string3(int3)"
I know nothing about regular expressions, and my problem is that I don't know how to stop at the first "(" and ")". In the examples below, when I match the character strictly, it stops at the first occurrence in the string (of course, using sub and not gsub). But when I put ".*" before my character, it matches the last occurrence of it in the string.
sub("\\(", "X", string, perl = TRUE)
#[1] "string1Xint1)string2(int2)string3(int3)"
sub(".*\\(", "X", string, perl = TRUE)
#[1] "Xint3)"
sub(".*\\)", "X", string, perl = TRUE)
#[1] "X"
sub("\\)", "X", string, perl = TRUE)
#[1] "string1(int1Xstring2(int2)string3(int3)"
So when I do something like sub(".*\\((.*)\\).*", "\\1", string, perl = TRUE), I get the string in the last pair of parentheses.
My first question is: how can I stop at the first "(" and ")", as in sub("\\)", ...)?
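For reference, the behaviour described above is what a greedy ".*" produces: it consumes as much as it can, so the "(" that follows ends up matching the last one in the string. A minimal sketch of two common ways to stop at the first pair instead, assuming the same string as above (the variable names greedy, lazy and negated are only illustrative):

```r
string <- "string1(int1)string2(int2)string3(int3)"

# Greedy: ".*" runs to the last "(" before the group is captured.
greedy <- sub(".*\\((.*)\\).*", "\\1", string, perl = TRUE)
greedy
#[1] "int3"

# Lazy: ".*?" takes as little as possible, so matching stops at the first "(" and ")".
lazy <- sub(".*?\\((.*?)\\).*", "\\1", string, perl = TRUE)
lazy
#[1] "int1"

# Negated character classes: "[^(]*" cannot cross a "(", and "[^)]*" cannot cross a ")".
negated <- sub("[^(]*\\(([^)]*)\\).*", "\\1", string)
negated
#[1] "int1"
```

The negated-class version does not rely on lazy quantifiers, so it should also work without perl = TRUE.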
After many tries I found a way to extract the string from the first pair of parentheses (although I'm not sure I understand it, because of the grouping part with ()):
string %>%
sub("(\\).*$)", "\\2", ., perl = TRUE) %>% #[1] "string1(int1"
sub(".*\\(", "", ., perl = TRUE)
#[1] "int1"
Can you suggest a better solution?
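One possible single-step alternative, sketched here under the assumption that base R's regexpr() and regmatches() are acceptable: regexpr() reports only the first match, and a lookbehind (which needs perl = TRUE) keeps the "(" out of the result. The names m and first_inside are only illustrative. As a side note, the two-step pattern above contains only one group, so "\\2" refers to a group that does not exist; judging from the output shown, it behaves like an empty replacement.

```r
string <- "string1(int1)string2(int2)string3(int3)"

# "(?<=\\()" requires a "(" just before the match without including it;
# "[^)]*" then takes everything up to, but not including, the next ")".
m <- regexpr("(?<=\\()[^)]*", string, perl = TRUE)

# regmatches() extracts the text located by regexpr(), i.e. the first match only.
first_inside <- regmatches(string, m)
first_inside
#[1] "int1"
```

Swapping regexpr() for gregexpr() would return every match rather than just the first, which may be useful if the other values are needed later.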
Also, do you know where I can find a comprehensible document about R and Perl regular expressions? I learned some basics from https://www.cs.tut.fi/~jkorpela/perl/regexp.html and I'm looking for more examples.
Thank you.