How to extract text inside the brackets in R?

Question

How can I extract all brackets which include a name AND a year?

string="testo(antonio.2018).testo(antonio).testo(giovanni,2018).testo(2018),testo(libero 2019)"

The desired output would look like this:

"(antonio.2018)" "(giovanni,2018)" "(libero 2019)"

I do not want to extract (2018) or (antonio).

What is the rule for extraction here? Please show what you tried to understand your problem. — Wiktor Stribiżew, Feb 23 '19 at 14:08
Possible duplicate of [Extract info inside all parenthesis in R](https://stackoverflow.com/questions/8613237/extract-info-inside-all-parenthesis-in-r) — Pushpesh Kumar Rajwanshi, Feb 24 '19 at 12:58

loki · Accepted Answer · 2019-02-23T16:09:46.930

You can use str_extract_all from the stringr package with this regex pattern:

stringr::str_extract_all(string, 
                         "\\(\\w+([[:punct:]]{1}|[[:blank:]]{1})[[:digit:]]+\\)")

# [[1]]
# [1] "(antonio.2018)"  "(giovanni,2018)" "(libero 2019)"

A small description of the regex:

\\w matches any word character (a letter, digit, or underscore)
+ means the preceding element has to be matched at least once
[[:punct:]] matches any punctuation character
{1} means exactly one occurrence (this is the default, so it can be omitted)
(....|....) means that one pattern OR the other has to match
[[:blank:]] matches a space or a tab
[[:digit:]] matches any digit
\\( and \\) match literal parentheses, which have to be escaped.
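
For comparison, here is a minimal sketch of the same idea in base R, without stringr. It assumes base R's default regex engine treats this pattern the same way for the example string; the names pattern and matches are only illustrative, and the redundant {1} is dropped.

```r
# The example string from the question.
string <- "testo(antonio.2018).testo(antonio).testo(giovanni,2018).testo(2018),testo(libero 2019)"

# "(" + word characters + one punctuation or blank character + digits + ")"
pattern <- "\\(\\w+([[:punct:]]|[[:blank:]])[[:digit:]]+\\)"

# gregexpr() locates every match; regmatches() pulls out the matched text.
matches <- regmatches(string, gregexpr(pattern, string))[[1]]
print(matches)
# [1] "(antonio.2018)"  "(giovanni,2018)" "(libero 2019)"
```

Note that \\w also matches digits, so a bracket such as (2018.2019) would be extracted as well. If the part before the separator must be a name, [[:alpha:]]+ may be a closer fit than \\w+.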

score 3 · Answer 2 · answered Feb 23 '19 at 15:17

@loki answer is great! You can also try this, I hope this works for you :)

x <- regmatches(string, gregexpr("(?=\\().*?(?<=\\))", string, perl = TRUE))[[1]]

>x

[1] "(antonio.2018)"  "(antonio)"       "(giovanni,2018)" "(2018)"          "(libero 2019)"  

# Keep the odd-numbered elements (1st, 3rd, 5th, ...).
>x[seq_along(x) %% 2 > 0]
[1] "(antonio.2018)"  "(giovanni,2018)" "(libero 2019)"

Note: I am unsure of your complete dataset, i.e. whether the wanted matches always sit at every 2nd position, starting from the first. If they do, this will also work on a large scale.
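
If the positions are not guaranteed, the brackets could instead be filtered by their content. The following is a rough sketch, assuming that "a name" means at least one letter and "a year" means at least one digit; that rule is an assumption, not something the question states, and keep is just an illustrative name.

```r
string <- "testo(antonio.2018).testo(antonio).testo(giovanni,2018).testo(2018),testo(libero 2019)"

# Extract every bracketed group, whatever it contains.
x <- regmatches(string, gregexpr("\\([^()]*\\)", string))[[1]]

# Assumed rule: keep groups that contain both a letter and a digit.
keep <- grepl("[[:alpha:]]", x) & grepl("[[:digit:]]", x)
print(x[keep])
# [1] "(antonio.2018)"  "(giovanni,2018)" "(libero 2019)"
```

This no longer depends on the order in which the brackets appear in the string.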
