
How can I extract all brackets which include a name AND a year?

string="testo(antonio.2018).testo(antonio).testo(giovanni,2018).testo(2018),testo(libero 2019)"

The desired output would look like this:

"(antonio.2018)" "(giovanni,2018)" "(libero 2019)"

I do not want to extract (2018) or (antonio).

loki
libero

2 Answers


You can use str_extract_all from the stringr package with this regex pattern:

stringr::str_extract_all(string, 
                         "\\(\\w+([[:punct:]]{1}|[[:blank:]]{1})[[:digit:]]+\\)")

# [[1]]
# [1] "(antonio.2018)"  "(giovanni,2018)" "(libero 2019)"  
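If you would rather not depend on stringr, the same pattern should also work in base R. This is a minimal sketch, assuming PCRE (perl = TRUE) so that \w and the POSIX classes behave as in the stringr call; the variable names pattern and matches are only for illustration.

```r
# Input string from the question
string <- "testo(antonio.2018).testo(antonio).testo(giovanni,2018).testo(2018),testo(libero 2019)"

# Same pattern as in the stringr call above
pattern <- "\\(\\w+([[:punct:]]{1}|[[:blank:]]{1})[[:digit:]]+\\)"

# gregexpr() finds every match position; regmatches() pulls out the matched text
matches <- regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]
print(matches)
# expected: "(antonio.2018)" "(giovanni,2018)" "(libero 2019)"
```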

A short description of the regex:

\\w matches any word character (a letter, digit, or underscore)
+ means the preceding element has to match at least once
[[:punct:]] matches any punctuation character
{1} means exactly one occurrence (so it is redundant here)
(....|....) means one pattern OR the other has to be met
[[:blank:]] matches a space or a tab
[[:digit:]] matches any digit
\\( and \\) match literal parentheses; they have to be escaped because ( and ) otherwise open and close a group.
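To see the pieces of the pattern at work, you can test it against each bracket on its own. This small sketch uses the brackets from the question as candidates and also checks the assumption that the alternation ([[:punct:]]{1}|[[:blank:]]{1}) can be collapsed into the single bracket expression [[:punct:][:blank:]], since {1} adds nothing; long_pattern and short_pattern are illustrative names.

```r
candidates <- c("(antonio.2018)", "(antonio)", "(giovanni,2018)", "(2018)", "(libero 2019)")

# Pattern from the answer
long_pattern <- "\\(\\w+([[:punct:]]{1}|[[:blank:]]{1})[[:digit:]]+\\)"
# Shorter form: one bracket expression holding both POSIX classes
short_pattern <- "\\(\\w+[[:punct:][:blank:]][[:digit:]]+\\)"

long_hits <- grepl(long_pattern, candidates, perl = TRUE)
short_hits <- grepl(short_pattern, candidates, perl = TRUE)

# (antonio) fails: the only punctuation after the name is ")", and no digit follows it
# (2018) fails: after the digits only ")" remains, so no separator plus digits can follow
print(data.frame(candidate = candidates, long = long_hits, short = short_hits))
```

Both columns should agree, which suggests the shorter pattern is a drop-in replacement for this input.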

loki

@loki's answer is great! You can also try this; I hope it works for you :)

x <- regmatches(string, gregexpr("(?=\\().*?(?<=\\))", string, perl = TRUE))[[1]]

> x

[1] "(antonio.2018)"  "(antonio)"       "(giovanni,2018)" "(2018)"          "(libero 2019)"  

# Keep every other value (the 1st, 3rd, 5th, ...)
> x[seq_along(x) %% 2 > 0]
[1] "(antonio.2018)"  "(giovanni,2018)" "(libero 2019)"  

Note: I am not sure what your complete dataset looks like, i.e. whether the brackets with a name and a year will always be every 2nd value. If they are, this will work at scale; if not, the positional filter will pick the wrong brackets.
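If the brackets are not guaranteed to alternate, filtering on content rather than position may be safer. This is a sketch under two assumptions the question does not spell out: a "name" is at least one letter, and a "year" is four consecutive digits. The helper has_name_and_year is hypothetical.

```r
string <- "testo(antonio.2018).testo(antonio).testo(giovanni,2018).testo(2018),testo(libero 2019)"

# Step 1: pull out every bracketed chunk: "(" + any non-parenthesis characters + ")"
brackets <- regmatches(string, gregexpr("\\([^()]*\\)", string))[[1]]

# Step 2 (hypothetical helper): TRUE where a chunk has a letter AND four digits
has_name_and_year <- function(x) {
  grepl("[[:alpha:]]", x) & grepl("[[:digit:]]{4}", x)
}

result <- brackets[has_name_and_year(brackets)]
print(result)
# expected: "(antonio.2018)" "(giovanni,2018)" "(libero 2019)"
```

Unlike the every-2nd-value filter, this does not depend on the order in which the brackets appear.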

OctoCatKnows