mystring <- c("code IS (k(384333)\n   AND parse = TURE \n ) \n 
              code IS (\n FROM (43343344)\n ) some information code IS \n
              code IS (  ( \n (data)(23423422 \n)) ) ) and more information)")

I would like to extract every instance of `code IS (...)`, including any nested parentheses inside it. Because of the nesting, my regex stops at the first closing parenthesis instead of the one that balances the opening `(`.

library(stringr)
> str_extract_all(pattern = 'code IS \\([\\s\\S]+?\\)', mystring)
[[1]]
[1] "code IS (k(384333)"          "code IS (\n FROM (43343344)" "code IS (  ( \n (data)"    

The desired output is

[[1]]
[1] "code IS (k(384333)\n   AND parse = TURE \n )"          "code IS (\n FROM (43343344)\n )" "code IS (  ( \n (data)(23423422 \n)) )" 
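The lazy `[\\s\\S]+?\\)` in the attempt above matches as few characters as possible, so it ends at the first `)`. An ordinary (non-recursive) regular expression cannot count how deeply the parentheses are nested. If recursion is not available, a plain depth counter is one alternative. Below is a minimal sketch in base R; `extract_balanced()` is a hypothetical helper written only for this example, and it assumes the prefix ends with the opening `(`:

```r
# Hypothetical helper (not part of stringr or base R). For each occurrence
# of `prefix` in the single string `x`, walk forward counting parenthesis
# depth until the "(" that ends the prefix is closed again.
extract_balanced <- function(x, prefix = "code IS (") {
  starts <- gregexpr(prefix, x, fixed = TRUE)[[1]]
  if (starts[1] == -1) return(character(0))   # prefix not found
  chars <- strsplit(x, "")[[1]]
  out <- character(0)
  for (s in starts) {
    depth <- 0
    # begin at the "(" that ends the prefix
    for (i in (s + nchar(prefix) - 1):length(chars)) {
      if (chars[i] == "(") {
        depth <- depth + 1
      } else if (chars[i] == ")") {
        depth <- depth - 1
      }
      if (depth == 0) {   # balanced: keep prefix through this ")"
        out <- c(out, paste(chars[s:i], collapse = ""))
        break
      }
    }
  }
  out   # occurrences that are never balanced are skipped
}

extract_balanced("code IS (a(b)c) tail code IS (d)")
```

Applied to `mystring` from the question, it should return the three strings in the desired output. Unlike `gregexpr()`, it would also return a `code IS (` nested inside another match.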

Edit: Potential regex solutions are here:

The question now is: how do I adapt these solutions to work with `str_extract_all` in R?

My attempt at using a PCRE pattern:

> str_extract_all(pattern = 'code IS \((?:[^)(]+|(?R))*+\)', mystring)
Error: '\(' is an unrecognized escape in character string starting "'code IS \(" 
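The error above comes from R's string parser, not from the regex engine: in an ordinary R string literal `\(` is not a valid escape, so every regex backslash has to be doubled. Since R 4.0.0, a raw string literal avoids the doubling. A minimal sketch with the pattern from the attempt above (the variable names are only for illustration):

```r
# The pattern from the attempt above, written two ways.
# Ordinary string literal: every regex backslash is doubled.
pattern_escaped <- "code IS \\((?:[^)(]+|(?R))*+\\)"
# Raw string literal (R >= 4.0.0): backslashes are kept exactly as typed.
pattern_raw <- r"{code IS \((?:[^)(]+|(?R))*+\)}"

identical(pattern_escaped, pattern_raw)   # both hold the same regex
```

Fixing the escaping only gets the pattern past R's parser. As far as I know, stringr hands patterns to the ICU regex engine, which supports possessive quantifiers such as `*+` but not recursion such as `(?R)`, so a recursive pattern has to go through base R's PCRE engine instead (`perl = TRUE` in `gregexpr()`, combined with `regmatches()`).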
Adrian
  • Not exactly. Which one should I use in R? I tried a handful but they all came back with errors. I'm lost on how exactly to adapt those answers to work with `str_extract_all` in R. – Adrian Jun 07 '23 at 16:13
  • As far as I know, R supports both POSIX and PCRE. This means you need to use the PCRE part of the first answer and use a keyword argument like `perl = TRUE` or something similar (I don't know R, so I'm not sure about this part). – InSync Jun 07 '23 at 16:17
  • Thanks, I gave it a shot and it didn't work out. I updated my original post with additional information. – Adrian Jun 07 '23 at 16:20
  • https://ideone.com/kwaF0t - `regmatches(mystring, gregexpr("code IS (\\((?:[^()]++|(?1))*\\))", mystring, perl=TRUE))` – Wiktor Stribiżew Jun 07 '23 at 16:41
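The one-liner in the last comment runs through PCRE via `perl = TRUE`. One detail matters: `(?1)` recurses only into capture group 1, the balanced `(...)` block, whereas `(?R)` in the attempt above recurses into the whole pattern, so every nested group would itself have to start with `code IS`. The same regex can be written in PCRE's free-spacing mode `(?x)` with inline comments; this is a sketch that I would expect to behave identically to the one-liner on `mystring`:

```r
mystring <- c("code IS (k(384333)\n   AND parse = TURE \n ) \n 
              code IS (\n FROM (43343344)\n ) some information code IS \n
              code IS (  ( \n (data)(23423422 \n)) ) ) and more information)")

# Same regex as the one-liner, in PCRE free-spacing mode (?x):
# unescaped whitespace and "#" comments inside the pattern are ignored.
pattern <- "(?x)
  code\\ IS\\          # literal prefix; spaces escaped so (?x) keeps them
  (                    # group 1: one balanced (...) block
    \\(                # opening parenthesis
    (?: [^()]++        # a run of non-parenthesis characters (possessive)
      | (?1)           # or a nested balanced block: recurse into group 1 only
    )*
    \\)                # the matching closing parenthesis
  )"

matches <- regmatches(mystring, gregexpr(pattern, mystring, perl = TRUE))
matches[[1]]
```

`matches[[1]]` should hold the three strings from the desired output.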
