How to extract strings between multiple nested parentheses in R

Question

mystring <- c("code IS (k(384333)\n   AND parse = TURE \n ) \n 
              code IS (\n FROM (43343344)\n ) some information code IS \n
              code IS (  ( \n (data)(23423422 \n)) ) ) and more information)")

I would like to extract all instances of code IS (...). But because of the nested parentheses, my regex stops at the first closing parenthesis instead of the one that balances the opening parenthesis.

library(stringr)
> str_extract_all(pattern = 'code IS \\([\\s\\S]+?\\)', mystring)
[[1]]
[1] "code IS (k(384333)"          "code IS (\n FROM (43343344)" "code IS (  ( \n (data)"

The desired output is

[[1]]
[1] "code IS (k(384333)\n   AND parse = TURE \n )"          "code IS (\n FROM (43343344)\n )" "code IS (  ( \n (data)(23423422 \n)) )"

Edit: Potential regex solutions are here:

The question now is how do I adapt these solutions to work with str_extract_all in R?

My attempt at using a PCRE pattern:

> str_extract_all(pattern = 'code IS \((?:[^)(]+|(?R))*+\)', mystring)
Error: '\(' is an unrecognized escape in character string starting "'code IS \("
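
The error above comes from R's string parser, not from the regex engine: in a regular R string literal each backslash has to be doubled, or (in R 4.0.0 and later) the pattern can be written as a raw string. The sketch below shows both spellings. It also suggests why fixing the escape alone is probably not enough: `(?R)` recurses into the whole pattern, including the literal `code IS ` prefix, so the inner parentheses never match. Base R with `perl = TRUE` is used here because, as far as I know, the ICU engine behind stringr does not support recursive patterns. `sample_text` is a shortened stand-in for `mystring`, not the original data.

```r
# A regular string needs "\\(" to hand the two characters \( to the regex engine.
pattern_escaped <- "code IS \\((?:[^)(]+|(?R))*+\\)"

# R >= 4.0.0: a raw string takes backslashes literally, so no doubling is needed.
pattern_raw <- r"{code IS \((?:[^)(]+|(?R))*+\)}"

# Both spellings produce the same pattern.
same_pattern <- identical(pattern_escaped, pattern_raw)

# Shortened stand-in for mystring (assumption, for illustration only).
sample_text <- "code IS (k(384333)\n AND parse = TURE \n ) and more"

# (?R) re-enters the full pattern, which demands "code IS (" again at every
# nested "(", so the nested group cannot be consumed and nothing matches here.
hits <- regmatches(sample_text,
                   gregexpr(pattern_escaped, sample_text, perl = TRUE))[[1]]
length(hits)
```

So the pattern compiles once the backslashes are doubled, but the recursion has to target only the parenthesized part of the pattern rather than the whole expression.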

Not exactly. Which one should I use in R? I tried a handful but they all came back with errors. I'm lost on how exactly to adapt those answers to work with `str_extract_all` in R. — Adrian, Jun 07 '23 at 16:13
As far as I know, R supports both POSIX and PCRE. This means you need to use the PCRE part of the first answer and use a keyword argument like `perl = TRUE` or something similar (I don't know R, so I'm not sure about this part). — InSync, Jun 07 '23 at 16:17
Thanks, I gave it a shot and it didn't work out. I updated my original post with additional information. — Adrian, Jun 07 '23 at 16:20
https://ideone.com/kwaF0t - `regmatches(mystring, gregexpr("code IS (\\((?:[^()]++|(?1))*\\))", mystring, perl=TRUE))` — Wiktor Stribiżew, Jun 07 '23 at 16:41
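
The pattern from the last comment can be tried on a simplified version of the question's string. This is a sketch under stated assumptions, not a definitive answer: `sample_text` is rebuilt with explicit `\n` escapes (the original `mystring` also contains literal line breaks and indentation), and `extract_code_is` is a hypothetical helper name. The important difference from the `(?R)` attempt is `(?1)`, which recurses only into capture group 1 (the parenthesized part) and not into the `code IS ` prefix. It relies on base R's `perl = TRUE` rather than `str_extract_all`.

```r
# Hypothetical helper: returns every "code IS (...)" span with balanced parentheses.
extract_code_is <- function(x) {
  # Group 1 = "(" + any mix of non-parenthesis runs or nested group-1 matches + ")".
  pattern <- "code IS (\\((?:[^()]++|(?1))*\\))"
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

# Simplified copy of the question's string (assumption: explicit \n only).
sample_text <- paste0(
  "code IS (k(384333)\n   AND parse = TURE \n ) \n ",
  "code IS (\n FROM (43343344)\n ) some information code IS \n ",
  "code IS (  ( \n (data)(23423422 \n)) ) ) and more information)"
)

result <- extract_code_is(sample_text)
result[[1]]
```

On this input the helper should return the three strings listed as the desired output. The bare `code IS \n` in the middle is skipped because no opening parenthesis follows it, and the trailing unbalanced `)` characters are left out of the last match.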
