3

I am processing strings in R which are supposed to contain zero or one pair of parentheses. If there are nested parentheses I need to delete the inner pair. Here is an example where I need to delete the parentheses around big bent nachos but not the other/outer parentheses.

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
) 

I know I can kill all the parentheses with the stringr package using str_remove_all():

test |>
  stringr::str_remove_all(stringr::fixed(")")) |> 
  stringr::str_remove_all(stringr::fixed("("))

but I don't have the RegEx skills to pick out only the inner parentheses. I found an SO post that is close, but it removes the outer parentheses and I can't untangle it to remove the inner ones instead.

Dave2e
itsMeInMiami
  • Do you also want to delete the content of the inner parentheses? Also, do you have just one inner pair of parentheses or multiple nested pairs? I.e., for `here(a(b(c)))`, do you want `here(a)` or `here(abc)`? – Onyambu Nov 21 '22 at 23:23
  • I only need to delete the inner `(` and `)` characters. The content needs to remain. – itsMeInMiami Nov 21 '22 at 23:46
  • So you need to have `here(abc)`? Do you have at most one nested pair of parentheses, or more than one? – Onyambu Nov 21 '22 at 23:49
  • A [lookahead](https://www.regular-expressions.info/lookaround.html) idea for multiple inside: [`\(([^)(]*)\)(?=(?:[^)(]*\([^)(]*\))*[^)(]*\))`](https://regex101.com/r/zgRkL5/1) (replace with `$1`) - [Recursive regex](https://www.rexegg.com/regex-recursion.html) if nested: [`(?:\G(?!^)|\()[^)(]*\K(\(((?>[^)(]+|(?1))*)\))`](https://regex101.com/r/zgRkL5/2) (replace with `$2`) - Use with `gsub` (`perl=T`) – bobble bubble Nov 22 '22 at 05:19
  • @onyambu I think the example with Tacos above is as complex as it will get. – itsMeInMiami Nov 22 '22 at 12:06
  • @bobblebubble That is wildly impressive. Did you use an app to help write that? I ask because I can't imagine ever having the skill to do that without serious help. – itsMeInMiami Nov 22 '22 at 12:10
  • @itsMeInMiami Just love regexing :) I had previously posted an answer but thought it was too much for the provided samples. However, it could be useful in the future or for someone with a similar task! Rephrased my answer a bit and restored it. – bobble bubble Nov 22 '22 at 16:21

4 Answers

3

Here you go.

test |>
  stringr::str_replace_all("(\\().*\\(", "\\1") |> # remove inner open brackets
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closing brackets
[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (big bent nachos)"        
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

EDIT

Fixed my solution so that it no longer loses text (the first attempt dropped `choice=Tacos` from element 3):

test |>
  stringr::str_replace("\\((.*)\\(", "(\\1") |> # remove inner open brackets
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closing brackets
[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?" 
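
To see why the first attempt lost text, it can help to run both opening-bracket patterns on a single string. This is only a sketch: it uses base R `sub()` with `perl = TRUE` instead of stringr (an assumption made to keep it dependency-free), and `x` is simply the third element of `test`. Only the first step is shown, so the doubled closing parenthesis is still present in both results.

```r
# One string with a nested pair, copied from the question's `test` vector.
x <- "What is the best food? (choice=Tacos (big bent nachos))"

# First attempt: only the outer "(" is captured, so the text between the
# two opening parentheses is matched and thrown away.
first_try <- sub("(\\().*\\(", "\\1", x, perl = TRUE)

# Fixed version: the text between the two "(" is captured and put back.
fixed <- sub("\\((.*)\\(", "(\\1", x, perl = TRUE)

print(first_try)
print(fixed)
```

The only difference is where the capture group sits: whatever falls outside the group is part of the match and is therefore replaced.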
Josh White
2

Interested in how this would be solved with multiple (...) inside the outer parentheses, I came up with the following lookahead-based idea. It only checks for an outer closing parenthesis, though.

test <- gsub("\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))", "\\1", test, perl=TRUE)

See this R demo at tio.run or a pattern demo at regex101 (replace with \1, capture of first group)

At each (...), the lookahead verifies that it is followed only by further (...) groups or non-parenthesis characters, up to a closing ).
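
As a small illustration of the lookahead pattern with more than one inner pair, here is a sketch using made-up inputs. The two strings in `x` are hypothetical and not taken from the question; the pattern itself is copied unchanged from the `gsub()` call above.

```r
# Pattern copied from the answer; perl = TRUE is required for the lookahead.
pattern <- "\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))"

# Hypothetical inputs: one with two inner pairs, one with no nesting.
x <- c(
  "Best? (choice=Tacos (big) and (bent) nachos)",
  "Best? (choice=Nachos)"
)

# Each inner "(...)" is replaced by its own content (capture group 1).
flat <- gsub(pattern, "\\1", x, perl = TRUE)
print(flat)
```

The second string is left alone because nothing follows its closing parenthesis, so the lookahead for an outer `)` fails.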


If there is arbitrary nesting, flattening the first level can be done with a recursive regex.

test <- gsub("(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))", "\\2", test, perl=TRUE)

One more R demo at tio.run or a regex101 demo (replace with \2, the second group's capture)

Regex part: explanation
• `(?:\G(?!^)|\()` matches an opening parenthesis, or uses `\G` to continue where the previous match ended, so that matches chain onto that opening parenthesis.
• `[^)(]*+\K` possessively consumes any number of non-parenthesis characters; `\K` then resets the start of the reported match.
• `(\(((?>[^)(]+|(?1))*)\))` matches the nested parentheses (explanation at php.net). It contains two capture groups:
  • the first recurses at `(?1)`
  • the second captures the content inside the parentheses.

Here the matches are chained to the opening parenthesis; there is no check for an outer closing ). This \G-based idea can also be used without recursion for just one level, but it is slightly less efficient.
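
To make "flattening the first level" concrete, here is a sketch that applies the recursive pattern to the `here(a(b(c)))` string from the comments under the question. The pattern is copied unchanged from the `gsub()` call above; the variable names are only illustrative.

```r
# Recursive pattern copied from the answer; it needs perl = TRUE (PCRE).
pattern <- "(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))"

# Three levels of nesting, taken from the comment thread.
x <- "here(a(b(c)))"

# Group 2 is the content of the first nested pair, so replacing with it
# removes only that pair's own parentheses.
flat_once <- gsub(pattern, "\\2", x, perl = TRUE)
print(flat_once)
```

Only the pair directly inside the outer parentheses is removed; the deeper `(c)` is kept as it is.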

bobble bubble
1

Assuming there is at most one nested pair of parentheses, we could use a gsub() approach. Note that this pattern removes the inner parentheses together with their content:

output <- gsub("\\(\\s*(.*?)\\s*\\(.*?\\)(.*?)\\s*\\)", "(\\1\\2)", test)
output

[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (choice=Tacos)"           
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

Data:

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)
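
If the inner text has to stay, one possible variation of the pattern above is to capture the inner content as well and put it back. This is a sketch, not the answer's original code: the extra capture group, the literal space in the replacement, and `perl = TRUE` are all assumptions added here.

```r
test <- c(
  "Record ID",
  "What is the best food? (choice=Nachos)",
  "What is the best food? (choice=Tacos (big bent nachos))",
  "What is the best food? (choice=Chips with stuff)",
  "Complete?"
)

# Same shape as the pattern above, but the inner content is now group 2.
# The "\\s*" before the inner "(" swallows the space, so the replacement
# re-inserts a single space between groups 1 and 2.
pattern <- "\\(\\s*(.*?)\\s*\\((.*?)\\)(.*?)\\s*\\)"
output <- gsub(pattern, "(\\1 \\2\\3)", test, perl = TRUE)
print(output)
```

Because the replacement always writes one space, an input with no space before the inner parenthesis would gain one.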
Tim Biegeleisen
  • Tim, "I need to delete the inner pair" did not imply, **at least to me**, deleting the contents within the ( ). – Dave2e Nov 21 '22 at 23:39
1

Here is a solution using gsub from base R. It is broken down into 2 steps for readability and debugging.

test <- c(
   "Record ID", 
   "What is the best food? (choice=Nachos)", 
   "What is the best food? (choice=Tacos (big bent nachos))", 
   "What is the best food? (choice=Chips with stuff)", 
   "Complete?"
) 

test <- gsub("(\\(.*)\\(", "\\1", test)
# (\\(.*) - first group: a '(' followed by zero or more characters (greedy)
# \\(     - then another '(' (the inner one)
# "\\1"   - replace the whole match with the first group, dropping the inner '('

test <- gsub("\\)(.*\\))", "\\1", test)
# same idea for ')': drop the first ')' and keep everything up to the last ')'
test
test

[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?"  
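
For reuse, the two steps can be wrapped in a small helper. This is a sketch; the function name `strip_inner_parens` is hypothetical, and it assumes, as in the question, at most one nested pair per string.

```r
# Hypothetical helper wrapping the two gsub() steps from this answer.
strip_inner_parens <- function(x) {
  # Step 1: if there are two "(", drop the last one.
  x <- gsub("(\\(.*)\\(", "\\1", x)
  # Step 2: if there are two ")", drop the first one.
  gsub("\\)(.*\\))", "\\1", x)
}

cleaned <- strip_inner_parens(c(
  "What is the best food? (choice=Tacos (big bent nachos))",
  "What is the best food? (choice=Nachos)",
  "Complete?"
))
print(cleaned)
```

Strings with zero or one pair of parentheses pass through unchanged because neither pattern can match them.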
Dave2e