How can I remove inner parentheses from an R string?

Question

I am processing strings in R that are supposed to contain zero or one pair of parentheses. If there are nested parentheses, I need to delete the inner pair. Here is an example where I need to delete the parentheses around "big bent nachos" but not the outer parentheses.

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)

I know I can kill all the parentheses with the stringr package using str_remove_all():

test |>
  stringr::str_remove_all(stringr::fixed(")")) |> 
  stringr::str_remove_all(stringr::fixed("("))
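For comparison, here is a base-R sketch (not from the question; the variable names are illustrative) that strips every parenthesis in one call. It uses a character class instead of two `fixed()` removals:

```r
x <- "What is the best food? (choice=Tacos (big bent nachos))"
# "[()]" is a character class matching either '(' or ')'; gsub() removes every one
all_removed <- gsub("[()]", "", x)
all_removed
# "What is the best food? choice=Tacos big bent nachos"
```

Like the stringr version, this also removes the outer pair, which is what the question is trying to avoid.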

but I don't have the RegEx skills to target only the inner parentheses. I found an SO post that comes close, but it removes the outer parentheses, and I can't untangle it to remove the inner ones instead.

Do you also want to delete the content of the inner parentheses? Also, do you have just one inner pair of parentheses or multiple? I.e., for `here(a(b(c)))`, do you want `here(a)` or `here(abc)`? — Onyambu, Nov 21 '22 at 23:23
I only need to delete the inner `(` and `)` characters. The content needs to remain. — itsMeInMiami, Nov 21 '22 at 23:46
So you need `here(abc)`? Do you have at most one nested pair of parentheses, or more than one? — Onyambu, Nov 21 '22 at 23:49
A [lookahead](https://www.regular-expressions.info/lookaround.html) idea for multiple pairs inside: [`\(([^)(]*)\)(?=(?:[^)(]*\([^)(]*\))*[^)(]*\))`](https://regex101.com/r/zgRkL5/1) (replace with `$1`). A [recursive regex](https://www.rexegg.com/regex-recursion.html) if they are nested: [`(?:\G(?!^)|\()[^)(]*\K(\(((?>[^)(]+|(?1))*)\))`](https://regex101.com/r/zgRkL5/2) (replace with `$2`). Use with `gsub` (`perl=T`). — bobble bubble, Nov 22 '22 at 05:19
@onyambu I think the example with Tacos above is as complex as it will get. — itsMeInMiami, Nov 22 '22 at 12:06
@bobblebubble That is wildly impressive. Did you use an app to help write that? I ask because I can't imagine ever having the skill to do that without serious help. — itsMeInMiami, Nov 22 '22 at 12:10
@itsMeInMiami Just love regexing :) I had previously put an answer but thought it's too much for the provided samples. However it could be useful in future or for someone with a similar task! Rephrased my answer a bit and restored it. — bobble bubble, Nov 22 '22 at 16:21
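To make the `here(a(b(c)))` → `here(abc)` interpretation concrete, here is a non-regex sketch that comes from none of the answers. It swaps in a depth counter: it walks each string character by character and keeps only the parentheses at nesting depth 1. The helper name `flatten_inner_parens` is hypothetical.

```r
# Hypothetical helper (not from any answer): keep only the outermost pair of
# parentheses and drop every '(' or ')' nested inside it, keeping the text.
flatten_inner_parens <- function(x) {
  vapply(x, function(s) {
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    depth <- 0L
    keep <- logical(length(chars))
    for (i in seq_along(chars)) {
      if (chars[i] == "(") {
        depth <- depth + 1L
        keep[i] <- depth == 1L          # keep an opening bracket only at depth 1
      } else if (chars[i] == ")") {
        keep[i] <- depth == 1L          # keep a closing bracket only if it closes depth 1
        depth <- max(depth - 1L, 0L)    # never let depth go below zero
      } else {
        keep[i] <- TRUE                 # ordinary characters are always kept
      }
    }
    paste(chars[keep], collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

flatten_inner_parens(c("here(a(b(c)))",
                       "What is the best food? (choice=Tacos (big bent nachos))"))
# "here(abc)"  "What is the best food? (choice=Tacos big bent nachos)"
```

Unlike the regex answers, this handles any nesting depth, at the cost of an explicit loop.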

Josh White · Accepted Answer · score 3 · 2022-11-22T04:19:21.657

Here you go.

test |>
  stringr::str_replace_all("(\\().*\\(", "\\1") |> # remove inner open brackets
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closed brackets

[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (big bent nachos)"        
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"
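The first version loses "choice=Tacos " because the greedy `.*` also consumes everything between the outer `(` and the inner `(`, and the replacement `\\1` puts back only the captured `(`. The following base-R sketch prints the text that the open-bracket pattern matches. It uses `regexpr()`/`regmatches()` in place of stringr, an assumption made to keep it self-contained:

```r
x <- "What is the best food? (choice=Tacos (big bent nachos))"
# Extract the substring matched by the first version's open-bracket pattern
matched <- regmatches(x, regexpr("(\\().*\\(", x))
matched
# "(choice=Tacos ("
```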

EDIT

Fixed my solution so that it no longer loses text:

test |>
  stringr::str_replace("\\((.*)\\(", "(\\1") |> # remove inner open brackets
  stringr::str_remove_all("\\)(?=.*\\))") # remove inner closing brackets

[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?"
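Here is a base-R translation of the fixed pipeline. It assumes `sub()` stands in for `str_replace()` and `gsub(perl = TRUE)` for `str_remove_all()`; the helper name `remove_inner_parens` is hypothetical. The second call uses an input with two inner pairs, which goes beyond what the question needs, to show that the approach relies on there being a single inner pair:

```r
test <- c(
  "Record ID",
  "What is the best food? (choice=Nachos)",
  "What is the best food? (choice=Tacos (big bent nachos))",
  "What is the best food? (choice=Chips with stuff)",
  "Complete?"
)

# Hypothetical base-R stand-in for the fixed stringr pipeline
remove_inner_parens <- function(x) {
  # greedy (.*) runs from the first '(' to the last '(', so only the last '(' is dropped
  x <- sub("\\((.*)\\(", "(\\1", x)
  # drop every ')' that still has another ')' somewhere after it
  gsub("\\)(?=.*\\))", "", x, perl = TRUE)
}

remove_inner_parens(test)[3]
# "What is the best food? (choice=Tacos big bent nachos)"

# Hypothetical input with two inner pairs: the result is unbalanced
remove_inner_parens("(a (b) (c))")
# "(a (b c)"
```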

edited Nov 22 '22 at 04:19

answered Nov 21 '22 at 23:23

Josh White

FYI: "choice=Tacos " missing from item 3. – Dave2e Nov 21 '22 at 23:34
Oh, thanks @Dave2e, I didn't realise. See my solution now, I've fixed it. All in one step and rather simple :) Thanks. – Josh White Nov 22 '22 at 04:20
@JoshWhite Thank you for the Tidyverse-friendly solution! Do you have a favorite website and/or book for learning RegEx? – itsMeInMiami Nov 22 '22 at 12:15
No sorry. For me it has just been time, the more I use it the better I get. So just keep practicing! – Josh White Nov 22 '22 at 14:04

bobble bubble · Answer 2 · 2022-11-23T07:23:19.363

Interested in how this would be solved with multiple `(...)` groups inside the outer parentheses, I came up with the following lookahead-based idea. It only checks for an outer closing parenthesis, though.

test <- gsub("\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))", "\\1", test, perl = TRUE)

See this R demo at tio.run or a pattern demo at regex101 (replace with `\1`, the capture of the first group).

At each `(...)`, the lookahead verifies that it is followed only by further `(...)` groups or non-parentheses up to a closing `)`.
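A quick check of that lookahead pattern, copied from the answer and run with `perl = TRUE`. The two extra inputs are hypothetical: one has two inner pairs, the other has no inner pair, so nothing matches in it.

```r
pattern <- "\\(([^)(]*)\\)(?=[^)(]*(?:\\([^)(]*\\)[^)(]*)*\\))"
x <- c(
  "What is the best food? (choice=Tacos (big bent nachos))",
  "(a (b) (c))",                              # hypothetical: two inner pairs
  "What is the best food? (choice=Nachos)"    # no inner pair, so nothing matches
)
flat <- gsub(pattern, "\\1", x, perl = TRUE)
flat
# "What is the best food? (choice=Tacos big bent nachos)" "(a b c)"
# "What is the best food? (choice=Nachos)"
```

The outer `(choice=Nachos)` survives because the lookahead finds no closing `)` after it.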

If there can be arbitrary nesting, the first level can be flattened with a recursive regex.

test <- gsub("(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))", "\\2", test, perl = TRUE)

One more R demo at tio.run, or a regex101 demo (replace with `\2`, the second group's capture).

| Regex part | Explanation |
|---|---|
| `(?:\G(?!^)\|\()` | Matches an opening bracket, or (via `\G`) the position where the previous match ended, so that later matches are chained to that bracket |
| `[^)(]*+\K` | Possessively consumes any amount of non-parentheses; `\K` then resets the start of the reported match |
| `(\(((?>[^)(]+\|(?1))*)\))` | Matches the nested parentheses (explanation at php.net). It contains two capture groups: the first is recursed into at `(?1)`; the second captures the text between the outer `(` and `)`. |

Here the matches are chained to the opening parenthesis, and there is no check for an outer closing `)`. This `\G`-based idea can also be used without recursion for just one level of nesting, but it is slightly less efficient.
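A check of the recursive pattern, copied from the answer. It needs `perl = TRUE` for `\G`, `\K` and `(?1)`. Both extra inputs are hypothetical. The first has two inner pairs, which are chained through `\G`. The second nests deeper, and only the first inner level is flattened.

```r
pattern <- "(?:\\G(?!^)|\\()[^)(]*+\\K(\\(((?>[^)(]+|(?1))*)\\))"
x <- c(
  "What is the best food? (choice=Tacos (big bent nachos))",
  "(a (b) (c))",       # hypothetical: two inner pairs, chained via \G
  "(a (b (c)) d)"      # hypothetical: deeper nesting, (c) stays as it is
)
flat <- gsub(pattern, "\\2", x, perl = TRUE)
flat
# "What is the best food? (choice=Tacos big bent nachos)" "(a b c)" "(a b (c) d)"
```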

score 1 · Answer 3 · answered Nov 21 '22 at 23:20

Assuming there is at most one nested pair of parentheses, we could use a `gsub()` approach:

output <- gsub("\\(\\s*(.*?)\\s*\\(.*?\\)(.*?)\\s*\\)", "(\\1\\2)", test)
output

[1] "Record ID"                                       
[2] "What is the best food? (choice=Nachos)"          
[3] "What is the best food? (choice=Tacos)"           
[4] "What is the best food? (choice=Chips with stuff)"
[5] "Complete?"

Data:

test <- c(
  "Record ID", 
  "What is the best food? (choice=Nachos)", 
  "What is the best food? (choice=Tacos (big bent nachos))", 
  "What is the best food? (choice=Chips with stuff)", 
  "Complete?"
)
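This approach drops "big bent nachos" because the inner text is matched by the uncaptured `\(.*?\)`, so `(\\1\\2)` never puts it back. The sketch below uses `regexec()` to list the full match and the two capture groups. It runs with `perl = TRUE`, an assumption, since the answer itself uses the default engine.

```r
x <- "What is the best food? (choice=Tacos (big bent nachos))"
pattern <- "\\(\\s*(.*?)\\s*\\(.*?\\)(.*?)\\s*\\)"
# regexec() returns the full match followed by each capture group
parts <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]]
parts
# "(choice=Tacos (big bent nachos))" "choice=Tacos" ""
```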

Tim, "I need to delete the inner pair" did not imply, **at least to me**, deleting the contents within the ( ). — Dave2e, Nov 21 '22 at 23:39

Dave2e · Answer 4 · score 1 · 2022-11-21T23:35:17.050

Here is a solution using `gsub()` from base R. It is broken down into two steps for readability and debugging.

test <- c(
   "Record ID", 
   "What is the best food? (choice=Nachos)", 
   "What is the best food? (choice=Tacos (big bent nachos))", 
   "What is the best food? (choice=Chips with stuff)", 
   "Complete?"
) 

test <- gsub("(\\(.*)\\(", "\\1", test)
# (\\(.*)  - first group: a '(' followed by zero or more characters (greedy)
#  \\(     - then another '(' (the last one in the string, because .* is greedy)

#  "\\1"   - replace the whole match with the first group, dropping that second '('

test <- gsub("\\)(.*\\))", "\\1", test)
# similar to the first step: drop the first ')' that is followed later by another ')'
test

[1] "Record ID"                                            
[2] "What is the best food? (choice=Nachos)"               
[3] "What is the best food? (choice=Tacos big bent nachos)"
[4] "What is the best food? (choice=Chips with stuff)"     
[5] "Complete?"
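Since the two steps are meant to help with debugging, here is a sketch that prints the intermediate string on the Tacos input (the variable names are illustrative):

```r
x <- "What is the best food? (choice=Tacos (big bent nachos))"
# Step 1: greedy (\\(.*) reaches the last '(', which is the one removed
step1 <- gsub("(\\(.*)\\(", "\\1", x)
step1
# "What is the best food? (choice=Tacos big bent nachos))"
# Step 2: \\) matches the first ')' that has a later ')', and only that one is dropped
step2 <- gsub("\\)(.*\\))", "\\1", step1)
step2
# "What is the best food? (choice=Tacos big bent nachos)"
```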

edited Nov 21 '22 at 23:35

answered Nov 21 '22 at 23:29

Dave2e

`I need to delete the inner pair` – Tim Biegeleisen Nov 21 '22 at 23:36
Thank you! That is a wonderful example to learn from. Can you offer advice for good places where to learn RegEx? I love books. – itsMeInMiami Nov 21 '22 at 23:54
@itsMeInMiami, Here is a complete tutorial: https://www.regular-expressions.info/tutorial.html – Dave2e Nov 21 '22 at 23:58
If you make it to Miami, lunch is on me. – itsMeInMiami Nov 22 '22 at 00:01
