In R, I am trying to write code that will work on any variation of a string pattern. An example string is:
string <- "y ~ 1 + a + (b | c) + (d^2) + e + (1 | f) + g"
I would like to remove ONLY the parenthesized terms that contain a "|", such as:
(b | c) and (1 | f)
and be left with:
"y ~ 1 + a + (d^2) + e + g"
Please note that the terms inside the parentheses could change (e.g., 'b' could become '1' and 'c' could become 'predictor'), and I would like the code to still work. Spaces are also not required: the string could be "y~1+a+(b|c)+(d^2)+e+(1|f)+g" or any combination of space/no-space thereof. The order of the terms could change as well, e.g. "y~1+a+(b|c)+e+(1|f)+(d^2)+g".
I have tried using base R string manipulation functions (gsub and sub) to match a "(", a "|", and a ")" in sequence, using patterns such as:
"\\(.*\\|.*\\)"
"\\(.*\\|"
"\\(.+\\|.+\\)"
"\\|.+\\)"
as well as many of the stringr functions, to find this pattern and replace it with an empty string. However, with both base R and stringr, the match runs from the first "(" all the way to the last ")" (or the last "|"), so far too much is removed. For example:
gsub("\\(.*\\|.*\\)", "", string)
produces:
"y ~ 1 + a +  + g"
and
gsub("\\(.*\\|", "", string)
produces:
"y ~ 1 + a +  f) + g"
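For reference, here is a self-contained reproduction of those two attempts. It also prints the text that each pattern actually matches, which suggests that ".*" is greedy and swallows everything between the first "(" and the last "|" or ")". The variable names (match1, attempt1, and so on) are only for illustration.

```r
string <- "y ~ 1 + a + (b | c) + (d^2) + e + (1 | f) + g"

# Attempt 1: "(" ... "|" ... ")" with greedy ".*" on both sides of the bar
match1   <- regmatches(string, regexpr("\\(.*\\|.*\\)", string))
attempt1 <- gsub("\\(.*\\|.*\\)", "", string)

# Attempt 2: "(" ... "|" with a greedy ".*" before the bar
match2   <- regmatches(string, regexpr("\\(.*\\|", string))
attempt2 <- gsub("\\(.*\\|", "", string)

print(match1)    # the single span that attempt 1 deletes
print(attempt1)
print(match2)    # the single span that attempt 2 deletes
print(attempt2)
```

In both cases there is one long match starting at the "(" of (b | c), not one match per parenthesized group.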
I have also tried the str_locate functions, but I am struggling to use them effectively: there are multiple sets of parentheses, and I only want the locations of the ones that have a "|" between them.
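One direction that might work for locating only those groups is to forbid parentheses inside the match, so that a match can never run past its own closing ")". This is only a sketch and assumes parentheses are never nested. The names bar_group and remove_bar_terms are hypothetical helpers I made up for illustration, not existing functions.

```r
string <- "y ~ 1 + a + (b | c) + (d^2) + e + (1 | f) + g"

# "(" then non-paren, non-bar characters, then "|",
# then non-paren characters, then ")"
bar_group <- "\\([^()|]*\\|[^()]*\\)"

# Locate only the parenthesized groups that contain a "|"
hits   <- gregexpr(bar_group, string)
groups <- regmatches(string, hits)[[1]]
print(groups)

# Remove each group together with an optional leading "+" and
# surrounding whitespace (assumption: the group is not the first term)
remove_bar_terms <- function(x) {
  gsub(paste0("\\s*\\+?\\s*", bar_group), "", x, perl = TRUE)
}

print(remove_bar_terms(string))
print(remove_bar_terms("y~1+a+(b|c)+(d^2)+e+(1|f)+g"))
print(remove_bar_terms("y~1+a+(b|c)+e+(1|f)+(d^2)+g"))
```

On the example strings this leaves (d^2) alone because it contains no "|". It would not handle nested parentheses, and a bar term placed directly after the "~" would leave a dangling "+" behind.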
Any help is greatly appreciated.