1) gsubfn
This can be done without complex regular expressions using gsubfn. The regular expression consisting of a single dot matches any single character. For each string in the input character vector, the pre function initializes the counter k to 0; then, for each match, fun is run with the matched character passed to it via the x argument. Within fun, the counter k is incremented by 1 each time a ( is encountered and decremented by 1 each time a ) is encountered. If the counter is nonzero and a comma is encountered, a semicolon is returned to replace the comma; otherwise, the input character is returned unchanged. This is vectorized, i.e. it also works if the input s is a character vector in which each component should be processed separately.
library(gsubfn)
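p <- proto(k = 0,
  # pre runs once per input string and resets the nesting counter
  pre = function(this) this$k <- 0,
  # fun runs once per matched character x
  fun = function(this, x) {
    if (x == "(") this$k <- k + 1
    if (x == ")") this$k <- k - 1
    # inside parentheses (k nonzero) a comma becomes a semicolon
    if (k && x == ",") ";" else x
  })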
gsubfn(".", p, s)
giving:
[1] "Oats (24%) (Rolled; Bran), Coconut (13%) (Coconut ; Preservative (220; 223)), Brown Sugar, Milk Solids, Golden Syrup (10%), Seeds (9%) (Sesame ; Sunflower), Margarine (Vegetable Oil; Water; Salt; Emulsifiers (471; Soy Lecithin); Antioxidant (307)), Glucose, Milk Choc Compound (5%) (Sugar; Vegetable Oil; Milk Solids; Cocoa Powder; Emulsifiers (Soy Lecithin; 492); Natural Flavour), Natural Flavour"
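The counter logic carried by pre and fun can also be traced with an explicit loop, which may help when stepping through it in a debugger. This is only a minimal base R sketch of the same idea, not the gsubfn mechanism itself; the function name swap_nested_commas and the short test string are hypothetical.

```r
# Hypothetical helper: walks one string character by character,
# mirroring the reset done by pre and the per-character update done by fun.
swap_nested_commas <- function(string) {
  chars <- strsplit(string, "")[[1]]
  k <- 0                                    # reset the counter, as pre does
  for (i in seq_along(chars)) {
    x <- chars[i]
    if (x == "(") k <- k + 1                # entering a parenthesized group
    if (x == ")") k <- k - 1                # leaving a parenthesized group
    if (k != 0 && x == ",") chars[i] <- ";" # comma inside parentheses
  }
  paste(chars, collapse = "")
}

swap_nested_commas("a (b, c), d (e (f, g), h)")
```

Unlike the gsubfn call, this sketch handles a single string; wrapping it in sapply would be one way to apply it to each component of a character vector.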
2) Base R
A base R solution is to split the input into single characters, giving a list of character vectors, L. Then, for each component, chars, of L, create a counter vector, k, the same length as chars, which gives the number of ( up to that point minus the number of ) up to that point. Then replace those commas corresponding to a nonzero k with semicolons and collapse chars back into a single string. Like (1), this works on character vectors.
L <- strsplit(s, "")
sapply(L, function(chars) {
  k <- cumsum((chars == "(") - (chars == ")"))
  chars[k & chars == ","] <- ";"
  paste(chars, collapse = "")
})
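To see what the counter vector looks like, the same steps can be run on a short string. This is a small illustrative sketch; the input "a(b,c),d" and the variable names demo_chars, depth and demo_result are made up for the example.

```r
# Split a short, made-up string into single characters
demo_chars <- strsplit("a(b,c),d", "")[[1]]

# Running count of "(" minus ")" at each position
depth <- cumsum((demo_chars == "(") - (demo_chars == ")"))

# Show each character alongside its nesting depth
print(rbind(demo_chars, depth))

# Replace only the commas whose depth is nonzero, then rebuild the string
demo_chars[depth != 0 & demo_chars == ","] <- ";"
demo_result <- paste(demo_chars, collapse = "")
demo_result
```

The closing parenthesis itself gets depth 0 because the decrement is counted at its own position; that does not matter here, since only commas are tested.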
Note
Input string s is the following.
s <- "Oats (24%) (Rolled, Bran), Coconut (13%) (Coconut , Preservative (220, 223)), Brown Sugar, Milk Solids, Golden Syrup (10%), Seeds (9%) (Sesame , Sunflower), Margarine (Vegetable Oil, Water, Salt, Emulsifiers (471, Soy Lecithin), Antioxidant (307)), Glucose, Milk Choc Compound (5%) (Sugar, Vegetable Oil, Milk Solids, Cocoa Powder, Emulsifiers (Soy Lecithin, 492), Natural Flavour), Natural Flavour"