R's implementation of PCRE does not allow variable repetition quantifiers (e.g., {2,}) inside a lookbehind; from ?regex:
Patterns '(?=...)' and '(?!...)' are zero-width positive and
negative lookahead _assertions_: they match if an attempt to match
the '...' forward from the current position would succeed (or
not), but use up no characters in the string being processed.
Patterns '(?<=...)' and '(?<!...)' are the lookbehind equivalents:
they do not allow repetition quantifiers nor '\C' in '...'.
The last sentence is the relevant part. For instance, we'll see:
gsub("(?<=\\s{2,})Quux", ",", exampleText, perl=TRUE)
# Warning in gsub("(?<=\\s{2,})Quux", ",", exampleText, perl = TRUE) :
# PCRE pattern compilation error
# 'lookbehind assertion is not fixed length'
# at '(?<=\s{2,})Quux'
but there is no such error if we change it to "(?<=\\s{2})". As such, your lookbehind expressions need to be fixed-width (lookaheads do not have this restriction).
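To see the restriction in isolation, here is a minimal sketch using a made-up string `x` (an assumption for illustration, not the `exampleText` from the question). A fixed-width lookbehind compiles, a variable-width lookahead is accepted too, and the variable-width lookbehind is rejected when the pattern is compiled:

```r
# Hypothetical input, used only for illustration (three spaces before Quux)
x <- "Foo   Quux"

# Fixed-width lookbehind: compiles and matches
fixed_behind <- sub("(?<=\\s{2})Quux", ",", x, perl = TRUE)
fixed_behind
# [1] "Foo   ,"

# Variable-width lookahead: allowed
var_ahead <- sub("Foo(?=\\s{2,})", "Foo,", x, perl = TRUE)
var_ahead
# [1] "Foo,   Quux"

# Variable-width lookbehind: PCRE refuses to compile the pattern.
# Capture the condition message instead of letting the call fail.
var_behind <- tryCatch(
  sub("(?<=\\s{2,})Quux", ",", x, perl = TRUE),
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)
```

Here `var_behind` ends up holding the condition message rather than a substituted string; the exact wording of that message may differ between R and PCRE versions.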
Some suggestions; both of these produce the desired result:
txt <- gsub("(?<=\\s{2})(\\S*)(?=\\s{2})", "\\1,", exampleText, perl=TRUE)
txt <- gsub("(?<=\\s\\s)(\\S*)(?=\\s\\s)", "\\1,", exampleText, perl=TRUE)
txt
# [1] "1 Building, Apartment, City"
You can fix the multi-spaces with a couple more patterns, if needed:
gsub("\\s+", " ", gsub(", ", ",", txt))
# [1] "1 Building,Apartment,City"
Since it looks as if you are creating comma-delimited text, though, most CSV readers can optionally discard the surrounding whitespace:
txt
# [1] "1 Building, Apartment, City"
str(read.csv(text = txt, header = FALSE, strip.white = TRUE))
# 'data.frame': 1 obs. of 3 variables:
# $ V1: chr "1 Building"
# $ V2: chr "Apartment"
# $ V3: chr "City"