Building on top of two questions I previously asked:
R: How to prevent memory overflow when using mgsub in vector mode?
I like @Tyler's suggestion to use fixed=TRUE, as it speeds up the calculations significantly. However, it's not always applicable. I need to substitute, say, caps as a stand-alone word, with or without punctuation surrounding it. It's not known a priori what can follow or precede the word, but it must be one of the regular punctuation signs (, . ! - + etc.). It cannot be a number or a letter. In the example below, capsule must stay as is.
i = "Here is the capsule, caps key, and two caps, or two caps. or even three caps-"
orig = "caps"
change = "cap"
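gsub_FixedTrue <- function(i) {
  # Pad with spaces so the literal pattern " caps " can also match at the string edges
  i = paste0(" ", i, " ")
  orig = paste0(" ", orig, " ")
  change = paste0(" ", change, " ")
  # Literal match: only hits the word when it has a space on both sides
  i = gsub(orig, change, i, fixed=TRUE)
  # Strip the padding added above
  i = gsub("^\\s|\\s$", "", i, perl=TRUE)
  return(i)
}
# Second fastest, doesn't clog memory
gsub_FixedFalse <- function(i) {
  # \b word boundaries also match when the word is next to punctuation
  i = gsub(paste0("\\b", orig, "\\b"), change, i)
  return(i)
}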
print(gsub_FixedTrue(i)) #wrong
print(gsub_FixedFalse(i)) #correct
Results (the second output is the desired one):
[1] "Here is the capsule, cap key, and two caps, or two caps. or even three caps-"
[1] "Here is the capsule, cap key, and two cap, or two cap. or even three cap-"
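For reference, the stand-alone-word rule described above (no letter or digit directly before or after the word) can also be spelled out with explicit lookarounds instead of \b. This is only a minimal sketch, not a benchmarked solution: the helper name gsub_Lookaround is made up for illustration, it assumes perl=TRUE is acceptable, and it assumes orig contains no regex metacharacters. One difference from \b: an underscore counts as a word character for \b, but not for [[:alnum:]].

```r
# Hypothetical helper (not from the original post): replace `orig` only when
# it has no letter or digit immediately before or after it.
gsub_Lookaround <- function(x, orig, change) {
  # Assumption: `orig` is a plain word with no regex metacharacters.
  pattern <- paste0("(?<![[:alnum:]])", orig, "(?![[:alnum:]])")
  # Lookbehind/lookahead require the PCRE engine, hence perl = TRUE.
  gsub(pattern, change, x, perl = TRUE)
}

i <- "Here is the capsule, caps key, and two caps, or two caps. or even three caps-"
print(gsub_Lookaround(i, "caps", "cap"))
```

On the example string this should give the same result as the \b version; whether it is any faster or lighter on memory is untested here.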