I just did some benchmarking while trying to optimise some code and observed that strsplit with perl=TRUE is faster than strsplit with perl=FALSE. For example:
set.seed(1)
ff <- function() paste(sample(10), collapse= " ")
xx <- replicate(1e5, ff())
system.time(t1 <- strsplit(xx, "[ ]"))
# user system elapsed
# 1.246 0.002 1.268
system.time(t2 <- strsplit(xx, "[ ]", perl=TRUE))
# user system elapsed
# 0.389 0.001 0.392
identical(t1, t2)
# [1] TRUE
So my question (or rather a variation of the question in the title) is: under what circumstances would we absolutely need perl=FALSE (leaving aside the fixed and useBytes parameters)? In other words, what can't we do with perl=TRUE that can be done by setting perl=FALSE?