Vectorised string operations are faster than creating and destroying objects in memory (see benchmarks below). Solutions that create lists of vectors you do not need tend to be relatively slow. You can use a regular expression here to replace everything up to and including the final hyphen (-).
sub(pattern = "^.+-", replacement = "", SKU)
# [1] "L" "XL" "XS" "S"
The caret (^) is a regex metacharacter which matches the beginning of the string. The dot (.) matches any character except a new line. The + means "match the preceding element one or more times". The .+ combination is greedy, meaning it will find the longest possible match, so it runs through to the last hyphen rather than stopping at the first. Altogether this means: match from the beginning of the string up to and including the final hyphen (-).
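To make the greedy behaviour concrete, here is a minimal sketch. The sku_example values are hypothetical and are not the SKU vector from the question. The lazy variant is included only for contrast, and uses perl = TRUE to keep the lazy quantifier unambiguous.

```r
# Hypothetical SKUs, for illustration only
sku_example <- c("ABC-123-XL", "TSHIRT-RED-S", "XL")

# Greedy: .+ runs to the LAST hyphen, so only the final segment remains.
# A string with no hyphen does not match and is returned unchanged.
greedy <- sub("^.+-", "", sku_example)
greedy
# [1] "XL" "S"  "XL"

# Lazy, for contrast: .+? stops at the FIRST hyphen
lazy <- sub("^.+?-", "", sku_example, perl = TRUE)
lazy
# [1] "123-XL" "RED-S"  "XL"
```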
The sub() function replaces the first occurrence of the pattern in x (which in this case is SKU) with the replacement (which in this case is an empty string).
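As a small illustration of "first occurrence", the sketch below compares sub() with gsub() on a made-up string. Because the pattern used above is anchored with ^, it can match at most once per string, so sub() is enough there.

```r
# Hypothetical input, for illustration only
s <- "a-b-c"

# sub() replaces only the first match in each string
first_only <- sub("-", "_", s)
first_only
# [1] "a_b-c"

# gsub() replaces every match
all_matches <- gsub("-", "_", s)
all_matches
# [1] "a_b_c"
```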
You can read more here about the syntax used in regular expressions.
Benchmarking
I benchmarked seven approaches:
- Base R sub().
- Base R strsplit() |> sapply().
- Base R strsplit() |> vapply().
- stringr::str_split_i().
- stringr::str_split() |> vapply(\(x) tail(x, 1), character(1)).
- Base R lookbehind: regmatches(gregexpr()).
- stringr::str_extract() lookbehind.
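Before comparing speed, it is worth checking that the approaches agree. The sketch below does this on a hypothetical vector (sku_example is not the SKU data from the question). It assumes R 4.1 or later for |> and \(x), and only runs the stringr checks if stringr is installed (str_split_i() needs stringr 1.5.0 or later).

```r
# Hypothetical SKUs, for illustration only
sku_example <- c("ABC-123-XL", "TSHIRT-RED-S", "X-1-2-XS")

# Replace everything up to and including the final hyphen
via_sub <- sub("^.+-", "", sku_example)

# Split on hyphens and keep the last piece of each element
via_split <- strsplit(sku_example, "-") |>
  vapply(\(parts) tail(parts, 1), character(1))

# Extract the run of non-hyphen characters that follows the last hyphen
via_lookbehind <- regmatches(
  sku_example,
  gregexpr("(?<=-)[^-]+$", sku_example, perl = TRUE)
) |> unlist()

stopifnot(
  identical(via_sub, via_split),
  identical(via_sub, via_lookbehind)
)

# Optional: the stringr variants, only if the package is available
if (requireNamespace("stringr", quietly = TRUE)) {
  stopifnot(
    identical(via_sub, stringr::str_split_i(sku_example, "-", -1)),
    identical(via_sub, stringr::str_extract(sku_example, "(?<=-)[^-]+$"))
  )
}
```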
I repeated the vector from 10 to 1e5 times. sub() is consistently the fastest approach with the least garbage collection (gc), i.e. the fewest memory allocations.
There is not much difference between base::strsplit() and stringr::str_split(). sapply() does not appear to differ from vapply(). stringr::str_split_i() is faster than the other approaches which split the vector, and has less garbage collection, but is not as fast as sub().
stringr::str_extract() with a lookbehind is almost as fast as sub(). Using the same pattern in base R with regmatches(gregexpr()) is much slower (presumably because it returns a list).
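The list return can be seen directly in a short sketch on hypothetical data. regexpr() is shown only for contrast: it finds a single match per string, so regmatches() gives back a character vector. It was not part of the benchmark, so no claim is made here about its speed.

```r
# Hypothetical SKUs, for illustration only
sku_example <- c("ABC-123-XL", "TSHIRT-RED-S")
pattern <- "(?<=-)[^-]+$"

# gregexpr() finds all matches per string, so regmatches() returns a list
from_gregexpr <- regmatches(sku_example, gregexpr(pattern, sku_example, perl = TRUE))
is.list(from_gregexpr)
# [1] TRUE

# regexpr() finds one match per string, so regmatches() returns a character
# vector (strings without a match are dropped from the result)
from_regexpr <- regmatches(sku_example, regexpr(pattern, sku_example, perl = TRUE))
from_regexpr
# [1] "XL" "S"
```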

Code to generate the plot
library(bench)
library(stringr)
library(ggplot2)

# SKU is the character vector from the question.
# Assumed grid: the text says 10 to 1e5 repetitions; the exact values are not shown.
rep_nums <- 10^(1:5)

results <- bench::press(
  rep_num = rep_nums,
  {
    # Build a longer input by repeating the original vector
    x <- rep(SKU, rep_num)
    bench::mark(
      min_iterations = 10,
      sub = {
        sub("^.+-", "", x)
      },
      strsplit_base_sapply = {
        strsplit(x, "-") |>
          sapply(tail, 1)
      },
      strsplit_base_vapply = {
        strsplit(x, "-") |>
          vapply(\(x) tail(x, 1), character(1))
      },
      str_split_i = {
        str_split_i(x, "-", -1)
      },
      str_split_vapply = {
        str_split(x, "-") |>
          vapply(\(x) tail(x, 1), character(1))
      },
      base_r_lookbehind = {
        regmatches(
          x,
          gregexpr("(?<=-)[^-]+$", x, perl = TRUE)
        ) |> unlist()
      },
      stringr_lookbehind = {
        str_extract(x, "(?<=-)[^-]+$")
      }
    )
  }
)

# One panel per repetition count, each with its own time axis
autoplot(results) +
  theme_bw() +
  facet_wrap(vars(rep_num), scales = "free_x")