We are looking for a blazing fast solution to the following problem, in R (Rcpp is allowed).
I have a character vector:
set.seed(42)
x <- sample(LETTERS[1:4], 1e6, replace = TRUE)
I want to recode it as a numeric vector with non-sequential values, where:
A = 5
B = 4
C = 3
D = 1
For example:
c("A", "B", "C", "D")
Would be:
c(5,4,3,1)
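To pin down the expected behavior, here is a minimal base R sketch of the mapping using a key/value lookup with `match()`. The name `recode_letters` is made up for illustration and is not one of the benchmarked solutions; it is meant as a reference for correctness, not as a speed contender.

```r
# Hypothetical reference implementation of the mapping A=5, B=4, C=3, D=1.
recode_letters <- function(x) {
  keys   <- c("A", "B", "C", "D")
  values <- c(5, 4, 3, 1)
  # match() returns the position of each element of x in keys
  # (NA for anything not in keys), which then indexes values.
  values[match(x, keys)]
}

recode_letters(c("A", "B", "C", "D"))
```

Any letter outside A to D comes back as `NA` under this sketch, which may or may not be the behavior you want.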
The interns and I already have what we think is a ridiculously fast solution, but we think the Internet can beat us. We'll add our fastest solution as an answer after we get some responses.
Let's see!
Timings so far:
library(microbenchmark)
set.seed(42)
x <- sample(LETTERS[1:4], 1e6, replace = TRUE)
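# Named-vector lookup, indexed by the letters themselves.
# Note: as posted, this maps D to 2 (and defines an unused E = 1), not D to 1.
richscriven <- function(x) {
  as.vector(c(A = 5, B = 4, C = 3, D = 2, E = 1)[x])
}
# Same lookup, dropping names with unname() instead of as.vector().
richscriven_unname <- function(x) {
  unname(c(A = 5, B = 4, C = 3, D = 2, E = 1)[x])
}
# Factor codes used as positions into 5:1; as posted, this also maps D to 2.
richscriven_op <- function(x) {
  (5:1)[c(factor(x))]
}
# Factor codes used as positions into the target values.
op_and_interns_fun <- function(x) {
  c(5, 4, 3, 1)[as.numeric(as.factor(x))]
}
# match() against the names of a lookup vector.
ronakshah <- function(x) {
  vec <- c("A" = 5, "B" = 4, "C" = 3, "D" = 1)
  unname(vec[match(x, names(vec))])
}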
microbenchmark(
  richscriven_unname(x),
  richscriven(x),
  richscriven_op(x),
  op_and_interns_fun(x),
  ronakshah(x),
  times = 15
)
Unit: milliseconds
                  expr      min       lq     mean   median       uq       max neval
 richscriven_unname(x) 36.06018 38.01026 62.80854 38.87179 41.86411 337.65773    15
        richscriven(x) 37.90615 41.61194 43.50555 44.14130 45.17277  47.47804    15
     richscriven_op(x) 31.70345 37.43262 44.10522 41.34828 45.22127  88.79605    15
 op_and_interns_fun(x) 40.18935 44.20475 49.48811 45.77867 48.15706  99.85034    15
          ronakshah(x) 29.36408 32.52615 42.40753 35.09052 38.55763  95.78571    15
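Timings only mean something for functions that return the right answer, so it may be worth checking each candidate against the four-letter example from the question before comparing speed. The sketch below is one way to do that in base R. The names `input`, `expected`, `candidates`, and `matches_spec` are introduced here for illustration, and the function bodies are copied from the definitions above.

```r
# The example from the question and the answer it should produce.
input    <- c("A", "B", "C", "D")
expected <- c(5, 4, 3, 1)

# Candidate functions, with bodies as posted above.
candidates <- list(
  richscriven        = function(x) as.vector(c(A = 5, B = 4, C = 3, D = 2, E = 1)[x]),
  richscriven_unname = function(x) unname(c(A = 5, B = 4, C = 3, D = 2, E = 1)[x]),
  richscriven_op     = function(x) (5:1)[c(factor(x))],
  op_and_interns_fun = function(x) c(5, 4, 3, 1)[as.numeric(as.factor(x))],
  ronakshah          = function(x) {
    vec <- c("A" = 5, "B" = 4, "C" = 3, "D" = 1)
    unname(vec[match(x, names(vec))])
  }
)

# TRUE where a candidate reproduces the expected values on the example.
# all.equal() is used so that integer and double results compare by value.
matches_spec <- vapply(
  candidates,
  function(f) isTRUE(all.equal(f(input), expected)),
  logical(1)
)
print(matches_spec)
```

With the definitions as posted, the three `richscriven` variants map D to 2 rather than 1, so they should fail this check until their lookup values are changed to `c(5, 4, 3, 1)`. That change is unlikely to affect the timings much, but that is an assumption and has not been measured here.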