What is the preferred way to do this?
Using `[` and a named key vector to recode another vector is, I thought until recently, a robust and preferred R idiom for performing a common task. Is there a better way I should be doing this?
Details about the task: I have a character vector of length approximately 1e6, where each element is a one-character string. I want to convert this vector to numeric, such that "B", "H", "K", and "M", which are abbreviations for orders of magnitude (H = hundred, M = million, etc.), become the corresponding numbers (H = 100, M = 1e6, etc.). Any other characters not in that set of four, as well as `NA`s, should become `1`.
After much trial and error, I've traced the slowdown to `NA`s in the subsetting vector: they make the operation substantially slower. I find this confusing, because it seems to me that subsetting with `NA` should, if anything, be faster. It doesn't even need to search through the subsetted vector; it only needs to return an `NA`.
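To make the expectation above concrete, here is a minimal sketch of what character subsetting returns for `NA` and for names that are not in the key. The vector `key` and its values are illustrative assumptions, not the key used in my real code.

```r
# Illustrative key only (hypothetical, smaller than the real one)
key <- c(H = 100, K = 1e3)

# Subset by name with one match, one NA, and one unknown name
res <- key[c("H", NA, "Z")]

unname(res)        # 100 NA NA
is.na(names(res))  # FALSE TRUE TRUE
```

Both the `NA` and the unmatched name come back as `NA`, which is why a single `is.na()` replacement afterwards handles both cases.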
y <- c("B", "H", "K", "M")
without_NA <- sample(rep_len(y, 1e6))
with_NA <- sample(rep_len(c(y, NA), 1e6))
convert_exponent_char_to_numeric <- function(exponent) {
  exponent_key <- 10^c(2, 3*1:3)
  names(exponent_key) <- c("H", "K", "M", "B")
  out <- exponent_key[exponent]
  out[is.na(out)] <- 1
  out
}
system.time(convert_exponent_char_to_numeric(without_NA))
   user  system elapsed
  0.136   0.011   0.147
system.time(convert_exponent_char_to_numeric(with_NA))
   user  system elapsed
303.342   0.691 304.237
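For comparison, one alternative I could try is a sketch based on `match()`, which looks up positions first and then indexes the values by integer instead of by name. The function name `convert_exponent_with_match` is hypothetical, and I have not timed it here; whether it avoids the slowdown would need to be checked with `system.time()` on the same `with_NA` vector.

```r
# Sketch of a match()-based recoding; the function name is hypothetical
convert_exponent_with_match <- function(exponent) {
  keys   <- c("H", "K", "M", "B")
  values <- 10^c(2, 3 * 1:3)      # 1e2, 1e3, 1e6, 1e9

  idx <- match(exponent, keys)    # NA for NA input or characters not in keys
  out <- values[idx]              # integer indexing; an NA index yields NA
  out[is.na(out)] <- 1            # everything unmatched becomes 1
  out
}

small_result <- convert_exponent_with_match(c("H", "K", "M", "B", NA, "x"))
```

Unlike the named-vector version, this returns an unnamed numeric vector, which is fine for my purpose since I only need the values.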