This one takes some typing but if you really do have a huge data set this may be the way to go. Bryangoodrich and Dason at talkstats.com taught me this one. It's using a hash table or creating a environment that contains a look up table. I actually keep this one on my .Rprofile (the hash function that is) for dictionary type look ups.
I replicated your data 1000 times to make it a bit larger.
#################################################
# THE HASH FUNCTION (CREATES A ENW ENVIRONMENT) #
#################################################
hash <- function(x, type = "character") {
e <- new.env(hash = TRUE, size = nrow(x), parent = emptyenv())
char <- function(col) assign(col[1], as.character(col[2]), envir = e)
num <- function(col) assign(col[1], as.numeric(col[2]), envir = e)
FUN <- if(type=="character") char else num
apply(x, 1, FUN)
return(e)
}
###################################
# YOUR DATA REPLICATED 1000 TIMES #
###################################
dat <- dat <- structure(list(product = c(11L, 11L, 9L, 9L, 6L, 1L, 11L, 5L,
7L, 11L, 5L, 11L, 4L, 3L, 10L, 7L, 10L, 5L, 9L, 8L)), .Names = "product", row.names = c(NA,
-20L), class = "data.frame")
dat <- dat[rep(seq_len(nrow(dat)), 1000), , drop=FALSE]
rownames(dat) <-NULL
dat
#########################
# CREATE A LOOKUP TABLE #
#########################
med.lookup <- data.frame(val=as.character(1:12),
med=rep(c('Tylenol', 'Advil', 'Bayer', 'Generic'), each=3))
########################################
# USE hash TO CREATE A ENW ENVIRONMENT #
########################################
meds <- hash(med.lookup)
##############################
# CREATE A RECODING FUNCTION #
##############################
recoder <- function(x){
x <- as.character(x) #turn the numbers to character
rc <- function(x){
if(exists(x, env = meds))get(x, e = meds) else NA
}
sapply(x, rc, USE.NAMES = FALSE)
}
#############
# HASH AWAY #
#############
recoder(dat[, 1])
In this case hashing is slow but if you have more levels to recode then it will increase in speed over others.