Suppose I have a dataframe:
sick <- c("daa12", "daa13", "daa14", "daa15", "daa16", "daa17")
code <- c("heart", "heart", "lung", "lung", "cancer", "cancer")
sick_code <- data.frame(sick, code)
And another:
pid <- abs(round(rnorm(6)*1000,0))
sick <- c("-" , "-", "-", "-", "daa16", "SO")
p_sick <- data.frame(pid, sick)
Now I would like to add a new variable to p_sick that "translates" p_sick$sick to sick_code$code. The value in p_sick$sick is a string which may or may not be present in sick_code$sick; if it is not, NA should be returned.
Now I could write a for loop with a simple ifelse statement, but the data I have is 150 million rows long, and the translation table is 15,000 rows long.
I have googled that this is the equivalent of a "proc format" in SAS (but I do not have access to SAS, nor do I have any idea how it works).
Perhaps some variant of merge in plyr, or an apply function?
EDIT: I have accepted both answers, since they work. I will try to look into the difference (in speed) between the two. Since merge is a built-in function, I am guessing it does lots of checking.
EDIT2: To people getting here via Google: merge has a sort = FALSE argument, which will speed things up. Note that the row order is not preserved in any way.
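For anyone who wants something concrete to try, here is a minimal sketch of a merge-based lookup with sort = FALSE, next to a match-based alternative. The match variant is my own assumption of a reasonable approach, not necessarily what either accepted answer used, and the pid values are fixed placeholders rather than the random ones above. I have not benchmarked either on 150 million rows.

```r
# Lookup table and data as in the question (pid values here are arbitrary placeholders).
sick_code <- data.frame(
  sick = c("daa12", "daa13", "daa14", "daa15", "daa16", "daa17"),
  code = c("heart", "heart", "lung", "lung", "cancer", "cancer"),
  stringsAsFactors = FALSE
)
p_sick <- data.frame(
  pid  = 1:6,
  sick = c("-", "-", "-", "-", "daa16", "SO"),
  stringsAsFactors = FALSE
)

# Option 1: vectorised lookup with match(); the row order of p_sick is kept.
# match() returns NA where p_sick$sick is absent from sick_code$sick,
# and indexing with NA yields NA.
p_sick$code <- sick_code$code[match(p_sick$sick, sick_code$sick)]

# Option 2: left join with merge(). all.x = TRUE keeps unmatched rows (code is NA),
# sort = FALSE skips sorting on the key. Row order is not guaranteed.
merged <- merge(p_sick[, c("pid", "sick")], sick_code,
                by = "sick", all.x = TRUE, sort = FALSE)
```

The match version adds the column in place and keeps the original row order, which matters if other code relies on it. The merge version returns a new data frame, so keep pid (or another key) around if the rows need to be reordered afterwards.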