Your code isn't quite correct:
- Note that the `match(key[, 2], input)` in the index on the LHS is of length 6 (the number of rows of `key`), not 5 (the length of `input`), so `!is.na()` is of length 6 not 5, and `which(!is.na())` is an index into `key`, not into `input`.
- You additionally lose the order of the matches by using `!is.na()` on the right-hand side (it works in your example because the rows of `key` happen to sit at the same indices as the things to replace in `input`, and in the same order).
As an illustrative example, let's shuffle your `key`:
key <- key[c(3,2,4,5,6,1), ]
input[which(!is.na(match(key[,2],input)))]<-key[!is.na(match(key[,2],input)),1]
input
[1] "one1" "one" "three" "four" "five" "one"
Note how your new `input` has 6 elements now, and the first one, `one1`, wasn't replaced. Have a look at `match(key[,2], input)`, `!is.na(...)` and `which(!is.na(...))` to see why.
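A minimal sketch of that inspection, assuming a hypothetical `key` and `input` (the originals aren't reproduced here; these values and the column names `new`/`old` are made up, chosen only to be consistent with the output above):

```r
# Hypothetical data: assumed, not the original poster's
key <- data.frame(
  new = c("one", "one", "two", "two", "three", "three"),
  old = c("one1", "one2", "two1", "two2", "three1", "three2"),
  stringsAsFactors = FALSE
)
input <- c("one1", "one2", "three", "four", "five")

key <- key[c(3, 2, 4, 5, 6, 1), ]  # same shuffle as above

wrong <- match(key[, 2], input)    # one entry per row of key: NA 2 NA NA NA 1
lhs   <- which(!is.na(wrong))      # 2 6: positions in key, yet used to index input
rhs   <- key[!is.na(wrong), 1]     # replacements in key order, not input order
```

With these values `lhs` contains 6, so the assignment writes past the end of the 5-element `input` and leaves position 1 untouched.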
You need to use `match(input, key[,2])`, which is non-NA when `input[i]` has a match in `key`, and whose value is the index of that match in `key`. So now you can use `!is.na()` on the LHS to do the assignment, but don't use `!is.na()` on the right or you lose the indices of the matches in `key`.
m <- match(input, key[,2]) # 6 2 NA NA NA for the shuffled `key`
input[!is.na(m)] <- key[na.omit(m), 1]
# or a one-liner
input[!is.na(match(input, key[,2]))] <- key[na.omit(match(input, key[,2])), 1]
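If you want to reuse this, the same idea can be wrapped in a small function. This is only a sketch: the name `recode_with_key` and the example data are hypothetical, and it uses `m[hit]` in place of `na.omit(m)`, which selects the same indices.

```r
# Hypothetical helper: the name and the data below are assumptions
recode_with_key <- function(input, key) {
  m   <- match(input, key[, 2])  # index into key per element of input, NA if no match
  hit <- !is.na(m)               # same length as input
  input[hit] <- key[m[hit], 1]   # look up replacements by their position in key
  input
}

key <- data.frame(
  new = c("one", "one", "two", "two", "three", "three"),
  old = c("one1", "one2", "two1", "two2", "three1", "three2"),
  stringsAsFactors = FALSE
)
input <- c("one1", "one2", "three", "four", "five")

res_original <- recode_with_key(input, key)
res_shuffled <- recode_with_key(input, key[c(3, 2, 4, 5, 6, 1), ])
```

Both calls should give the same 5-element result, since the lookup no longer depends on the row order of `key`.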
In terms of "more efficient", I reckon this is about as good as it gets: `merge` calls `match` internally anyway, so it will almost certainly be slower. It ain't "elegant", but it's fast. The only improvement I see is to store the match first (as I have done above, storing it in `m`) to avoid calling `match` twice.