1

I have a data set in Excel with a lot of vlookup formulas that I am trying to transpose in R using the data.table package.

In my example below I am saying, for each row, find the value in column y within column x and return the value in column z.

The first row results in na because the value 6 doesn't exist in column x.

On the second row the value 5 appears twice in column x but returning the first match is fine, which is e in this case

I've added in the result column which is the expected outcome.

library(data.table)
dt <- data.table(x = c(1,2,3,4,5,5), 
y = c(6,5,4,3,2,1), 
z = c("a", "b", "c", "d", "e", "f"), 
Result = c("na", "e", "d", "c", "b", "a"))

Many thanks

Frank
  • 66,179
  • 8
  • 96
  • 180
MidnightDataGeek
  • 938
  • 12
  • 21
  • http://stackoverflow.com/questions/18645222/vlookup-type-method-in-r , http://stackoverflow.com/questions/15303283/how-to-do-vlookup-and-fill-down-like-in-excel-in-r – jogo Apr 28 '17 at 19:46

2 Answers2

2

You can do this with a join, but need to change the order first:

setorder(dt, y)
dt[.(x = x, z = z), result1 := i.z, on = .("y" = x)]
setorder(dt, x)
#   x y z Result result1
#1: 1 6 a     na      NA
#2: 2 5 b      e       e
#3: 3 4 c      d       d
#4: 4 3 d      c       c
#5: 5 1 f      a       a
#6: 5 2 e      b       b

I haven't tested if this is faster than match for a big data.table, but it might be.

Roland
  • 127,288
  • 10
  • 191
  • 288
  • 1
    Instead of sorting, there's `dt[, res := dt[.(y), on=.(x), z, mult="first"]]` – Frank Feb 28 '17 at 13:22
  • Just wondering what version of DT you are both using, I can't execute any of the above examples? I get `could not find function "."` for Rolands answer and an error on Frank's - `'on' argument should be a named atomic vector` I am using 1.9.6 still – MidnightDataGeek Feb 28 '17 at 14:02
  • 1
    @MidnightDataGeek Obviously the current CRAN version. 1.9.6 is old. – Roland Feb 28 '17 at 14:51
1

We can just use match to find the index of those matching elements of 'y' with that of 'x' and use that to index to get the corresponding 'z'

dt[, Result1 := z[match(y,x)]]
dt
#   x y z Result Result1
#1: 1 6 a     na      NA
#2: 2 5 b      e       e
#3: 3 4 c      d       d
#4: 4 3 d      c       c
#5: 5 2 e      b       b
#6: 5 1 f      a       a
akrun
  • 874,273
  • 37
  • 540
  • 662