I need to join the information of column h of dataframe Y into dataframe X. The code below shows the desired output.

library(data.table)
X <- data.table(
  a1 = rep("A", 6), 
  b1 = rep(1,6), 
  c1 = rep(c(0,1), 1, each = 3),
  d = letters[1:6]
)

Y <- data.table(
  a2 = rep(c("A","B", "C"), 1, each = 2),
  b2 = rep(c(1, 2, 3), 1, each = 2),
  c2 = rep(c(0,1), 3),
  h = letters[7:12]
)


# final result

X[Y,
  on = .(a1 = a2, 
         b1 = b2, 
         c1 = c2),
  h := i.h
  ][]
#>    a1 b1 c1 d h
#> 1:  A  1  0 a g
#> 2:  A  1  0 b g
#> 3:  A  1  0 c g
#> 4:  A  1  1 d h
#> 5:  A  1  1 e h
#> 6:  A  1  1 f h

Created on 2020-08-03 by the reprex package (v0.3.0)

The problem, however, is that the names of the columns I use for the join vary depending on information stored elsewhere. So, let's assume that the name of column c1 in X is stored in a variable, say var <- "c1". When I try to do the join using var, nothing seems to work.

# None of the attempts below work
var <- "c1"

# attempt 1
X[Y,
  on = .(a1 = a2, 
         b1 = b2, 
         eval(var) = c2),
  h := i.h
][]

# attempt 2
X[Y,
  on = .(a1 = a2, 
         b1 = b2, 
         get(var) = c2),
  h := i.h
][]

# attempt 3
cond    <- paste0(deparse(var), " = c2")
parcond <- parse(text = cond)

X[Y,
  on = .(a1 = a2, 
         b1 = b2, 
         eval(parcond)),
  h := i.h
][]

In the end, the only way I found to solve it is very inelegant, but it seems to work: temporarily rename the column in X to match the one in Y, do the join, and then rename it back.

var <- "c1"
setnames(X, var, "c2")

X[Y,
  on = c("a1" = "a2", 
         "b1" = "b2", 
         "c2"),
  h := i.h
][]

setnames(X, "c2", var)

However, I wonder if there is a better way to do this programmatically.

I checked all these links, but I could not find a solution that works for me.

Thank you so much for your help.

  • hi, you can construct your `on` as a char vector and pass it in. e.g. `onkey <- c("a1=a2", "b1=b2", paste0(deparse(var),"=c2")); X[Y, on=onkey, h := i.h]` – chinsoon12 Aug 03 '20 at 22:14
  • Thank you for your comment @chinsoon12. Unfortunately, it did not work. I got the following error: `Error in colnamesInt(x, names(on), check_dups = FALSE) : argument specifying columns specify non existing column(s): cols[1]='a1=a2'` – R.Andres Castaneda Aug 04 '20 at 14:49
  • sorry, it should be double equal sign i.e. `onkey <- c("a1==a2", "b1==b2", paste0(var,"==c2")); X[Y, on=onkey, h := i.h]` – chinsoon12 Aug 04 '20 at 21:53
  • Dear @chinsoon12, the solution worked beautifully! thank you so much. Should you create an answer so I can confirm as a valid answer? – R.Andres Castaneda Aug 06 '20 at 13:37
  • Pls feel free to answer – chinsoon12 Aug 06 '20 at 13:51

1 Answer

Thanks to @chinsoon12's comment, the problem can be solved by building the on argument as a character vector of "x==y" strings, with the variable column name pasted in:

library(data.table)
X <- data.table(
  a1 = rep("A", 6), 
  b1 = rep(1,6), 
  c1 = rep(c(0,1), 1, each = 3),
  d = letters[1:6]
  )

Y <- data.table(
  a2 = rep(c("A","B", "C"), 1, each = 2),
  b2 = rep(c(1, 2, 3), 1, each = 2),
  c2 = rep(c(0,1), 3),
  h = letters[7:12]
  )


var <- "c1"

onkey <- c("a1==a2", "b1==b2",  paste0(var,"==c2"))

X[Y, 
  on=onkey, 
  h := i.h
  ][]
#>    a1 b1 c1 d h
#> 1:  A  1  0 a g
#> 2:  A  1  0 b g
#> 3:  A  1  0 c g
#> 4:  A  1  1 d h
#> 5:  A  1  1 e h
#> 6:  A  1  1 f h

Created on 2020-08-11 by the reprex package (v0.3.0)
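
As a possible alternative to pasting "x==y" strings, data.table's on argument also accepts a named character vector, where the names are columns of X and the values are columns of Y. The sketch below reuses the example data and builds that vector with setNames(); the names x_cols and y_cols are introduced here only for illustration and are not part of the original answer.

```r
library(data.table)

X <- data.table(
  a1 = rep("A", 6),
  b1 = rep(1, 6),
  c1 = rep(c(0, 1), each = 3),
  d  = letters[1:6]
)

Y <- data.table(
  a2 = rep(c("A", "B", "C"), each = 2),
  b2 = rep(c(1, 2, 3), each = 2),
  c2 = rep(c(0, 1), 3),
  h  = letters[7:12]
)

# Column name that is only known at run time
var <- "c1"

# Hypothetical helper vectors: join columns on each side, in matching order
x_cols <- c("a1", "b1", var)
y_cols <- c("a2", "b2", "c2")

# Named character vector: names refer to X, values refer to Y
onkey <- setNames(y_cols, x_cols)

# Update join: add h from Y to the matching rows of X, by reference
X[Y, on = onkey, h := i.h]
print(X)
```

This form avoids string parsing altogether, which may be convenient when several of the join columns are stored in variables.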