1

I'm trying to put together several files and need to do a bunch of merges on column names that are created inside a loop. I can do this fine using data.frame() but am having issues using similar code with a data.table():

library(data.table)

df1 <- data.frame(id = 1:20, col1 =  runif(20))
df2 <- data.frame(id = 1:20, col1 =  runif(20))

newColNum <- 5
newColName <- paste('col',newColNum ,sep='')

df1[,newColName] <- runif(20)

df2 <- merge(df2, df1[,c('id',newColName)], by = 'id', all.x = T) # Works fine
######################

dt1 <- data.table(id = 1:20, col1 =  runif(20))
dt2 <- data.table(id = 1:20, col1 =  runif(20))

newColNum <- 5
newColName <- paste('col',newColNum ,sep='')

dt1[,newColName] <- runif(20)

dt2 <- merge(dt2, dt1[,c('id',newColName)], by = 'id', all.x = T) # Doesn't work

Any suggestions?

screechOwl
  • 27,310
  • 61
  • 158
  • 267
  • a few suggestions - in `data.table`, assign by reference using `:=` instead of what you did, avoid using character strings and use column names directly, and use the `data.table` join instead of `merge` – eddi Nov 05 '13 at 19:54

2 Answers2

1

This really has nothing to do with merge(), and everything to do with how the j (i.e. column) index is, by default, interpreted by [.data.table().

You can make the whole statement work by setting with=FALSE, which causes the j index to be interpreted as it would be in a data.frame:

dt2 <- merge(dt2, dt1[,c('id',newColName), with=FALSE], by = 'id', all.x = T)
head(dt2, 3)
#    id      col1       col5
# 1:  1 0.4954940 0.07779748
# 2:  2 0.1498613 0.12707070
# 3:  3 0.8969374 0.66894157

More precisely, from ?data.table:

with: By default 'with=TRUE' and 'j' is evaluated within the frame of 'x'. The column names can be used as variables. When 'with=FALSE', 'j' is a vector of names or positions to select.

Note that this could be avoided by storing the columns in a variable like so:

cols = c('id', newColName)
dt1[ , ..cols]

.. signals to "look up one level"

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
Josh O'Brien
  • 159,210
  • 26
  • 366
  • 455
0

Try dt1[,list(id,get(newColName))] in your merge.

eddi
  • 49,088
  • 6
  • 104
  • 155
TheComeOnMan
  • 12,535
  • 8
  • 39
  • 54
  • fyi, for many columns this is much worse than the `with=FALSE` solution, as the mere mention of "get", whips `data.table` into a frenzy and it constructs *all* the columns internally – eddi Nov 06 '13 at 06:24