I have two data.tables (dat and results) that share column names. As a side note, results holds summary statistics computed earlier on *sub*groups of dat, so nrow(results) != nrow(dat) (though I don't think this is relevant to the question). Now I want to incorporate these results back into dat (the original data.table) by adding a new column, NewColZ, to dat.

This doesn't work as I expect:

dat[, list(colA, colB, NewColZ = results[colX == colX & colY == colY, colZ]),
    by = list(colX, colY)]

Why? Because colX and colY are column names in both data.tables (dat and results), so inside results[...] both sides of each comparison resolve to the columns of results. What I want to say is results[take_from_self(colX) == take_from_parent(colX)].

Therefore the following works (note that I have only renamed the columns in results to cx and cy):

dat[, list(colA, colB, NewCol = results[cx == colX & cy == colY, colZ]),
    by = list(colX, colY)]

I have a feeling, though, that this can be done simply and easily with a join. However, dat has many more columns than results.

varuman
  • Where are the parent `colX` and `colY` coming from? Also, have you read the FAQ? This is addressed there. – Ricardo Saporta May 12 '13 at 21:36
  • take_from_parent() simply refers to the dat data.table. Also, the FAQ has a question on the scoping rules of j, but here I am looking for the scoping rules of i. – varuman May 12 '13 at 22:06
  • -1. Could we have the data (or some example data)? What do you expect the code to do, and what is it doing instead? Please state both clearly and concisely. This is a very poorly formatted question at the moment. I'd be happy to up-vote after significant changes to this question. – Arun May 12 '13 at 22:11
  • I think you should read this post: [How to make a great reproducible example?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Arun May 12 '13 at 22:37
  • While `J()` might be useful for what the OP wants to do, I think it is again misused in the title of this post. Anyway, the OP might want to delete the question after learning more about data.table syntax. As-is, the q's very confusing. – Frank May 12 '13 at 23:42

1 Answer

What you are trying to do is a join on colX and colY. You can use := to assign by reference. Joining is most straightforward when you have unique combinations of colX and colY (which I am assuming you do).

keys <- c('colX', 'colY')
setkeyv(dat, keys)
setkeyv(results, keys)

dat[results, newcolZ := colZ]
# perhaps use `i.` if there is a colZ in dat
# dat[results, newcolZ := i.colZ]
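
To see the join at work, here is a minimal sketch on made-up data. The contents of dat and results below are assumptions for illustration only (the question does not supply any data); the column names follow the question.

```r
library(data.table)

# Hypothetical toy data: dat has more rows and columns than results
dat <- data.table(
  colX = c("a", "a", "b", "b", "c"),
  colY = c(1L, 1L, 2L, 2L, 3L),
  colA = 1:5,
  colB = letters[1:5]
)
results <- data.table(
  colX = c("a", "b"),
  colY = c(1L, 2L),
  colZ = c(10, 20)
)

keys <- c("colX", "colY")

# The update join assumes each (colX, colY) pair appears once in results
stopifnot(anyDuplicated(results, by = keys) == 0L)

setkeyv(dat, keys)
setkeyv(results, keys)

# For every row of results, find the matching rows of dat and add the
# column by reference; i.colZ is colZ taken from results (the i table)
dat[results, newcolZ := i.colZ]

print(dat)
```

Rows of dat with no match in results (the colX == "c" group here) get NA in newcolZ. In data.table versions that support the `on=` argument, the same update can be written without setting keys, e.g. `dat[results, on = keys, newcolZ := i.colZ]`, which also leaves the row order of dat untouched.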

I do concur with the comments that suggest reading the FAQ and introduction vignettes as well as going through the many examples in ?data.table.

Your immediate problem was a scoping issue, but the underlying problem was not being fully aware of the data.table idioms. The join approach is the idiomatic data.table approach.
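
The scoping issue can be reproduced in isolation. The sketch below uses made-up data and two ordinary variables named colX and colY to stand in for the group values coming from dat; those values are assumptions, not something taken from the question.

```r
library(data.table)

# Hypothetical summary table
results <- data.table(colX = c("a", "b"), colY = c(1L, 2L), colZ = c(10, 20))

# Hypothetical "outer" values, standing in for one group of dat
colX <- "a"
colY <- 1L

# Inside i, names are looked up among the columns of results first, so
# each comparison is a column compared with itself: every row matches
all_rows <- results[colX == colX & colY == colY]

# Giving the outer values names that are not columns of results
# removes the clash, which is what renaming the columns achieved
x_outer <- colX
y_outer <- colY
one_row <- results[colX == x_outer & colY == y_outer]

print(nrow(all_rows))
print(nrow(one_row))
```

This is why the renamed version in the question behaves differently: once the names no longer collide, the lookup falls through to the enclosing scope. The join sidesteps the problem altogether because the matching columns are named through the key rather than compared in i.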

mnel