Conflicting/duplicate column names in J()?

Question

I have two data.tables (dat and results) that share column names. As a side note, results holds summary statistics computed earlier on *sub*groups of dat, so nrow(results) != nrow(dat) (though I don't think this is relevant to the question). Now I want to incorporate these results back into dat (the original data.table) by adding a new column, NewColZ, to dat.

This doesn't work as I expect:

dat[,list(colA,colB,NewColZ=results[colX==colX & colY==colY,colZ])
   ,by=list(colX, colY)]

Why? Because "colX" and "colY" are column names in both data.tables (dat and results). What I want to say is: results[take_from_self(colX)==take_from_parent(colX)]

The following does work (note that I have only RENAMED the columns in results, to cx and cy):

dat[,list(colA,colB,NewColZ=results[cx==colX & cy==colY,colZ])
   ,by=list(colX, colY)]

I have a feeling that this can be done simply and easily with a join, but dat has many more columns than results.

Where are the parent `colX` and `colY` coming from? Also, have you read the FAQ? This is addressed there. — Ricardo Saporta, May 12 '13 at 21:36
take_from_parent() simply refers to the dat data.table. Also, the FAQ has a question on the scoping rules of j, but here I am looking for the scoping rules of i. — varuman, May 12 '13 at 22:06
-1, Could we have the/a data? What is it that you expect the code to do (in a clear manner)? And what is it doing instead (again, in a clear and concise manner)? This is a very poorly formatted question ATM. I'd be happy to up-vote after significant changes to this question. — Arun, May 12 '13 at 22:11
I think you should read this post: [How to make a great reproducible example?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) — Arun, May 12 '13 at 22:37
While `J()` might be useful for what the OP wants to do, I think it is again misused in the title of this post. Anyway, the OP might want to delete the question after learning more about data.table syntax. As-is, the q's very confusing. — Frank, May 12 '13 at 23:42

mnel · Answer 1 · 2013-05-12T23:04:56.630

What you are trying to do is a join on colX and colY. You can use := to assign by reference. Joining is most straightforward when you have unique combinations (which I am assuming you do).

keys <- c('colX', 'colY')
setkeyv(dat, keys)
setkeyv(results, keys)

dat[results, newcolZ := colZ]
# perhaps use `i.` if there is a colZ in dat
# dat[results, newcolZ := i.colZ]
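
Since the question has no sample data, here is a minimal sketch of the same join on hypothetical toy tables. The values, and the fact that one group in dat has no match in results, are assumptions made for illustration only.

```r
library(data.table)

# Hypothetical toy data; only the column names come from the question.
dat <- data.table(
  colX = c("a", "a", "b", "b", "c"),
  colY = c(1L, 1L, 2L, 2L, 3L),
  colA = 1:5,
  colB = letters[1:5]
)
results <- data.table(
  colX = c("a", "b"),
  colY = c(1L, 2L),
  colZ = c(10.5, 20.5)
)

# Key both tables on the shared columns, then join and assign by reference.
keys <- c("colX", "colY")
setkeyv(dat, keys)
setkeyv(results, keys)

# i.colZ refers explicitly to the colZ column of the table in i (results).
dat[results, NewColZ := i.colZ]

print(dat)
```

Every row of dat keeps its other columns. Rows whose (colX, colY) combination does not appear in results are left with NA in NewColZ.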

I do concur with the comments that suggest reading the FAQ and introduction vignettes as well as going through the many examples in ?data.table.

Your immediate problem was a scoping issue, but the underlying problem was not being fully aware of the data.table idioms. The join is the idiomatic data.table approach.
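
To illustrate the scoping issue, here is a small sketch on a hypothetical results table (values invented for the example). Inside the `i` expression, names are looked up among the table's own columns first, so `colX == colX` compares the column with itself; a variable from the calling scope is only found when its name does not clash with a column.

```r
library(data.table)

# Hypothetical table; only the column names come from the question.
results <- data.table(
  colX = c("a", "b"),
  colY = c(1L, 2L),
  colZ = c(10.5, 20.5)
)

# Both sides of each == resolve to results' own columns, so the
# condition is TRUE for every row and nothing is filtered.
all_rows <- results[colX == colX & colY == colY, colZ]

# A variable with a non-clashing name (x_val is a made-up name) is
# found in the calling scope, so the filter behaves as intended.
x_val <- "a"
one_row <- results[colX == x_val, colZ]

print(all_rows)
print(one_row)
```

This is why renaming the columns in the question made the filter work, and why the join avoids the problem altogether.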
