I have two tables, ab and x, as defined below:

require(data.table)
ab=data.table(id=c("geneA", "geneB", "geneC", "geneA", "geneA", "geneB", ""), co1=c(1,2,3,0,7), co2=c(0,0,4,5,6), nontarget=c(9,0,7,6,5), co3=c(0,1,2,3,4))
target_col_nums=grep('co', colnames(ab))
x=ab

I'm trying to generate sums of each target column by the "id" column. When I do this for x, it appears that ab is also altered. What is going on here, and how do I avoid this?

x[,(target_col_nums):=lapply(.SD,sum),.SDcols=target_col_nums, by=id]
x

      id co1 co2 nontarget co3
1: geneA   9  15         9   8
2: geneB   3   8         0   2
3: geneC   3   4         7   2
4: geneA   9  15         6   8
5: geneA   9  15         5   8
6: geneB   3   8         9   2
7:         2   4         0   1

But then ab turns out to have been changed in the same way:

ab
      id co1 co2 nontarget co3
1: geneA   9  15         9   8
2: geneB   3   8         0   2
3: geneC   3   4         7   2
4: geneA   9  15         6   8
5: geneA   9  15         5   8
6: geneB   3   8         9   2
7:         2   4         0   1
Atticus29
  • `data.table` uses pointers in order to avoid deep copies. If you want `ab` to stay the same create a deep copy using `copy`. – David Arenburg Mar 20 '16 at 20:19
  • See also [this](http://stackoverflow.com/questions/10225098/understanding-exactly-when-a-data-table-is-a-reference-to-vs-a-copy-of-another) – David Arenburg Mar 20 '16 at 20:24
  • Also please read the [vignettes](https://github.com/Rdatatable/data.table/wiki/Getting-started). – Arun Mar 20 '16 at 20:53
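
As a minimal sketch of what the comments describe: plain assignment gives a second name for the same `data.table`, so `:=` through either name changes both, while `copy()` gives an independent table. The small table and the names `x_ref`, `x_copy` and `target_cols` below are made up for illustration and are not the data from the question.

```r
library(data.table)

# Small illustrative table (hypothetical data, not the table from the question)
ab <- data.table(id  = c("geneA", "geneB", "geneA"),
                 co1 = c(1, 2, 3),
                 co2 = c(4, 5, 6))

x_ref  <- ab        # plain assignment: a second name for the same object
x_copy <- copy(ab)  # deep copy: an independent data.table

# Select the target columns by name rather than by position
target_cols <- grep("co", names(ab), value = TRUE)

# := updates by reference, so only the object behind x_copy is modified
x_copy[, (target_cols) := lapply(.SD, sum), by = id, .SDcols = target_cols]

identical(address(ab), address(x_ref))   # TRUE: same memory address
identical(address(ab), address(x_copy))  # FALSE: separate object
ab$co1      # 1 2 3 (unchanged)
x_copy$co1  # 4 2 4 (per-id sums)
```

If only one row per `id` is wanted, dropping `:=` and writing `ab[, lapply(.SD, sum), by = id, .SDcols = target_cols]` returns a new aggregated table and leaves `ab` untouched.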
