9

Let's say I have a data.table

set.seed(1) # to make the example reproducible
ex<-data.table(AAA=runif(100000),
               BBB=runif(100000),
               CCC=runif(100000),
               DDD=runif(100000),
               FLAG=c(rep(c("a","b","c","d","e"),200000)))

I want to subtract from column AAA every other column, then from BBB every remaining column (except FLAG) and so on so that the output will look like...

ex[,list(AAA_BBB=AAA-BBB,
         AAA_CCC=AAA-CCC,
         AAA_DDD=AAA-DDD,
         BBB_CCC=BBB-CCC,
         BBB_DDD=BBB-DDD,
         CCC_DDD=CCC-DDD)]

Is there a data.table syntax that can do this cleanly without knowing how many columns there are or what their names are?

Dean MacGregor
  • 11,847
  • 9
  • 34
  • 72
  • Dealing with the 1000000 rows in your example is not much fun. Efficiency wasn't mentioned in your question...is it a concern? Anyway, I only tried my answer with ten rows. – Frank Jun 04 '13 at 15:32
  • 1
    @Frank I believe the need for efficiency is pretty much implied if data.table is used. – Roland Jun 04 '13 at 15:37
  • @Frank sorry about the excessive rows in the example I just copy-pasted a previous question that I asked where the large number of rows was pertinent to the example. – Dean MacGregor Jun 04 '13 at 16:35
  • Ah, no problem. My solution was both inefficient and wordy. I didn't know about this nice `combn` function and wouldn't have realized how much better it is without your big example data. :) Also, Roland is right: I should assume a data.table question is concerned with efficiency. – Frank Jun 04 '13 at 17:10

2 Answers2

5

A solution with combn and apply:

cc <- combn(colnames(ex)[1:4], 2)
apply(cc, 2, function(x)ex[[x[1]]]-ex[[x[2]]])

gives for the first 5 rows:

             [,1]         [,2]       [,3]        [,4]        [,5]         [,6]
 [1,] -0.43500930 -0.520148152  0.1602265 -0.08513885  0.59523580  0.680374655
 [2,] -0.32964090 -0.153303302 -0.3807295  0.17633760 -0.05108855 -0.227426149
 [3,]  0.25991705 -0.079679566  0.2040904 -0.33959662 -0.05582670  0.283769917
 [4,]  0.35585252  0.153083047  0.2382553 -0.20276948 -0.11759719  0.085172292
 [5,] -0.67081018 -0.116543468 -0.3413471  0.55426671  0.32946305 -0.224803663

Edit

As Arun suggested, combn can take a function argument, so a better solution is

res <- combn(colnames(ex)[1:4], 2, function(x) ex[[x[1]]] - ex[[x[2]]])
colnames(res) <- combn(colnames(ex)[1:4], 2, paste, collapse="_")
as.data.table(res)

            AAA_BBB     AAA_CCC     AAA_DDD     BBB_CCC     BBB_DDD     CCC_DDD
      1: -0.4350093 -0.52014815  0.16022650 -0.08513885  0.59523580  0.68037465
      2: -0.3296409 -0.15330330 -0.38072945  0.17633760 -0.05108855 -0.22742615
      3:  0.2599171 -0.07967957  0.20409035 -0.33959662 -0.05582670  0.28376992
      4:  0.3558525  0.15308305  0.23825534 -0.20276948 -0.11759719  0.08517229
      5: -0.6708102 -0.11654347 -0.34134713  0.55426671  0.32946305 -0.22480366
     ---                                                                       
 999996: -0.8450458 -0.47951267 -0.30333929  0.36553310  0.54170648  0.17617338
 999997: -0.5778393 -0.01784418 -0.24353237  0.55999516  0.33430697 -0.22568819
 999998:  0.7127352  0.82554276  0.01258673  0.11280758 -0.70014846 -0.81295604
 999999: -0.6693544 -0.42335069 -0.81080852  0.24600375 -0.14145408 -0.38745783
1000000: -0.8511655 -0.23341818 -0.15830584  0.61774732  0.69285966  0.07511234
user1981275
  • 13,002
  • 8
  • 72
  • 101
  • 3
    (+1) `combn` has a function argument as well: `combn(colnames(ex)[1:4], 2, function(x) ex[[x[1]]] - ex[[x[2]]])` – Arun Jun 04 '13 at 15:14
5

Looping over the combinations within data.table:

comblist <- combn(names(ex)[-5],2,FUN=list)
res2 <- ex[,lapply(comblist,function(x) get(x[1])-get(x[2]))]

setnames(res2,names(res2),sapply(comblist,paste,collapse="_"))
Roland
  • 127,288
  • 10
  • 191
  • 288
  • Is there a way to keep `FLAG` in `res2` or do I have to add back in later? I tried `list(lapply(comblist,function(x) get(x[1])-get(x[2])),FLAG)]` but that didn't work. – Dean MacGregor Jun 04 '13 at 17:06
  • You could use `ex[,sapply(comblist,paste,collapse="_"):=lapply(comblist,function(x) get(x[1])-get(x[2]))]` and add the new columns to the original data.table by reference. – Roland Jun 04 '13 at 18:21