I'm trying to aggregate data from a data.table to create a new column that is a list of the values in each group. It's easier to see by example:

library(data.table)
dt <- data.table(id = c(1,1,1,1,2,2,3,3,3), letter = c('a','a','b','c','a','c','b','b','a'))

I would like to aggregate this in such a way that the result is

   id  letter
1:  1 a,a,b,c
2:  2     a,c
3:  3   b,b,a  

Intuitively I tried

dt[,j = list(list(letter)), by = id]

but that doesn't work. Oddly enough when I go case by case, for example:

> dt[id == 1,j = list(list(letter)), by = id]

   id      V1
1:  1 a,a,b,c

the result is fine... I feel like I'm missing an .SD somewhere or something like that...

Can anybody point me in the right direction?

Thanks!

MagicScout

2 Answers

8

Update: The behaviour DT[, list(list(.)), by=.] sometimes resulted in wrong results in R version >= 3.1.0. This is now fixed in commit #1280 in the current development version of data.table v1.9.3. From NEWS:

  • DT[, list(list(.)), by=.] returns correct results in R >=3.1.0 as well. The bug was due to recent (welcoming) changes in R v3.1.0 where list(.) does not result in a copy. Closes #481.

With this update, I() is no longer necessary. You can just do: DT[, list(list(.)), by=.] as before.
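As a minimal sketch of the post-fix behaviour, assuming a data.table release that includes the fix described above (the column name `letter` on the result and the variable `res` are illustrative choices, not part of the original answer):

```r
library(data.table)

dt <- data.table(
  id     = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  letter = c("a", "a", "b", "c", "a", "c", "b", "b", "a")
)

# Wrap the per-group vector in list() so each group contributes one list cell.
# Naming the element keeps "letter" as the column name instead of V1.
res <- dt[, list(letter = list(letter)), by = id]

# Each cell should hold that group's own character vector.
print(res)
print(res$letter[[2]])
```

On a version with the bug, every cell would instead show the same vector, so inspecting individual cells with `[[` is a quick way to check which behaviour you have.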


This seems to be similar to the known bug #5585. In your case, I think you could just use

dt[, paste(letter, collapse=","), by = id] 

to fix your problem.
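A small sketch of the same idea with the result column named explicitly, since the plain call above returns it as V1 (the names `res_chr` and `parts` are only illustrative):

```r
library(data.table)

dt <- data.table(
  id     = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  letter = c("a", "a", "b", "c", "a", "c", "b", "b", "a")
)

# Collapse each group's letters into one comma-separated string,
# and name the result column "letter".
res_chr <- dt[, list(letter = paste(letter, collapse = ",")), by = id]
print(res_chr)

# The column is plain character, so the pieces can be recovered if needed.
parts <- strsplit(res_chr$letter, ",", fixed = TRUE)
print(parts[[3]])
```

Note that this gives a character column that only looks like the list output when printed; it does not rely on the list(list(.)) behaviour at all.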

As @ilir pointed out, if you actually want a list column (rather than a character string that merely prints the same way), you could use the workaround suggested in the bug report:

dt[, list(list(I(letter))), by = id]
Arun
shadow
  • upvoted, beat me to it..although this doesn't return the variable name "letter" – Ben Rollert Apr 25 '14 at 09:10
  • This issue is because `list(.)` shallow copies in R3.1.0. Will be fixed for next release. – Arun Apr 25 '14 at 09:44
  • Good to know the reason. Thanks Arun. I found another way to do it using `.SD` but this works just the same. I didn't benchmark the two to see which is faster. – MagicScout Apr 25 '14 at 14:04
1

The syntax below works for me:

dt[, list(lst=list(letter)), by=id]

I am using R version 3.0.3, data.table_1.9.2.

ilir
  • Does it really give the desired result? For me, it just gives `b,b,a,c` for all id's. Using R 3.1.0, data.table_1.9.2 – shadow Apr 25 '14 at 09:10
  • Works as intended on my version. It must be the bug @shadow is pointing out. The bug report has a workaround as well. – ilir Apr 25 '14 at 09:17