I am using the data.table package to speed up some summary statistic collection on a data set.

I'm curious if there's a way to group by more than one column. My data looks like this:

Date                      Value  
2016-12-11                 36
2016-12-11                 40
2016-12-12                 17
2016-12-12                 41
2016-12-12                 27
...
2017-2-21                  22
2017-2-21                  53
2017-2-21                  19
2017-2-21                  20
2017-2-21                  32

Can I get the data like this:

Date                              Value
2016-12-11                      c(36, 40)
2016-12-12                      c(17, 27, 41)
2017-2-21                       c(19, 20, 22, 32, 53)

Note:

The number of rows per date is not the same, which is what is giving me trouble.

lojunren
  • I don't really see a lot of benefit for this kind of storage. It's certainly possible, but why? – thelatemail Mar 14 '17 at 03:28
  • Special requirement. It is just an intermediate result. The final result is not like that. Thank you. – lojunren Mar 14 '17 at 03:52
  • @thelatemail - it's also being used for [`simple features`](https://github.com/edzer/sfr) (the 'new' format in R for spatial data) – SymbolixAU Mar 14 '17 at 05:28

1 Answer

We can group by 'Date' and either collapse the values of each group into a single comma-separated string

library(data.table)
setDT(df1)[, .(Value = toString(Value)), by = Date]

or create the 'Value' column as a list

setDT(df1)[, .(Value = list(Value)), by = Date]
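
To make the two options easier to compare, here is a minimal sketch on a small sample. The `df1` below is an assumption rebuilt from the first rows shown in the question, and the `sort()` call is an addition (not part of the one-liners above) so that the vectors come out in the order shown in the desired output. The names `dt`, `as_string` and `as_list` are only illustrative.

```r
library(data.table)

# Hypothetical sample data mirroring the first rows of the question
df1 <- data.frame(
  Date  = c("2016-12-11", "2016-12-11", "2016-12-12", "2016-12-12", "2016-12-12"),
  Value = c(36, 40, 17, 41, 27)
)
dt <- as.data.table(df1)  # makes a copy, so df1 itself stays a data.frame

# Option 1: one comma-separated string per date
as_string <- dt[, .(Value = toString(Value)), by = Date]

# Option 2: one numeric vector per date, stored in a list column;
# sort() is an addition here to match the ordering in the question
as_list <- dt[, .(Value = list(sort(Value))), by = Date]

print(as_string)
print(as_list)

# Groups are allowed to have different lengths
print(lengths(as_list$Value))
```

Because each cell of the list column holds a full vector, unequal group sizes are not a problem; individual groups can be pulled out with something like `as_list$Value[[2]]`.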
akrun
  • I don't think there's a specific need for `setDT`, as the OP specifies they are already using `data.table` – SymbolixAU Mar 14 '17 at 03:44
  • @SymbolixAU I don't want to fight with you regarding `setDT`. It is for other users that don't know how a data.frame is converted to data.table – akrun Mar 14 '17 at 03:45