1
DT <- data.table(id = rep(1:3, 2),
                    class = rep(letters[1:6]),
                    des = rep(LETTERS[1:2], 3))

which looks like this:

   id class des
1:  1     a   A
2:  2     b   B
3:  3     c   A
4:  1     d   B
5:  2     e   A
6:  3     f   B

The question is that I need to stack different value (string type) of variables class & des in each id into one row, that is, how to convert that data.table into the following shape:

   id    class      des
1:  1     a, d     A, B
2:  2     b, e     B, A
3:  3     c, f     A, B

I have tried something like this, but the result is not what I expected.

DT %>% 
  dcast(id ~ ..., fun = function(x) paste(x, ", "), value.var = c("class", "des"))

   id   class    des
1:  1    d ,    B , 
2:  2    e ,    A , 
3:  3    f ,    B , 
Caleb
  • 41
  • 6
  • depending on what you want to do afterwards you can also try this `library(tidyverse); DT %>% tidyr::nest(-id)` and use `purrr:map` functionality afterwards. – Roman Apr 05 '18 at 11:51

3 Answers3

2

You don't really need to use dcast(). It is much simpler to group your data.table by id and then go through the columns using lapply() and use paste() with collapse = ", " to summarise:

DT[, lapply(.SD, paste, collapse = ", "), by = id]

The result looks like this:

   id class  des
1:  1  a, d A, B
2:  2  b, e B, A
3:  3  c, f A, B

You will find that this solution is much faster than using dcast():

library(microbenchmark)

microbenchmark(dcast = dcast(DT, id ~ ..., 
                            fun = function(x) paste(x, collapse = ", "), 
                            value.var = c("class", "des")),
               group = DT[, lapply(.SD, paste, collapse = ", "), by = id],
               times = 100)

Unit: microseconds
  expr      min        lq      mean    median       uq      max neval
 dcast 2460.732 2639.4095 3118.5706 2815.3385 3221.251 6942.144   100
 group  305.014  329.2315  374.9927  347.6135  377.440  670.746   100
clemens
  • 6,653
  • 2
  • 19
  • 31
1

If you will accept a dplyr solution, the following can be a solution.

DT %>%
  group_by(id) %>%
  summarise_at(vars(class, des), paste, collapse = ", ")
hpesoj626
  • 3,529
  • 1
  • 17
  • 25
1

Collapse is the important part – use paste(x, collapse = ", ") to perfrom the string aggregation:

library(data.table)
library(magrittr)

DT %>% 
   dcast(id ~ ..., 
         fun = function(x) paste(x, collapse = ", "), 
         value.var = c("class", "des"))
ozacha
  • 1,212
  • 9
  • 14
  • Thank you. I just notice the argument of [collapse] is different from the default one. [:embarrassed] – Caleb Apr 05 '18 at 11:59