I understand that by design, tidyr does less than reshape2: tidyr never aggregates.

Is there a "right" way to replicate reshape2's aggregation, in the sense of better following the tidyverse philosophy?

I usually combine a few dplyr verbs and then one from tidyr. For example, to replicate

dcast(mtcars, gear~cyl, value.var = "disp", sum)

  gear     4     6      8
1    3 120.1 483.0 4291.4
2    4 821.0 655.2    0.0
3    5 215.4 145.0  652.0

One can do

mtcars %>% 
    group_by(gear, cyl) %>% 
    summarise(disp = sum(disp)) %>% 
    spread(cyl, disp)

Source: local data frame [3 x 4]
Groups: gear [3]

   gear   `4`   `6`    `8`
* <dbl> <dbl> <dbl>  <dbl>
1     3 120.1 483.0 4291.4
2     4 821.0 655.2     NA
3     5 215.4 145.0  652.0
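
Two differences from the dcast output remain: the empty gear 4 / cyl 8 cell is NA rather than 0, and the result is still grouped by gear. A minimal sketch of one way to close both gaps, assuming dplyr and tidyr are loaded and using spread's `fill` argument (the name `wide` is only for illustration):

```r
library(dplyr)
library(tidyr)

wide <- mtcars %>%
    group_by(gear, cyl) %>%
    summarise(disp = sum(disp)) %>%
    ungroup() %>%                  # drop the grouping by gear left over from summarise()
    spread(cyl, disp, fill = 0)    # missing gear/cyl combinations become 0 instead of NA

wide
```

This keeps the aggregation in dplyr and leaves tidyr responsible only for the reshape, so it is the same approach with the cosmetic mismatch removed, not a different technique.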

I'd appreciate any insight on whether this is an optimal solution and, if it's not, what would be better and why.

HAVB
  • I'll hazard a guess it's that it works better with the forward-pipe strategy of one step at a time – Scransom Jun 20 '17 at 01:36
  • You could shorten it to `mtcars %>% count(cyl, gear, wt = disp) %>% spread(cyl, n)`, but that's pretty much it. There's not really a question here. – alistaire Jun 20 '17 at 01:39
  • Many ways to skin a cat. None is objectively better, whether you do it with `reshape2`, `tidyverse` or even base R. – Adam Quek Jun 20 '17 at 02:24
  • Agreed that there are many ways to skin our cat (or our data), but when tidyr lost its aggregation function, what was supposed to replace it within-tidyverse? – HAVB Jun 20 '17 at 02:27
  • Similar to https://stackoverflow.com/questions/35225052/spread-vs-dcast – Sam Firke Jun 20 '17 at 02:38
  • @HAVB I think the way you've done it is fine. I think the "tidyverse way" is to work with the data in long form most of the time, using dplyr's aggregation tools, and then spread at the end for presentation. But this is probably too broad to get good answers. – Marius Jun 20 '17 at 03:24

0 Answers