0

If I wanted to apply multiple functions in a dplyr call, I could run the following:

mtcars %>% group_by(gear) %>% select(hp, disp) %>%  
  summarise_all(funs(n=sum(!is.na(.)), mean=mean(.,na.rm=T)))

Output:

# A tibble: 3 x 5
   gear  hp_n disp_n  hp_mean disp_mean
  <dbl> <int>  <int>    <dbl>     <dbl>
1     3    15     15 176.1333  326.3000
2     4    12     12  89.5000  123.0167
3     5     5      5 195.6000  202.4800

This is great and exactly what I want. Now if I wanted to melt this to make it into a pivot table (we use a lot of pivot tables at my job), I could just call melt. However the result would make both the _mean and the _n columns into their own rows. So if I run the following:

mtcars %>% group_by(gear) %>% select(hp, disp) %>%  
  summarise_all(funs(n=sum(!is.na(.)), mean=mean(.,na.rm=T))) %>%
  melt(id.vars="gear")

I will get

   gear  variable    value
1     3      hp_n  15.0000
2     4      hp_n  12.0000
3     5      hp_n   5.0000
4     3    disp_n  15.0000
5     4    disp_n  12.0000
6     5    disp_n   5.0000
7     3   hp_mean 176.1333
8     4   hp_mean  89.5000
9     5   hp_mean 195.6000
10    3 disp_mean 326.3000
11    4 disp_mean 123.0167
12    5 disp_mean 202.4800

When what I really want is:

       gear  variable  metric  value
    1     3      hp     n       15.0000
    2     3      hp     mean    176.1333
    3     5      disp   n       15.0000
    4     5      disp   mean    326.3000
....

How do I go about doing this with dplyr? Do I need to re-work the melt command?

vashts85
  • 1,069
  • 3
  • 14
  • 28
  • 1
    So you basically you want to re-order? , i.e. `mtcars %>% group_by(gear) %>% select(hp, disp) %>% summarise_all(funs(n=sum(!is.na(.)), mean=mean(.,na.rm=T))) %>% gather(variable, value, -gear) %>% arrange(gear, sub('.*_', '', variable))`? – Sotos Oct 16 '17 at 14:50
  • From your desired output, it seems to me that you're looking to grab the `max` of each unique `variable`. If that's the case. then group by variable and use `summarize` with `max`. – Abdou Oct 16 '17 at 14:50
  • 1
    Or maybe this `mtcars %>% group_by(gear) %>% select(hp, disp) %>% summarise_all(funs(n=sum(!is.na(.)), mean=mean(.,na.rm=T))) %>% gather(variable, value, -gear) %>% arrange(gear, sub('_.*', '', variable), sub('.*_', '', variable))` – Sotos Oct 16 '17 at 14:54
  • @Sotos this is really close but seeing it I realize I mis-specified the output. I've corrected it to show what I really want -- essentially something I can dump in Excel to run pivots. – vashts85 Oct 16 '17 at 14:54
  • 1
    Also note that `melt` is not `tidyr`... It's `reshape2` – Sotos Oct 16 '17 at 14:56
  • 1
    Oh, that's easier. Just use `separate`, i.e. `mtcars %>% group_by(gear) %>% select(hp, disp) %>% + summarise_all(funs(n=sum(!is.na(.)), mean=mean(.,na.rm=T))) %>% gather(variable, value, -gear) %>% arrange(gear, sub('_.*', '', variable), sub('.*_', '', variable)) %>% separate(variable, into = c('var', 'metric'), '_')` – Sotos Oct 16 '17 at 14:57
  • 1
    Which is a dupe :) – Sotos Oct 16 '17 at 14:58
  • Your solution works (when I remove an extra `+` sign). But, can you walk me through the `gather`, `arrange`, and `separate` steps? Why all the `subs` when I don't notice anything changing? – vashts85 Oct 16 '17 at 15:00
  • That is poorly written actually because I just added the `separate` part at the end. (`sub`s where for sorting), but the right way would be more clear to you: `mtcars %>% group_by(gear) %>% select(hp, disp) %>% summarise_all(funs(n=sum(!is.na(.)), mean=mean(.,na.rm=T))) %>% gather(variable, value, -gear) %>% separate(variable, into = c('var', 'metric'), '_') %>% arrange(gear, var, metric)` – Sotos Oct 16 '17 at 18:44

0 Answers0