Use dplyr's summarise_each to return one row per function?

Question

I'm using dplyr's summarise_each to apply a function to multiple columns of data. One nice thing is that you can apply multiple functions at once. The annoying part is that the output is a data frame with a single row. It seems like it should return as many rows as functions, with one column for each column that was summarised.

library(dplyr)
default <- 
  iris %>% 
  summarise_each(funs(min, max), matches("Petal"))

this returns

> default
  Petal.Length_min Petal.Width_min Petal.Length_max Petal.Width_max
1                1             0.1              6.9             2.5

I'd prefer something like

library(reshape2)
desired <- 
  iris %>% 
  select(matches("Petal")) %>% 
  melt() %>% 
  group_by(variable) %>% 
  summarize(min=min(value),max=max(value)) %>%
  t()

which returns something close (not a dataframe, but you all get the idea)

> desired
         [,1]           [,2]         
variable "Petal.Length" "Petal.Width"
min      "1.0"          "0.1"        
max      "6.9"          "2.5"

Is there an option in summarise_each to do this? If not, Hadley, would you mind adding it?

score 24 · Accepted Answer · answered Jan 10 '15 at 19:36


You can get similar output by combining the dplyr and tidyr packages. Something along these lines should help:

library(dplyr)
library(tidyr)

iris %>%
  select(matches("Petal")) %>%
  summarise_each(funs(min, max)) %>%
  gather(variable, value) %>%
  separate(variable, c("var", "stat"), sep = "\\_") %>%
  spread(var, value)
##   stat Petal.Length Petal.Width
## 1  max          6.9         2.5
## 2  min          1.0         0.1
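
For readers on newer package versions, where summarise_each() and funs() are deprecated and gather()/spread() are superseded, the same reshaping can be sketched as below. This is only a sketch: it assumes dplyr >= 1.0.0 (for across()) and tidyr >= 1.0.0 (for pivot_longer()), and the name by_stat is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Assumes dplyr >= 1.0.0 (across) and tidyr >= 1.0.0 (pivot_longer).
by_stat <- iris %>%
  # Produces one row with columns such as "Petal.Length_min".
  summarise(across(matches("Petal"), list(min = min, max = max))) %>%
  # Split each name on "_": the first piece names the output column
  # (".value"), the second piece goes into the "stat" column.
  pivot_longer(everything(), names_to = c(".value", "stat"), names_sep = "_")

by_stat
```

The result has one row per function and one column per summarised variable, which is the shape asked for in the question.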

answered Jan 10 '15 at 19:36

dickoa


cool, and a little bit shorter (with default values) `gather %>% separate(key, c("key","stat"), sep = "_") %>% spread(key, value)` – ckluss Jan 10 '15 at 19:55
@ckluss Nice, thanks. Feel free to edit the answer to update it. – dickoa Jan 10 '15 at 20:01
Very nice. Gives me a reason to finally dive into tidyr. Many thanks. – Alex Coppock Jan 11 '15 at 00:17

Thanks @dickoa. If you have column names with multiple underscores, you can use this regex: `sep = "_(?=[^_]*$)"`. It will match only the last underscore to split the columns. – Lionel Henry Jan 17 '15 at 11:58
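
A small sketch of that last-underscore split, using made-up column names (petal_length_min and so on are hypothetical, not from iris) and assuming a tidyr version whose separate() accepts Perl-style lookaheads in sep:

```r
library(tidyr)

# Hypothetical long-format data whose variable names contain two underscores.
long <- data.frame(
  variable = c("petal_length_min", "petal_length_max"),
  value = c(1.0, 6.9)
)

# The lookahead (?=[^_]*$) lets "_" match only when no other underscore
# follows it, so the split happens at the last underscore.
split_last <- separate(long, variable, c("var", "stat"), sep = "_(?=[^_]*$)")

split_last
```

With a plain `sep = "_"` these names would break into three pieces instead of two.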

uhlitz · Answer 2 · 2015-01-12T10:21:08.317

7

To my knowledge there's no such argument. Anyhow, here's a workaround that outputs tidy data, which I think is even better than having as many rows as functions and as many columns as summarised columns. (Note that add_rownames requires dplyr 0.4.0.)

library("dplyr")
library("tidyr")

iris %>% 
  summarise_each(funs(min, max, mean, median), matches("Petal")) %>%
  t %>% 
  as.data.frame %>% 
  add_rownames %>%
  separate(rowname, into = c("feature", "fun"), sep = "_")

returns:

       feature    fun       V1
1 Petal.Length    min 1.000000
2  Petal.Width    min 0.100000
3 Petal.Length    max 6.900000
4  Petal.Width    max 2.500000
5 Petal.Length   mean 3.758000
6  Petal.Width   mean 1.199333
7 Petal.Length median 4.350000
8  Petal.Width median 1.300000

edited Jan 12 '15 at 10:21

answered Jan 10 '15 at 19:21

uhlitz


I could see this format being useful in many situations. thanks! – Alex Coppock Jan 12 '15 at 00:13
Small pedantic note: `add_rownames()` is now deprecated and the suggestion is to use `tibble::rownames_to_column()` instead. – Matteo Castagna Aug 30 '16 at 15:45
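
Following that note, a version of the workaround with tibble::rownames_to_column() might look like the sketch below. It also swaps summarise_each(funs(...)) for summarise(across(...)), which assumes dplyr >= 1.0.0. The name tidy_stats is made up for illustration, and only min and max are used to keep it short.

```r
library(dplyr)
library(tidyr)
library(tibble)

tidy_stats <- iris %>%
  # Assumes dplyr >= 1.0.0 for across().
  summarise(across(matches("Petal"), list(min = min, max = max))) %>%
  t() %>%
  as.data.frame() %>%
  # Moves the row names into a column; the default column name is "rowname".
  rownames_to_column() %>%
  separate(rowname, into = c("feature", "fun"), sep = "_")

tidy_stats
```

Note that across() orders its output by column first and function second, so the rows are grouped by feature here instead of by function.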

alistaire · Answer 3 · 2017-06-22T02:58:25.247

3

One option is to use purrr::map_df (strictly speaking map_dfc, which combines the results back into a data.frame with bind_cols, though map_df is fine for now) with a function that returns a vector holding the result of each summary function, i.e.

library(tidyverse)

iris %>% select(contains('Petal')) %>% 
    map_dfc(~c(min(.x), max(.x))) %>% 
    mutate(stat = c('min', 'max'))    # to add column of function names

#> # A tibble: 2 × 3
#>   Petal.Length Petal.Width  stat
#>          <dbl>       <dbl> <chr>
#> 1          1.0         0.1   min
#> 2          6.9         2.5   max
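
A variation on the same idea, sketched here as one possibility and not as part of the original answer: keep the functions in a named list so the stat labels come from the list names and don't have to be typed by hand. The names stats and per_fun are made up for illustration.

```r
library(dplyr)
library(purrr)

# Named list of summary functions; the names become the "stat" labels.
stats <- list(min = min, max = max)

per_fun <- iris %>%
  select(contains("Petal")) %>%
  # For each column, apply every function in `stats` and drop the names,
  # leaving a plain numeric vector with one entry per function.
  map(function(col) unname(map_dbl(stats, function(f) f(col)))) %>%
  # A named list of equal-length vectors converts directly to a tibble.
  as_tibble() %>%
  mutate(stat = names(stats))

per_fun
```

Adding another function then only means adding an entry to stats.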

edited Jun 22 '17 at 02:58

answered Sep 20 '16 at 05:15

alistaire


change `dmap` -> `map_df` for newer version of `purrr`, per [tidyverse news](http://purrr.tidyverse.org/news/#purrr-and-dplyr) – Paul Jun 21 '17 at 23:01
