
I would like to know a tidyverse way to add summary statistics back to each row of a dataframe.

The code below works, but there should be a quicker way out there, right?

library("tidyverse")
data <- iris

means <- iris %>%
  group_by(Species) %>%
  summarise(
    Sepal.Length = mean(Sepal.Length),
    Sepal.Width = mean(Sepal.Width)
  )

data <- merge(data, means, by = "Species")
smci
Nils
  • Just use `mutate` instead of `summarize` to get the exact same result in one step. Summarize drops all other columns and keeps only 1 summary value per group, mutate calculates the same values and then adds them on as a new column – divibisan Apr 12 '19 at 22:30
  • @camille: No it's not a duplicate. That question only wants a dataframe with 4 (summary) rows: 3 group-statistics and 1 total statistic. This question wants to **join those statistics columns back to each row of the original dataframe.** – smci Apr 12 '19 at 22:39
  • @smci got it, so the dupe divibisan posted? – camille Apr 12 '19 at 22:43
  • @divibisan: No, that question's different again: that's about joining per-group statistics (revenue sum) back to the original dataframe, then **computing summary statistics on the summary statistics** (what fraction of total revenue each daily_revenue subtotal is). Easy on the dupe trigger-fingers, everyone! – smci Apr 12 '19 at 23:15

2 Answers


One way to do this would be to use mutate.

library("tidyverse")
data <- iris

# mutate() keeps every row and adds the per-group means as new columns
data <- data %>%
  group_by(Species) %>%
  mutate(
    Sepal.Length.y = mean(Sepal.Length),
    Sepal.Width.y = mean(Sepal.Width)
  )

This is very similar to what you had before, but it cuts out the separate summarise and merge steps. If you want the columns in a different order, you can reorder them afterwards. I would also recommend giving the new columns names other than Sepal.Length and Sepal.Width, which is what your post uses; when the names are not unique, merge() appends .x and .y suffixes to tell them apart. Hope this helps.
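As a quick sanity check, here is a minimal sketch comparing the one-step `mutate` route with the two-step summarise-and-join route from the question. The column names `Sepal.Length.mean` and `Sepal.Width.mean` and the object names `with_means`, `means`, and `joined` are made up for illustration, and `left_join` stands in for `merge` because it keeps the original row order.

```r
library(dplyr)

# One step: per-group means added as new columns, then grouping dropped
with_means <- iris %>%
  group_by(Species) %>%
  mutate(
    Sepal.Length.mean = mean(Sepal.Length),  # hypothetical column name
    Sepal.Width.mean  = mean(Sepal.Width)    # hypothetical column name
  ) %>%
  ungroup()

# Two steps: one summary row per group, joined back onto every row
means <- iris %>%
  group_by(Species) %>%
  summarise(
    Sepal.Length.mean = mean(Sepal.Length),
    Sepal.Width.mean  = mean(Sepal.Width)
  )
joined <- left_join(iris, means, by = "Species")

# Both routes should give the same per-row group means
all.equal(with_means$Sepal.Length.mean, joined$Sepal.Length.mean)
all.equal(with_means$Sepal.Width.mean, joined$Sepal.Width.mean)
```

The `ungroup()` at the end is optional; without it the result stays grouped by Species, which affects any later `mutate` or `summarise` calls.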

Kyle Marsh

You can do this with dplyr::mutate_at:

iris %>%
  group_by(Species) %>%
  mutate_at(
    .vars = vars(Sepal.Length, Sepal.Width),
    .funs = list(mean = mean)
  )

We need the named list, list(mean = mean), rather than just .funs = mean, so that the results go into new columns with a _mean suffix instead of overwriting the original ones.

# A tibble: 150 x 7
# Groups:   Species [3]
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_mean Sepal.Width_mean
          <dbl>       <dbl>        <dbl>       <dbl> <fct>               <dbl>            <dbl>
 1          5.1         3.5          1.4         0.2 setosa               5.01             3.43
 2          4.9         3            1.4         0.2 setosa               5.01             3.43
 3          4.7         3.2          1.3         0.2 setosa               5.01             3.43
 4          4.6         3.1          1.5         0.2 setosa               5.01             3.43
 5          5           3.6          1.4         0.2 setosa               5.01             3.43
 6          5.4         3.9          1.7         0.4 setosa               5.01             3.43
 7          4.6         3.4          1.4         0.3 setosa               5.01             3.43
 8          5           3.4          1.5         0.2 setosa               5.01             3.43
 9          4.4         2.9          1.4         0.2 setosa               5.01             3.43
10          4.9         3.1          1.5         0.1 setosa               5.01             3.43
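
For readers on a newer dplyr, the same result can be sketched with `across()`. This assumes dplyr 1.0.0 or later, where `across()` was introduced and the `mutate_at` family was superseded. The object name `result` is just for illustration.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides across()

result <- iris %>%
  group_by(Species) %>%
  # A named list of functions makes across() name the outputs <column>_<name>
  mutate(across(c(Sepal.Length, Sepal.Width), list(mean = mean))) %>%
  ungroup()

head(result)
```

Passing a named list, as in the `mutate_at` version, is what makes the new columns come out as `Sepal.Length_mean` and `Sepal.Width_mean` next to the originals.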
Mako212