I am trying to get better at using pipes (`%>%`) with the dplyr package. As I understand it, the whole point of the pipe is that it passes the object on its left-hand side as the first argument of the function on its right-hand side. That is, in this example:

area = rep(c(3:7), 5) + rnorm(5)

the piped call

area %>% 
  mean

is equivalent to the ordinary function call

`mean(area)`.

My problem starts when I work with a data frame. I would like to split the data frame into a list of data frames and then calculate the mean of the `area` column in each one. But I can't figure out how to refer to the column instead of the whole data frame.

I know that I can get the means by year simply with `aggregate(area ~ year, df, mean)`, but I would like to practice pipes instead.

Thank you!


# Dummy data
set.seed(13)
df<-data.frame(year = rep(c(1:5), each = 5),
               area = rep(c(3:7), each = 5) + rnorm(1))

# Calculate means. 
# None of `mean(df$area)`, `mean("area")`, or `mean[area]` works here. How do I refer to the column correctly?

df %>% 
  split(df$year) %>%
  mean
maycca

2 Answers

This?

library(dplyr)

df %>% 
  group_by(year) %>% 
  summarise(Mean = mean(area))
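
As a quick sanity check, the grouped summary can be compared with the `aggregate()` call from the question. This is only a minimal sketch, assuming dplyr is installed and reusing the question's dummy data; the names `piped` and `base` are made up for illustration.

```r
library(dplyr)

# Dummy data copied from the question
set.seed(13)
df <- data.frame(year = rep(1:5, each = 5),
                 area = rep(3:7, each = 5) + rnorm(1))

# Piped version: one row per year holding the mean of area
piped <- df %>%
  group_by(year) %>%
  summarise(Mean = mean(area))

# Base R version mentioned in the question
base <- aggregate(area ~ year, df, mean)

# Both approaches should agree on the per-year means
all.equal(piped$Mean, base$area)
```

Inside `summarise()`, `area` refers to the column of the current group, so no `df$` prefix is needed.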
NelsonGon

We need to extract the column from each data.frame in the list returned by `split`. One option is to loop through the list with `map` and summarise 'area' within each element.

library(dplyr)
library(purrr)

df %>% 
  split(.$year) %>% 
  map_df(~ .x %>% 
           summarise(area = mean(area)))
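
If the aim is mainly to see how the column is reached inside each piece of the split, a variant that returns a named numeric vector may be easier to follow. This is a sketch under the assumption that dplyr and purrr are installed; it swaps the `summarise` step for a direct `.x$area` with `map_dbl`, and the name `means` is made up for illustration.

```r
library(dplyr)  # provides %>%
library(purrr)  # provides map_dbl()

# Dummy data copied from the question
set.seed(13)
df <- data.frame(year = rep(1:5, each = 5),
                 area = rep(3:7, each = 5) + rnorm(1))

means <- df %>%
  split(.$year) %>%          # list of data frames, one per year, named by year
  map_dbl(~ mean(.x$area))   # .x is one data frame; pull out its area column

means
```

Here `.x` stands for one element of the list, which is why `mean` has to be applied to `.x$area` rather than to the list itself.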
akrun