
Sorry for the beginner question, but I'm new to R and need help understanding this.

I've seen the rule: x %>% f(y) is equivalent to f(x, y)

I don't understand this "then" operator, %>%. Can someone give me a beginner's guide to it? My guess is that it works something like x <- 3, y <- 3x, but I could be wrong about that too. Could anyone explain it in really simple terms? Thanks in advance!

P.S. Here is the example that was used; I have no idea what it does:

library(dplyr)
hourly_delay <- flights %>%
    filter(!is.na(dep_delay)) %>%
    group_by(date, hour) %>%
    summarise(delay = mean(dep_delay), n = n()) %>%
    filter(n > 10)
David Arenburg
user3784616

1 Answer


Would you rather write:

library(magrittr)
sum(multiply_by(add(1:10, 10), 2))

or

1:10 %>% add(10) %>% multiply_by(2) %>% sum()

?

The intent is much clearer in the second example, and the two expressions compute exactly the same thing. It's usually easiest to think of the first expression (1:10) as defining the data object you're working on, and of %>% as applying a sequence of operations to that data, one after another. Reading it out loud,

Take the data 1:10, then add 10, then multiply by 2, then sum

the way we describe the operation in English is almost identical to how we write it with the pipe operator, which is what makes it such a nice tool.
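To check that the two spellings really are the same thing, here is a small sketch you can paste into an R session. It only assumes the magrittr package is installed; the variable names `nested` and `piped` are mine, purely for illustration.

```r
library(magrittr)

# The rule from the question: x %>% f(y) is the same call as f(x, y)
round(3.14159, 2)        # 3.14
3.14159 %>% round(2)     # 3.14

# The two versions from above, stored so they can be compared
nested <- sum(multiply_by(add(1:10, 10), 2))
piped  <- 1:10 %>% add(10) %>% multiply_by(2) %>% sum()

nested                   # 310
piped                    # 310
identical(nested, piped) # TRUE
```

The pipe does not compute anything new here. It just places the left-hand side into the first argument of the function on the right.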

With your example:

library(dplyr)
flights %>%
    filter(!is.na(dep_delay)) %>%
    group_by(date, hour) %>%
    summarise(delay = mean(dep_delay), n = n()) %>%
    filter(n > 10)

we are saying

Take the data set flights, then
    Filter it so we only keep rows where `dep_delay` is not NA, then
    Group the data frame by the `date`, `hour` columns, then
    Summarize the data set (across grouping variables), with `delay`
      as the mean of `dep_delay`, and `n` as the number of elements
      in that 'group' (`n()`), then
    Filter it so we only keep rows where there were more than 10 
      elements per group.
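If it helps to see the same steps without the pipe, here is a rough sketch on a tiny made-up data frame. Note the assumptions: `flights_toy` is hypothetical data I invented to stand in for `flights`, and the final threshold is `n > 1` instead of `n > 10` only because the toy data has so few rows.

```r
library(dplyr)

# Hypothetical toy data standing in for `flights` (not the real data set)
flights_toy <- data.frame(
  date      = rep(c("2011-01-01", "2011-01-02"), each = 3),
  hour      = c(9, 9, 10, 9, 9, 9),
  dep_delay = c(5, NA, 20, 10, 30, 20)
)

# Piped version: read each %>% as "then"
piped <- flights_toy %>%
  filter(!is.na(dep_delay)) %>%
  group_by(date, hour) %>%
  summarise(delay = mean(dep_delay), n = n()) %>%
  filter(n > 1)   # n > 1 here only because the toy data is tiny

# Same steps without the pipe: each result is passed in as the first argument
step1   <- filter(flights_toy, !is.na(dep_delay))
step2   <- group_by(step1, date, hour)
step3   <- summarise(step2, delay = mean(dep_delay), n = n())
unpiped <- filter(step3, n > 1)

# Both approaches give the same single row: 2011-01-02, hour 9, delay 20, n 3
all.equal(as.data.frame(piped), as.data.frame(unpiped))  # TRUE
```

The unpiped version needs a throwaway name for every intermediate result, which is exactly the clutter the pipe lets you skip.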
David Arenburg
Kevin Ushey