Understanding %>% operator

Question

Sorry for the noob question, but I'm new to R and I need help understanding this.

I've seen the rule that x %>% f(y) is equivalent to f(x, y).

I don't get this "then" symbol, %>%. Can someone give me a dummy's guide to it? I'm having a really hard time understanding it. My guess is that it's something like x <- 3, y <- 3x (but I could be wrong here too). Can anyone explain it to me in a really, really simple way? Thanks in advance!

P.S. Here is the example that was used; I have no idea what it does:

library(dplyr)
hourly_delay <- flights %>%
    filter(!is.na(dep_delay)) %>%
    group_by(date, hour) %>%
    summarise(delay = mean(dep_delay), n = n()) %>%
    filter(n > 10)

score 9 · Accepted Answer · edited Jul 19 '14 at 21:32

Would you rather write:

library(magrittr)
sum(multiply_by(add(1:10, 10), 2))

or

1:10 %>% add(10) %>% multiply_by(2) %>% sum()

?

The intent is much clearer in the second example, and the two expressions compute exactly the same thing. It's usually easiest to think of the first expression (1:10) as defining the data object you're working on, with %>% then applying a sequence of operations to that data, one after another. Reading it out loud,

Take the data 1:10, then add 10, then multiply by 2, then sum

the way we describe the operation in English is almost identical to how we write it with the pipe operator, which is what makes it such a nice tool.
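To make the rule x %>% f(y) == f(x, y) concrete, here is a minimal sketch, assuming the magrittr package is installed. The variable names `nested` and `piped` are only illustrative; the point is that the pipe inserts its left-hand side as the first argument of the call on its right.

```r
library(magrittr)

# The pipe passes the left-hand side as the FIRST argument:
# 3.14159 %>% round(2) is the same call as round(3.14159, 2)
identical(3.14159 %>% round(2), round(3.14159, 2))

# Nested form: read inside-out (add, then multiply, then sum)
nested <- sum(multiply_by(add(1:10, 10), 2))

# Piped form: read left-to-right, in the order the steps happen
piped <- 1:10 %>% add(10) %>% multiply_by(2) %>% sum()

nested                    # 310
identical(nested, piped)  # TRUE
```

Because each step receives the result of the previous one, the order of the steps matters: adding 10 and then multiplying by 2 gives a different result from multiplying first and adding afterwards.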

With your example:

library(dplyr)
flights %>%
    filter(!is.na(dep_delay)) %>%
    group_by(date, hour) %>%
    summarise(delay = mean(dep_delay), n = n()) %>%
    filter(n > 10)

we are saying

Take the data set flights, then
    Filter it so we only keep rows where `dep_delay` is not NA, then
    Group the data frame by the `date`, `hour` columns, then
    Summarize the data set (one row per `date`/`hour` group), with `delay`
      as the mean of `dep_delay`, and `n` as the number of rows
      in that group (`n()`), then
    Filter it so we only keep groups that had more than 10
      rows.
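The same chain can also be written without the pipe, one intermediate variable per step. The sketch below assumes dplyr is installed and uses a small made-up data frame, `toy_flights`, as a stand-in for `flights` (its values and the names `step1` to `step4` are hypothetical). The final filter uses `n > 1` rather than `n > 10` only because the toy data has so few rows.

```r
library(dplyr)

# Hypothetical toy data standing in for `flights`
toy_flights <- data.frame(
  date      = c("2011-01-01", "2011-01-01", "2011-01-01", "2011-01-02"),
  hour      = c(9, 9, 9, 10),
  dep_delay = c(5, NA, 15, 30)
)

# Step by step: each result is fed into the next call as its first argument
step1 <- filter(toy_flights, !is.na(dep_delay))               # drop the NA row
step2 <- group_by(step1, date, hour)                          # group by date and hour
step3 <- summarise(step2, delay = mean(dep_delay), n = n())   # one row per group
step4 <- filter(step3, n > 1)                                 # keep groups with more than 1 row

# The same computation as a single pipeline
piped <- toy_flights %>%
    filter(!is.na(dep_delay)) %>%
    group_by(date, hour) %>%
    summarise(delay = mean(dep_delay), n = n()) %>%
    filter(n > 1)

isTRUE(all.equal(step4, piped))  # TRUE
```

Only the 2011-01-01, hour 9 group survives here: it has two non-missing delays (5 and 15), so `delay` is 10 and `n` is 2. The pipe version avoids naming the throwaway intermediates.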

edited Jul 19 '14 at 21:32

David Arenburg


answered Jul 19 '14 at 21:24

Kevin Ushey


When I enter sum(multiply_by(add(1:10, 10), 2)), it shows Error: could not find function "multiply_by". And the 2nd line shows Error in eval(expr, envir, enclos) : could not find function "add". That aside, thanks for your answer! Just wondering, the sequence matters here, right? – user3784616 Jul 19 '14 at 21:29
Sorry, you need `library(magrittr)` first to get those aliases (plus the pipe operator `%>%`) – Kevin Ushey Jul 19 '14 at 21:30
I guess you both should mention `dplyr` here too – David Arenburg Jul 19 '14 at 21:31
Oh okay! I'm trying to learn dplyr, and didn't realize you need magrittr. Will get it right now :) – user3784616 Jul 19 '14 at 21:31
`dplyr` imports (and re-exports) `%>%`, so you don't need to load `magrittr` per se to use the pipe with `dplyr`, but `magrittr` comes with its own set of utility functions (that you don't get by default when just loading `dplyr`) – Kevin Ushey Jul 19 '14 at 21:33
^thanks! I'll youtube magrittr now :) – user3784616 Jul 19 '14 at 21:35
@user3784616, just read [this](http://cran.r-project.org/web/packages/magrittr/magrittr.pdf) – David Arenburg Jul 19 '14 at 21:39
^thank you very much! +10 kudos points to you all :) – user3784616 Jul 19 '14 at 21:41
magrittr has a youtube channel? – rawr Jul 19 '14 at 22:47
