The code below works as expected. Executing up to and including the line head(1), I find that JFK to LAX is the route with the most flights. Then, I use inner_join to filter the flights table to include only flights on this route. This gives me 11,252 rows.

library(nycflights13)
library(dplyr)

flights %>% 
  group_by(origin, dest) %>% 
  summarize(num_flights=n()) %>% 
  arrange(-num_flights) %>% 
  head(1) %>% # JFK to LAX has the most flights
  select(origin, dest) %>% 
  inner_join(flights, by=c("origin", "dest"))

How can I use semi_join instead to achieve the same goal? I want a single pipeline as above rather than a temporary variable. If I were to write it with a temporary variable, it would look like this, and it gives the same result:

filterList <- flights %>% 
  group_by(origin, dest) %>% 
  summarize(num_flights=n()) %>% 
  arrange(-num_flights) %>% 
  head(1) %>% 
  select(origin, dest)

semi_join(flights, filterList, by=c("origin", "dest"))

I'd like to keep the same order of logic: first determine the filter, then apply it. What I think I want is a right_semi_join function, but that does not exist.

Bobby

2 Answers

Use the . placeholder to pass the piped data as the second argument of semi_join rather than the first.

flights %>% 
  group_by(origin, dest) %>% 
  summarize(num_flights=n()) %>% 
  arrange(-num_flights) %>% 
  head(1) %>% # JFK to LAX has the most flights
  select(origin, dest) %>% 
  semi_join(flights, ., by=c("origin", "dest"))
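
To see the placeholder at work without loading the full dataset, here is a minimal sketch on a small made-up table. The trips data frame and its values are hypothetical and only stand in for flights; count() is used as a shorthand for group_by() plus summarize().

```r
library(dplyr)

# Hypothetical toy data standing in for the flights table
trips <- tibble(
  origin = c("JFK", "JFK", "JFK", "LGA", "EWR"),
  dest   = c("LAX", "LAX", "SFO", "ORD", "LAX"),
  delay  = c(5, 12, -3, 20, 7)
)

busiest <- trips %>%
  count(origin, dest, name = "num_flights") %>%  # flights per route
  arrange(desc(num_flights)) %>%
  head(1) %>%                                    # the busiest route
  select(origin, dest) %>%
  # The dot receives the piped one-row table as the second argument,
  # so trips is the table being filtered.
  semi_join(trips, ., by = c("origin", "dest"))

print(busiest)
```

Because the dot appears as a direct argument of semi_join, the pipe does not also insert the piped data as the first argument, so the result keeps the columns of trips in their original order.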
MrFlick
  • Thanks very much! Now that I know this exists, it will be much easier to read about possible uses for it! https://stackoverflow.com/questions/35272457/what-does-the-dplyr-period-character-reference – Bobby Oct 09 '17 at 22:06

You can also select the route with the most flights without using a join:

library(nycflights13)
library(dplyr)

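df2 <- flights %>% 
  add_count(origin, dest) %>%  # adds column n: number of flights on each route
  top_n(1, n)                  # keep the rows whose route has the highest n

df2$n <- NULL  # drop the helper column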

> setequal(df1, df2) # assuming original data.frame is stored in df1
TRUE
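
As a side note, top_n() has been superseded by slice_max() since dplyr 1.0.0. A minimal sketch of the same idea with slice_max(), assuming a recent dplyr and using a hypothetical toy table (trips) in place of flights:

```r
library(dplyr)

# Hypothetical toy data standing in for the flights table
trips <- tibble(
  origin = c("JFK", "JFK", "JFK", "LGA", "EWR"),
  dest   = c("LAX", "LAX", "SFO", "ORD", "LAX"),
  delay  = c(5, 12, -3, 20, 7)
)

busiest <- trips %>%
  add_count(origin, dest, name = "num_flights") %>%  # per-route count on every row
  slice_max(num_flights, n = 1) %>%                  # ties are kept by default
  select(-num_flights)                               # drop the helper column

print(busiest)
```

Keeping ties is what makes this work: every row of the busiest route carries the same count, so all of them are retained.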
manotheshark