The code below works as expected. Running it up to and including the head(1) line shows that JFK to LAX is the route with the most flights. I then use inner_join to filter the flights table down to only the flights on this route, which gives me 11,252 rows.
library(nycflights13)
library(dplyr)
flights %>%
  group_by(origin, dest) %>%
  summarize(num_flights = n()) %>%
  arrange(-num_flights) %>%
  head(1) %>% # JFK to LAX has the most flights
  select(origin, dest) %>%
  inner_join(flights, by = c("origin", "dest"))
How can I use semi_join instead to achieve the same result? I want a single piped expression as above rather than a temporary variable. For reference, this is how I would write it with a temporary variable; it gives the same result:
filterList <- flights %>%
  group_by(origin, dest) %>%
  summarize(num_flights = n()) %>%
  arrange(-num_flights) %>%
  head(1) %>%
  select(origin, dest)
semi_join(flights, filterList, by = c("origin", "dest"))
I'd like to keep the logic in the same order: first determine the filter, then apply it. I think what I want is something like a right_semi_join function, but that does not exist.
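One direction that might work is magrittr's dot placeholder: when . appears as a direct argument on the right-hand side of %>%, the piped value is placed there instead of in the first position. The sketch below tries this on a small made-up table (toy_flights and busiest_route_flights are hypothetical names, not part of nycflights13), and it uses count() in place of group_by() plus summarize() for brevity. Treat it as an illustration under those assumptions rather than a confirmed solution.

```r
library(dplyr)

# Hypothetical stand-in for nycflights13::flights, for illustration only
toy_flights <- tibble(
  origin = c("JFK", "JFK", "JFK", "LGA", "EWR"),
  dest   = c("LAX", "LAX", "SFO", "ORD", "LAX"),
  flight = c(1, 2, 3, 4, 5)
)

busiest_route_flights <- toy_flights %>%
  count(origin, dest, name = "num_flights") %>%  # number of flights per route
  arrange(desc(num_flights)) %>%
  head(1) %>%                                    # keep only the busiest route
  select(origin, dest) %>%
  # `.` is the piped-in filter table; passing it as the second argument
  # makes toy_flights the table that gets filtered
  semi_join(toy_flights, ., by = c("origin", "dest"))

print(busiest_route_flights)
```

If this behaves the same way on the real data, replacing toy_flights with flights should keep the "determine the filter first, then apply it" order without a temporary variable.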