I am working with the nycflights13 dataset, specifically the flights and planes data. My analysis uses a data frame called "df_routes", which contains the variables origin, dest, and flights (n()). It holds the routes flown by the 50 planes with the most flights, 68 routes in total.

My goal is to keep only the rows of the flights data frame that belong to these routes, so that I can evaluate delays grouped by route. To do so, I am using the following code:

df_delay <- df_flights %>%
   filter(origin %in% df_routes$origin & dest %in% df_routes$dest)

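However, the filter does not behave as I want. It tests origin and dest separately, so a flight is kept whenever its origin appears somewhere in df_routes and its destination appears somewhere in df_routes, even if that combination is not one of the routes. I would like the filter to check, for each row, that the origin and destination together match a single route in df_routes, so that only flights on those routes are evaluated.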
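To make the difference concrete, here is a minimal sketch on made-up data. The data frames `toy_routes` and `toy_flights` are hypothetical stand-ins for `df_routes` and `df_flights`, and `semi_join()` is one possible way to do the pair-wise check, not necessarily the only one.

```r
library(dplyr)

# Hypothetical toy data standing in for df_routes and df_flights
toy_routes <- tibble(
  origin = c("JFK", "LGA"),
  dest   = c("LAX", "ORD")
)

toy_flights <- tibble(
  origin    = c("JFK", "JFK", "LGA", "LGA", "EWR"),
  dest      = c("LAX", "ORD", "ORD", "LAX", "LAX"),
  dep_delay = c(5, 10, -2, 30, 0)
)

# Independent membership tests: JFK-ORD and LGA-LAX also pass,
# although neither pair is a row of toy_routes
loose <- toy_flights %>%
  filter(origin %in% toy_routes$origin & dest %in% toy_routes$dest)

# Pair-wise check: keep only flights whose (origin, dest) pair
# matches a row of toy_routes; no columns from toy_routes are added
strict <- toy_flights %>%
  semi_join(toy_routes, by = c("origin", "dest"))

nrow(loose)   # 4
nrow(strict)  # 2
```

In this toy case the two `%in%` tests keep four flights, while the pair-wise match keeps only the two that are actually on a listed route.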

titiBFG
  • 3
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It sounds like you should be doing a join rather than a filter. – MrFlick Apr 26 '23 at 19:42
  • 3
    You almost never want to use `==` on things of different lengths because order matters for `==`, run `c(1, 2, 3, 4) == c(2, 3)`, for example. In a case like this, I think an `inner_join` makes more sense, `df_delay = inner_join(df_flights, df_routes, by = c("origin", "dest"))`. (If `df_routes` has more columns, deselect them for the purposes of this join.) – Gregor Thomas Apr 26 '23 at 19:50
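As a rough sketch of the join suggested in the comments, followed by a per-route delay summary: the data below is invented, `toy_routes` and `toy_flights` are hypothetical stand-ins for `df_routes` and `df_flights`, and summarising `arr_delay` is only an assumption about which delay is of interest. It also assumes each route appears only once in the routes table, since duplicated routes would duplicate the matching flights.

```r
library(dplyr)

# Hypothetical routes table with an extra count column, like df_routes
toy_routes <- tibble(
  origin  = c("JFK", "LGA"),
  dest    = c("LAX", "ORD"),
  flights = c(2L, 1L)
)

# Hypothetical flights; JFK-ORD is not a listed route
toy_flights <- tibble(
  origin    = c("JFK", "JFK", "LGA", "JFK"),
  dest      = c("LAX", "LAX", "ORD", "ORD"),
  arr_delay = c(10, 20, 5, 100)
)

toy_delay <- toy_flights %>%
  # drop the extra count column so only the join keys are used
  inner_join(select(toy_routes, origin, dest), by = c("origin", "dest")) %>%
  group_by(origin, dest) %>%
  summarise(
    mean_arr_delay = mean(arr_delay, na.rm = TRUE),
    n_flights      = n(),
    .groups        = "drop"
  )

toy_delay
```

The JFK-ORD flight is dropped by the join, so the summary has one row per listed route.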

0 Answers