
Using the nycflights13 dataset, I want to find, using R, which flight was the latest in each month; in other words, the flight with the largest departure delay in each month.

The code I've used:

flights %>% group_by(flights$month) %>% summarize(largest_delay = max(flights$dep_delay, na.rm=TRUE))

This gives me a table of the months, but every row shows the largest departure delay across the entire dataset rather than the month-wise maximum:

> flights %>% group_by(flights$month) %>% summarize(largest_delay = max(flights$dep_delay, na.rm=TRUE))
# A tibble: 12 x 2
  `flights$month` largest_delay
             <int>         <dbl>
 1               1          1301
 2               2          1301
 3               3          1301
 4               4          1301
 5               5          1301
 6               6          1301
 7               7          1301
 8               8          1301
 9               9          1301
10              10          1301
11              11          1301
12              12          1301

My question: how would I modify the above code such that it gives me the month-wise max? Also, how can I add in an additional column that contains the tailnum corresponding to that flight?

Numi

2 Answers


We can use the slice function with which.max to keep the most-delayed flight in each month:

library(nycflights13)
library(dplyr)

flights %>%
    group_by(year, month) %>%
    slice(which.max(dep_delay))
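If you are on dplyr 1.0.0 or later, slice_max() expresses the same idea a little more directly. The sketch below uses a small hypothetical data frame (toy) in place of flights so it runs without nycflights13; with_ties = FALSE keeps a single row per month, mirroring which.max(), whereas the default with_ties = TRUE would return every tied row.

```r
library(dplyr)

# Hypothetical stand-in for nycflights13::flights (same column names)
toy <- data.frame(
  year      = 2013,
  month     = c(1, 1, 2, 2, 2),
  dep_delay = c(10, 3, 5, 20, 20),
  tailnum   = c("A1", "A2", "B1", "B2", "B3"),
  stringsAsFactors = FALSE
)

worst <- toy %>%
  group_by(year, month) %>%
  # keep only the row with the largest dep_delay in each month;
  # with_ties = FALSE returns one row even when delays are tied
  slice_max(dep_delay, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(year, month, dep_delay, tailnum)

print(worst)
```

On the real data, replace toy with flights; the tailnum column comes along automatically because whole rows are kept.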

If you're looking for a base R solution, we can use lapply, split, and which.max:

do.call('rbind', 
       lapply(split(flights, list(flights$year, flights$month)), 
              FUN = function(d) d[which.max(d$dep_delay),]))
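Both approaches rely on which.max() rather than max(). A quick base R check of its behaviour on a made-up delays vector: it skips NA values without needing na.rm, returns the position of the first maximum when there is a tie, and returns a zero-length result if every value is missing.

```r
# Hypothetical delays for one month, with missing values and a tie at 40
delays <- c(NA, 15, 40, NA, 40)

first_max <- which.max(delays)   # NAs are ignored; the first of the tied 40s wins
print(first_max)                 # 3

# If every delay in a group were NA, there would be no row to pick
all_missing <- which.max(c(NA_real_, NA_real_))
print(length(all_missing))       # 0
```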
bouncyball

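The problem is your syntax: inside a dplyr pipeline you should refer to columns by their bare names, not with flights$. Writing flights$dep_delay pulls in the entire, ungrouped column, so max() sees every flight for every month, which is why each row shows 1301. All you need is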

library(nycflights13)
library(dplyr)

flights %>% group_by(month) %>% 
    summarize(largest_delay = max(dep_delay, na.rm=TRUE),
              delay_tail_num = tailnum[which.max(dep_delay)]) # tailnum of the most-delayed flight

# A tibble: 12 x 3
   month largest_delay delay_tail_num
   <int>         <dbl> <chr>         
 1     1          1301 N384HA        
 2     2           853 N203FR        
 3     3           911 N927DA        
 4     4           960 N959DL        
 5     5           878 N523MQ        
 6     6          1137 N504MQ        
 7     7          1005 N665MQ        
 8     8           520 N758EV        
 9     9          1014 N338AA        
10    10           702 N943DL        
11    11           798 N990AT        
12    12           896 N5DMAA  
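To see why the flights$ prefix defeats the grouping, here is a minimal sketch on a small hypothetical data frame (toy, standing in for flights). The $ version is evaluated against the full column, so every month gets the same global maximum; the bare-name version is evaluated separately within each group.

```r
library(dplyr)

# Hypothetical stand-in for flights: two months, two flights each
toy <- data.frame(
  month     = c(1, 1, 2, 2),
  dep_delay = c(10, 50, 5, 20),
  tailnum   = c("A1", "A2", "B1", "B2"),
  stringsAsFactors = FALSE
)

# toy$dep_delay is the whole column, so every group sees the global max (50)
wrong <- toy %>%
  group_by(month) %>%
  summarize(largest_delay = max(toy$dep_delay, na.rm = TRUE))

# Bare dep_delay / tailnum refer only to the rows of the current group
right <- toy %>%
  group_by(month) %>%
  summarize(largest_delay  = max(dep_delay, na.rm = TRUE),
            delay_tail_num = tailnum[which.max(dep_delay)])

print(wrong)
print(right)
```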
Andrew Gustar