library(dplyr)
library(nycflights13)
data("flights")
There may be more elegant ways, but this code counts the total number of flights made by each plane (omitting cancelled flights) and joins this with flights that were not cancelled, grouping on the unique plane identifier (tailnum
), sorting on departure date/time, assigning the row_number less 1, filtering on delays>60, and taking the first row.
select(
filter(flights, !is.na(dep_time)) %>%
count(tailnum, name="flights") %>% left_join(
filter(flights, !is.na(dep_time)) %>%
group_by(tailnum) %>%
arrange(month, day, dep_time) %>%
mutate(not_delayed=row_number() -1) %>%
filter(dep_delay>60) %>% slice(1)),
tailnum, flights, not_delayed)
# A tibble: 4,037 x 3
tailnum flights not_delayed
<chr> <int> <dbl>
1 D942DN 4 0
2 N0EGMQ 354 53
3 N10156 146 9
4 N102UW 48 25
5 N103US 46 NA
6 N104UW 47 3
7 N10575 272 0
8 N105UW 45 22
9 N107US 41 20
10 N108UW 60 36
# ... with 4,027 more rows
The plane with tailnum N103US has made 46 flights, of which none have been delayed by more than 1 hour. So the number of flights it has made before its first 1 hour delay is undefined or NA.