0

I'm using the nycflights13::flights dataframe and want to calculate the number of flights an airplane have flown before its first more than 1 hour delay. How can I do this? I've tried with a group_by and filter, but I haven't been able to. Is there a method to count the rows till a condition (e.g. till the first dep_delay >60)?

Thanks.

OFAJ
  • 3
  • 1
  • 1
    Please show the code you've tried. While we can probably just give you the answer, it might be more informative to know "why" what you tried did not work correctly. – r2evans Apr 14 '20 at 23:35
  • OFAJ, welcome to SO! Please read how to produce a "reproducible" question. This includes sample code you've attempted (including listing non-base R packages, and any errors/warnings received), sample *unambiguous* data (e.g., `dput(head(x))` or `data.frame(x=...,y=...)`), and intended output. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. (Showing us the code you've tried so far also demonstrates that you have made effort. Since SO is not a tutorial/howto site, there is expectation that you do some research/effort before coming here.) – r2evans Apr 14 '20 at 23:43
  • Ok, thanks. You are right, I'll do next time. – OFAJ Apr 15 '20 at 19:29

2 Answers2

1
library(dplyr)
library(nycflights13)
data("flights")

There may be more elegant ways, but this code counts the total number of flights made by each plane (omitting cancelled flights) and joins this with flights that were not cancelled, grouping on the unique plane identifier (tailnum), sorting on departure date/time, assigning the row_number less 1, filtering on delays>60, and taking the first row.

select(
  filter(flights, !is.na(dep_time)) %>% 
  count(tailnum, name="flights") %>% left_join(
      filter(flights, !is.na(dep_time)) %>% 
      group_by(tailnum) %>%
      arrange(month, day, dep_time) %>%
      mutate(not_delayed=row_number() -1) %>%
      filter(dep_delay>60) %>% slice(1)), 
  tailnum, flights, not_delayed)

# A tibble: 4,037 x 3
   tailnum flights not_delayed
   <chr>     <int>       <dbl>
 1 D942DN        4           0
 2 N0EGMQ      354          53
 3 N10156      146           9
 4 N102UW       48          25
 5 N103US       46          NA
 6 N104UW       47           3
 7 N10575      272           0
 8 N105UW       45          22
 9 N107US       41          20
10 N108UW       60          36
# ... with 4,027 more rows

The plane with tailnum N103US has made 46 flights, of which none have been delayed by more than 1 hour. So the number of flights it has made before its first 1 hour delay is undefined or NA.

Edward
  • 10,360
  • 2
  • 11
  • 26
  • Thanks for the answer. As I'm looking for the number of flights a plane makes before its more than one hour delay, I tried this script based on yours: flights %>% filter(!is.na(dep_time)) %>% #select(carrier, month, day, dep_time, dep_delay) %>% group_by(tailnum) %>% arrange(month, day, dep_time) %>% mutate(First_delay = row_number()) %>% filter(dep_delay<=60) %>% summarise(n()) Unfortunately, it gave me the number of flights with less than hour delays. – OFAJ Apr 15 '20 at 20:25
  • OFAJ, *"less than hour delays"*, could that be because you used `<=60` instead of `>60`? – r2evans Apr 15 '20 at 20:52
0

I got the answer:

flights %>%
#Eliminate the NAs
filter(!is.na(dep_time)) %>% 
#Sort by date and time
arrange(time_hour) %>% 
group_by(tailnum) %>%
#cumulative number of flights delayed more than one hour
mutate(acum_delay = cumsum(dep_delay > 60)) %>% 
#count the number of flights                                         
summarise(before_1hdelay = sum(acum_delay < 1))
OFAJ
  • 3
  • 1
  • Nice and elegant. My revised answer gives the same as yours except for three planes. The difference is because I sort on month+day+dep_time and you sort on time_hour. For example, N322NB on Jan 13 had two flights scheduled. The one scheduled to depart at 8am actually departed at 6pm! But at 11am the same plane, scheduled to depart at 11:14am, actually departed on time, so it should be included in your answer, which is 6 flights, not 5. – Edward Apr 16 '20 at 00:58
  • Thanks for the answer, nice solution! – OFAJ Apr 16 '20 at 22:46