I have a tibble of flights for a given airline schedule that I am trying to 'link' together. The data follows the IATA SSIM format: each row contains the current flight's 'key' and the 'key' of the flight that follows it. I am trying to 'link' these flights to determine the number of aircraft in that schedule. For example, if a schedule looks like:

flight123 -> flight456 -> flight789 -> end
flight987 -> flight654 -> flight321 -> end

This would require two (2) aircraft to fly. I have been able to accomplish this using which(), match(), or filter(), but I am having issues with the speed. My tibble has over 200,000 rows so this is taking more than 8 minutes to return. I would like this to return in less than a minute, if possible. Example below:

library(tidyverse)
dat %>% mutate(nextIndex = purrr::map(nextKey, function(id){match(x = id, table = dat$key)}))

Could group_by() or nest() help to improve speed? Using a for() loop took an absurd amount of time...
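One thing worth checking before reaching for group_by() or nest(): match() is already vectorised, so wrapping it in purrr::map() runs one full table lookup per row. A minimal sketch of the single-call version follows, using a made-up four-row tibble in the same shape as the real data (the key values here are illustrative only, and the timing on 200,000 rows is not something this sketch measures). If the columns are glue objects, wrapping them in as.character() first may be a safe precaution.

```r
library(dplyr)

# Toy data in the same shape as the question's tibble (values are made up).
dat <- tibble(
  key     = c("1/01Apr19AAA", "2/01Apr19BBB", "3/01Apr19CCC", "4/01Apr19DDD"),
  nextKey = c("2/01Apr19BBB", "3/01Apr19CCC", NA, NA)
)

# One vectorised call looks up every nextKey against key at once.
# Rows whose nextKey is NA (no following flight) get NA as their index.
dat <- dat %>% mutate(nextIndex = match(nextKey, key))
```

This also gives an integer column rather than the list column that purrr::map() returns, which is easier to index with later.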

Here is a slice of the data... unfortunately, the IATA industry standard format does not include much data as the format is ancient. The key is defined as {FlightNumber}/{DepartureDate}{Origin}. Tail numbers are not assigned and therefore not available.

# A tibble: 10 x 2
   key             nextKey        
   <glue>          <glue>         
 1 3845/13Apr19GGG 3876/13Apr19DFW
 2 246/29Apr19CLT  123/29Apr19PBI  
 3 2561/24Apr19PHX 2604/24Apr19BOS
 4 2101/01Apr19DCA 1660/01Apr19DFW
 5 3443/21Apr19BTR 3703/21Apr19DFW
 6 2772/07Apr19JFK 1810/07Apr19AUS
 7 784/21Apr19BWI  NA 
 8 5199/25Apr19PHL 5090/25Apr19HVN
 9 375/14Apr19JAX  2360/14Apr19DFW
10 5517/30Apr19YYZ 5301/30Apr19DCA

Ideally, I would like the final result to be grouped or nested by each individual line of flying (aircraft).
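To make the desired output concrete, here is a rough sketch of one way the lines of flying could be labelled. It assumes every key is unique, each flight is the nextKey of at most one other flight, and every chain has a first flight that no other row points to. The names line_id, leg, and n_aircraft are made up for illustration. It walks all chains in parallel, so the loop runs once per leg of the longest chain rather than once per row.

```r
library(dplyr)

# Toy schedule mirroring the two-aircraft example above (keys are made up).
dat <- tibble(
  key     = c("123/A", "456/B", "789/C", "987/D", "654/E", "321/F"),
  nextKey = c("456/B", "789/C", NA,      "654/E", "321/F", NA)
)

# Row index of each flight's successor (NA when there is none).
next_idx <- match(dat$nextKey, dat$key)

# A line of flying starts at a flight that no other flight points to.
heads <- which(!(dat$key %in% dat$nextKey))

line_id <- rep(NA_integer_, nrow(dat))  # which aircraft line a row belongs to
leg     <- rep(NA_integer_, nrow(dat))  # position of the row within its line

current <- heads             # row currently visited in each line
id      <- seq_along(heads)  # line label carried along with each row
step    <- 1L

while (length(current) > 0) {
  line_id[current] <- id
  leg[current]     <- step
  nxt  <- next_idx[current]
  # Stop a line at its end (NA) or if the next row was already labelled.
  keep <- !is.na(nxt) & is.na(line_id[nxt])
  current <- nxt[keep]
  id      <- id[keep]
  step    <- step + 1L
}

dat$line_id <- line_id
dat$leg     <- leg
n_aircraft  <- length(heads)

# One group per aircraft, flights in flying order.
lines <- dat %>% arrange(line_id, leg) %>% group_by(line_id)
```

From here, tidyr::nest() on the grouped tibble would give one nested row per aircraft. Any row left with an NA line_id would indicate a loop with no starting flight, which this sketch does not handle.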

Kyle Power
  • Can you output a sample of your data? My guess is that pivoting the data is what needs to happen. Is each row a flight? Is the key the tail # or is there a column that has the next flt # + Origin? – Ryan John Jan 21 '21 at 17:18
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Jan 21 '21 at 17:31
  • @RyanJohn I have updated the post with sample data. Tail numbers are not assigned, therefore not available. Current origin and next origin city are available. – Kyle Power Jan 21 '21 at 17:39
  • How do you know where the flight ends? Row 1 goes from GGG -> DFW -> ? end? – Ryan John Jan 21 '21 at 18:01
  • @RyanJohn if the `nextKey` field is `NA` then there is no following flight – Kyle Power Jan 21 '21 at 20:33
  • Sounds like some form of pivoting might help out a lot, but I'm having a hard time understanding how the data's structured. – Pedro Cavalcante Jan 21 '21 at 20:52
  • I think we need the data a step before the glue step. I don't believe you've described how one line of flying ends. The flight leaving DFW could land after midnight in CLT. – Ryan John Jan 21 '21 at 22:29

0 Answers