Apologies in advance for my English; it is not my first language.

I have a dataset of delay counts and delay durations for domestic US flights. My goal is to investigate whether certain airports cause more delays than others.

Specifically, I will look at the top 20 airports by passenger count and compare them with the overall average. I started with:

> data <- read.csv("air.csv", header=T)
> str(data)
'data.frame':   263214 obs. of  22 variables:
 $ year               : int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
 $ X.month            : int  1 1 1 1 1 1 1 1 1 1 ...
 $ carrier            : chr  "DL" "DL" "DL" "DL" ...
 $ airport            : chr  "PBI" "PDX" "PHL" "PHX" ...
 $ arr_flights        : num  650 314 513 334 217 181 10 31 216 122 ...
 $ arr_del15          : num  126 61 97 78 47 42 3 2 42 21 ...
 $ carrier_ct         : num  21.06 14.09 27.6 20.14 8.08 ...
 $ X.weather_ct       : num  6.44 2.61 0.42 2.02 0.44 1.06 1 0 0.43 0 ...
 $ nas_ct             : num  51.6 34.2 51.9 39.4 21.9 ...
 $ security_ct        : num  1 0 0 0 0 0 0 0 0 0 ...
 $ late_aircraft_ct   : num  45.9 10.1 17.1 16.4 16.6 ...
 $ arr_cancelled      : num  4 30 15 3 4 2 1 0 3 2 ...
 $ arr_diverted       : num  0 3 0 1 1 0 0 0 0 0 ...
 $ X.arr_delay        : num  5425 2801 4261 3400 1737 ...
 $ X.carrier_delay    : num  881 478 1150 1159 350 ...
 $ weather_delay      : num  397 239 16 166 28 195 189 0 12 0 ...
 $ nas_delay          : num  2016 1365 2286 1295 522 ...
 $ security_delay     : num  15 0 0 0 0 0 0 0 0 0 ...
 $ late_aircraft_delay: num  2116 719 809 780 837 ...
> nrow(data)
[1] 263214

I then tried to subset the data to those 20 airports:

> airport <- data[which(data$airport==
+                         c("ATL","BOS","CLT","DEN","DFW",
+                           "DTW","EWR","FLL","IAH","JFK",
+                           "LAS","LAX","MCO","MIA","MSP",
+                           "ORD","PHL","PHX","SEA","SFO")),]
Warning message:
In data$airport == c("ATL", "BOS", "CLT", "DEN", "DFW", "DTW", "EWR",  :
  longer object length is not a multiple of shorter object length
> nrow(airport)
[1] 2406

I played around with the data in Excel beforehand and had 46521 data points, so I am not sure why R has returned only 2406. Could someone please provide some clarity? :)

Thanks in advance!

Ahmed
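
A likely explanation, sketched on toy data: `==` compares the two vectors element by element and recycles the shorter one, so each row is tested against only one of the 20 codes (whichever lines up with its position), not against all of them. The warning appears because 263214 is not a multiple of 20. `%in%` tests membership in the whole set instead. The vectors `airport_codes` and `top` and the data frame `toy` below are made-up stand-ins, not values from air.csv.

```r
# Toy stand-ins (hypothetical values, not taken from air.csv)
airport_codes <- c("ATL", "BOS", "ATL", "JFK", "BOS", "SEA")
top <- c("ATL", "BOS")

# `==` recycles `top` along the longer vector: ATL, BOS, ATL, BOS, ATL, BOS.
# Each element is compared only with the code at the same position.
eq_match <- airport_codes == top

# `%in%` asks, for each element, whether it appears anywhere in `top`.
in_match <- airport_codes %in% top

sum(eq_match)  # 3: the second "BOS" is compared with "ATL" and is missed
sum(in_match)  # 4: every "ATL" and "BOS" is found

# The same idea applied to a data frame subset
toy <- data.frame(airport = airport_codes,
                  arr_flights = c(650, 314, 513, 334, 217, 181))
subset_in <- toy[toy$airport %in% top, ]
nrow(subset_in)  # 4
```

If this is what is happening, storing the 20 codes in a vector (say `top20`, a hypothetical name) and using `data[data$airport %in% top20, ]` should return every row for those airports. The `which()` wrapper is not needed for this, since a logical vector can index rows directly.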