I have a large CSV file that I imported into R for some data analysis. It contains flight delays for a few years, and I am trying to create a graph showing the average delay per day of the week. I tried a histogram, but the resulting plot is not usable. Any ideas? Would a different type of graph work better? Also, is there an easy way to compare on-time flights with delayed flights for each day of the week?
The data frame is called airline:
```
str(airline)
'data.frame': 7009728 obs. of 29 variables:
$ Year : int 2008 2008 2008 2008 2008 2008 2008 2008 2008 2008 ...
$ Month : int 1 1 1 1 1 1 1 1 1 1 ...
$ DayofMonth : int 3 3 3 3 3 3 3 3 3 3 ...
$ DayOfWeek : int 4 4 4 4 4 4 4 4 4 4 ...
$ DepTime : int 2003 754 628 926 1829 1940 1937 1039 617 1620 ...
$ CRSDepTime : int 1955 735 620 930 1755 1915 1830 1040 615 1620 ...
$ ArrTime : int 2211 1002 804 1054 1959 2121 2037 1132 652 1639 ...
$ CRSArrTime : int 2225 1000 750 1100 1925 2110 1940 1150 650 1655 ...
$ UniqueCarrier : Factor w/ 20 levels "9E","AA","AQ",..: 18 18 18 18 18 18 18 18 18 18 ...
$ FlightNum : int 335 3231 448 1746 3920 378 509 535 11 810 ...
$ TailNum : Factor w/ 5374 levels "","80009E","80019E",..: 3769 4129 1961 3059 2142 3852 4062 1961 3616 3324 ...
$ ActualElapsedTime: int 128 128 96 88 90 101 240 233 95 79 ...
$ CRSElapsedTime : int 150 145 90 90 90 115 250 250 95 95 ...
$ AirTime : int 116 113 76 78 77 87 230 219 70 70 ...
$ ArrDelay : int -14 2 14 -6 34 11 57 -18 2 -16 ...
$ DepDelay : int 8 19 8 -4 34 25 67 -1 2 0 ...
$ Origin : Factor w/ 303 levels "ABE","ABI","ABQ",..: 136 136 141 141 141 141 141 141 141 141 ...
$ Dest : Factor w/ 304 levels "ABE","ABI","ABQ",..: 287 287 49 49 49 151 157 157 177 177 ...
$ Distance : int 810 810 515 515 515 688 1591 1591 451 451 ...
$ TaxiIn : int 4 5 3 3 3 4 3 7 6 3 ...
$ TaxiOut : int 8 10 17 7 10 10 7 7 19 6 ...
$ Cancelled : int 0 0 0 0 0 0 0 0 0 0 ...
$ CancellationCode : Factor w/ 5 levels "","A","B","C",..: 1 1 1 1 1 1 1 1 1 1
$ Diverted : int 0 0 0 0 0 0 0 0 0 0 ...
$ CarrierDelay : int NA NA NA NA 2 NA 10 NA NA NA ...
$ WeatherDelay : int NA NA NA NA 0 NA 0 NA NA NA ...
$ NASDelay : int NA NA NA NA 0 NA 0 NA NA NA ...
$ SecurityDelay : int NA NA NA NA 0 NA 0 NA NA NA ...
$ LateAircraftDelay: int NA NA NA NA 32 NA 47 NA NA NA ...
```
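A histogram only counts rows, so it cannot show an average. One option is to summarise the data to one mean per weekday first and then draw one bar per day. A minimal sketch in base R, assuming DayOfWeek is coded 1 = Monday through 7 = Sunday (worth checking against the dataset's documentation) and that cancelled flights have NA in DepDelay. The small flights data frame is hypothetical stand-in data, not taken from the real file; with the real data you would pass airline instead.

```r
# Hypothetical stand-in for `airline`: only the two columns needed here
flights <- data.frame(
  DayOfWeek = c(1, 1, 2, 2, 3, 7, 7),
  DepDelay  = c(10, -2, 30, NA, 5, 0, 20)
)

# Assumed coding: 1 = Monday ... 7 = Sunday
day_names <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Mean departure delay per weekday; the formula interface of aggregate()
# drops rows with NA (e.g. cancelled flights) by default (na.action = na.omit)
avg_delay <- aggregate(DepDelay ~ DayOfWeek, data = flights, FUN = mean)
avg_delay$Day <- factor(day_names[avg_delay$DayOfWeek], levels = day_names)

print(avg_delay)

# One bar per day instead of a histogram
barplot(avg_delay$DepDelay, names.arg = as.character(avg_delay$Day),
        xlab = "Day of week", ylab = "Mean departure delay (min)")
```

With ggplot2, the same avg_delay table can be drawn with ggplot(avg_delay, aes(Day, DepDelay)) + geom_col().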
My current attempt:
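```r
library(ggplot2)

# fill = factor(DepDelay) creates one fill colour per distinct delay value;
# a histogram's bar height is the number of flights, not the delay
ggplot(airline, aes(x = DayOfWeek, fill = factor(DepDelay))) +
  geom_histogram(binwidth = 1) +
  xlab("Day of week") +
  ylab("Number of flights") +
  labs(fill = "Dep Delay")
```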
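For the on-time versus delayed comparison, one approach is to classify each flight and cross-tabulate by weekday. This sketch assumes a flight counts as delayed when ArrDelay is 15 minutes or more (the cut-off the US Bureau of Transportation Statistics uses for its on-time statistics) and drops flights with a missing ArrDelay, such as cancelled ones. The flights data frame is again hypothetical stand-in data.

```r
# Hypothetical stand-in for `airline`
flights <- data.frame(
  DayOfWeek = c(1, 1, 1, 2, 2, 3, 3, 3),
  ArrDelay  = c(-5, 20, 15, 3, NA, 40, 60, -1)
)

# Drop flights with no arrival delay recorded (e.g. cancelled)
flights <- flights[!is.na(flights$ArrDelay), ]

# Assumed threshold: 15 minutes or more late counts as delayed
flights$Status <- factor(ifelse(flights$ArrDelay >= 15, "Delayed", "On time"),
                         levels = c("On time", "Delayed"))

# Number of flights per weekday and status
counts <- table(flights$DayOfWeek, flights$Status)
print(counts)

# Share of flights in each status per weekday (each row sums to 1)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))

# Stacked bars: one bar per weekday, split into on-time and delayed shares
barplot(t(shares), legend.text = TRUE,
        xlab = "Day of week", ylab = "Share of flights")
```

Plotting shares rather than raw counts keeps weekdays with more flights from dominating the comparison. In ggplot2 the equivalent is ggplot(flights, aes(factor(DayOfWeek), fill = Status)) + geom_bar(position = "fill").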