I am currently working on a project and need some help. I want to predict the length of flight delays using a statistical model. The data set does not contain the length of the delays, but it can be calculated from the actual and scheduled departure times: actual departure time - scheduled departure time gives the flight delay, which is my dependent variable. I am struggling to get the variables into a form that is useful for regression analysis. The main problem is the time format of the first two columns when I read the table in from the CSV file. I wasn't sure how to attach the data file to the question, so I have linked it below (I'm new to this coding thing, hehe). Any help will be appreciated. xx
https://drive.google.com/file/d/11BXmJCB5UGEIRmVkM-yxPb_dHeD2CgXa/view?usp=sharing
EDIT:
Firstly, thank you for all the help!
Okay, I'm going to try to ask more precise questions on this topic:
1) I import the file using:
Delays <- read.table("FlightDelaysSM.csv", header = TRUE, sep = ",")
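To see why the times cause trouble, it helps to check the column types right after import. The sketch below uses `read.csv()`, which is essentially `read.table()` with `header = TRUE, sep = ","`, on a small inline CSV with made-up values standing in for the real file; only the column names `schedtime` and `deptime` come from the question.

```r
# Hypothetical inline CSV standing in for FlightDelaysSM.csv (made-up values)
csv_text <- "schedtime,deptime
800,756
1230,1302
1545,1610"

Delays <- read.csv(text = csv_text)

# Both time columns arrive as plain integers (800, 756, ...), not clock times,
# so R has no idea that the last two digits are minutes
str(Delays)
```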
2) The main issue I am having is getting the columns schedtime and deptime into a format where I can do arithmetic on them.
3) I tried the following:
Delays[, 1] - Delays[, 2]
This is where the obvious issue arises: for example, 800 (8:00 am) - 756 (7:56 am) = 44, not 4 minutes.
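One way round this, without any date-time classes, is to split each HHMM number into hours and minutes with integer division (`%/%`) and remainder (`%%`), convert both times to minutes since midnight, and subtract. A minimal sketch; `hhmm_to_minutes` is a hypothetical helper name, and it assumes both times fall on the same day (a flight scheduled for 23:50 that leaves at 00:10 would come out as a large negative delay).

```r
# Hypothetical helper: convert an HHMM integer (e.g. 756 = 7:56 am)
# into minutes since midnight
hhmm_to_minutes <- function(hhmm) {
  hours   <- hhmm %/% 100  # integer division: 756 %/% 100 = 7
  minutes <- hhmm %% 100   # remainder:        756 %% 100 = 56
  hours * 60 + minutes
}

# 8:00 am is 480 minutes after midnight, 7:56 am is 476
hhmm_to_minutes(800) - hhmm_to_minutes(756)  # 4 minutes, not 44
```

Applied to the whole data set, this would be `hhmm_to_minutes(Delays$deptime) - hhmm_to_minutes(Delays$schedtime)`, which is already a plain numeric vector in minutes.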
4) Using the help from @kerry Jackson (thank you, you're amazing x), I tried:
DepartureTime <- strptime(formatC(Delays$deptime, width = 4, format = "d", flag = "0"), "%H%M")
ScheduleTime <- strptime(formatC(Delays$schedtime, width = 4, format = "d", flag = "0"), "%H%M")
DelayTime <- DepartureTime - ScheduleTime
The resulting values are given in seconds, but I want the difference in minutes. How would I go about doing this?
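Subtracting two date-times with `-` returns a `difftime` whose units R chooses automatically from the smallest absolute difference, so a single on-time flight (a difference of 0) is enough to make the whole column seconds. Passing `units = "mins"` to `difftime()` fixes the unit, and `as.numeric()` then drops the unit label so the result is an ordinary number usable as a regression response. A minimal sketch with made-up times in place of `Delays$deptime` and `Delays$schedtime`; note that the `"%H%M"` format is passed to `strptime()` as a quoted string after `formatC()` is closed.

```r
# Made-up HHMM values; the 1200/1200 pair is an on-time flight (zero delay)
deptime   <- c(756L, 1302L, 1200L)
schedtime <- c(800L, 1230L, 1200L)

# Zero-pad to four characters ("0756") so "%H%M" can split hours and minutes
DepartureTime <- strptime(formatC(deptime,   width = 4, format = "d", flag = "0"), "%H%M")
ScheduleTime  <- strptime(formatC(schedtime, width = 4, format = "d", flag = "0"), "%H%M")

# Plain subtraction picks units automatically; the zero delay forces "secs"
units(DepartureTime - ScheduleTime)

# Ask for minutes explicitly, then drop the difftime class to get plain numbers
DelayTime <- as.numeric(difftime(DepartureTime, ScheduleTime, units = "mins"))
DelayTime
```

If a `difftime` already exists, `units(x) <- "mins"` followed by `as.numeric(x)` does the same conversion.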
5) I then did the following:
DelayData <- data.frame(ScheduleTime, DepartureTime, DelayTime, Delays[, 4:7])
This is what I get after creating DelayData.
As you can see in the image, the DelayTime column still carries the seconds unit, which I don't want (as stated in 4)), and the ScheduleTime and DepartureTime columns include a date. Could I get some suggestions on how to correct this?
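The dates appear because the strings passed to `strptime()` contain no date, so R fills in the current one. For display, `format(x, "%H:%M")` turns each value back into a plain "HH:MM" string; for modelling, numeric columns (delay in minutes, hour of day) are usually more useful than strings. A sketch with made-up values; `carrier` is a hypothetical stand-in for whatever sits in `Delays[, 4:7]`, and the `lm()` call only shows that the resulting data frame can go straight into a regression.

```r
# Made-up example values; carrier is a hypothetical stand-in for Delays[, 4:7]
deptime   <- c(756L, 1302L, 1610L, 1955L, 2105L)
schedtime <- c(800L, 1230L, 1545L, 1900L, 2100L)
carrier   <- c("AA", "DL", "AA", "US", "DL")

# Parse HHMM as clock times (strptime fills in today's date)
DepartureTime <- strptime(formatC(deptime,   width = 4, format = "d", flag = "0"), "%H%M")
ScheduleTime  <- strptime(formatC(schedtime, width = 4, format = "d", flag = "0"), "%H%M")

DelayData <- data.frame(
  ScheduleTime  = format(ScheduleTime, "%H:%M"),           # e.g. "08:00", no date
  DepartureTime = format(DepartureTime, "%H:%M"),
  SchedHour     = as.integer(format(ScheduleTime, "%H")),  # hour of day as a number
  DelayMinutes  = as.numeric(difftime(DepartureTime, ScheduleTime, units = "mins")),
  Carrier       = carrier,
  stringsAsFactors = FALSE
)
print(DelayData)

# Toy fit only: five made-up rows, so the coefficients mean nothing
fit <- lm(DelayMinutes ~ SchedHour + Carrier, data = DelayData)
```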