T-tests in R: unable to run together

Question

I have an airline dataset from stat computing which I am trying to analyse.

There are variables DepTime and ArrDelay (departure time and arrival delay). I am trying to analyse how arrival delay varies across chunks of departure time. My objective is to find which time chunks a person should avoid when booking tickets in order to avoid arrival delays.

My understanding: if a one-tailed t-test between arrival delays for departure times > 1800 and arrival delays for departure times > 1900 shows high significance, it means that one should avoid flights between 1800 and 1900. (Please correct me if I am wrong.) I want to run such tests for all departure hours.

I am totally new to programming and data science. Any help would be much appreciated.

Data looks like this. The highlighted columns are the ones I am analysing

[Image: screenshot of the dataset]

So do you want to test all departure hours against each other? It may be better to test each hour vs. all hours; that way you know which times are better or worse than "an average day." Why don't you post some data and what you want the output to look like so we can better help you? – emilliman5 Nov 30 '16 at 14:05
See this [SO Post](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make an R reproducible example – emilliman5 Nov 30 '16 at 14:30
Sorry for the previous comment. Considering just the two columns DepTime and ArrDelay, the data looks like this: [1829 (time): 23 (delay in minutes)], [1700: 10], [1000: 5], [1750: 137]. Your idea sounds fine too. I basically want to see which hours in a day are not so favorable to travel with respect to delays. – Anu Nov 30 '16 at 14:32
Please put all code and data necessary to reproduce this in the question itself – Hack-R Nov 30 '16 at 14:45

score 0 · Accepted Answer · edited Jun 20 '20 at 09:12

Sharing an image of the data is not the same as providing the data for us to work with...

That said, I grabbed one year of data and worked this up.

flights <- read.csv("~/Downloads/1995.csv", header = TRUE)

# Keep only the two columns of interest.
flights <- flights[, c("DepTime", "ArrDelay")]
# Bucket departure times (HHMM) by hour: subtracting 30 before rounding to the
# nearest hundred maps e.g. 1829 and 1859 both to 1800.
flights$Dep <- round(flights$DepTime - 30, digits = -2)
head(flights, n = 25)

# This tests each hour of departures against the entire day.
# Alternative is set to "less" because we want to know if a given hour
# has less delay than the day as a whole.
pVsDay <- tapply(flights$ArrDelay, flights$Dep,
                 function(x) t.test(x, flights$ArrDelay, alternative = "less"))

# This tests each hour of departures against every other hour of the day.
# Alternative is set to "less" because we want to know if a given hour (x)
# has less delay than another hour (z).
pAllvsAll <- tapply(flights$ArrDelay, flights$Dep,
                    function(x) tapply(flights$ArrDelay, flights$Dep, function(z)
                      t.test(x, z, alternative = "less")))

I'll let you figure out multiple hypothesis testing and the like.
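
As a starting point for the multiple-testing part, here is a minimal sketch using base R's `p.adjust`. It is not part of the analysis above: it runs on simulated data instead of the 1995 file, and the names `toy`, `tests`, `pvals` and `padj` are made up for illustration. The choice of the Holm method is also just an assumption; other methods may suit the question better.

```r
# Simulated stand-in for the real flights data (assumption: not the 1995 file).
set.seed(1)
toy <- data.frame(
  Dep      = rep(c(600, 1200, 1800), each = 200),
  ArrDelay = c(rnorm(200, mean = 2,  sd = 20),
               rnorm(200, mean = 8,  sd = 20),
               rnorm(200, mean = 20, sd = 20))
)

# One one-sided t-test per departure hour against the whole day.
tests <- tapply(toy$ArrDelay, toy$Dep,
                function(x) t.test(x, toy$ArrDelay, alternative = "less"))

# Pull out the raw p-values; the hour labels are kept as names.
pvals <- sapply(tests, function(tt) tt$p.value)

# Adjust for the number of tests. "holm" controls the family-wise error rate;
# "BH" would control the false discovery rate instead.
padj <- p.adjust(pvals, method = "holm")

print(cbind(raw = pvals, holm = padj))
```

With one test per hour there are around two dozen p-values, so a few can fall under 0.05 by chance alone; comparing the `raw` and `holm` columns shows how much the adjustment changes the picture.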

[Figure: All vs All]

answered Nov 30 '16 at 21:54 by emilliman5 (edited Jun 20 '20 at 09:12 by Community)

Thanks a lot! I am new to Stack Overflow. Apologies for not posting the dataset! I understood your code. However, when I run it I get the following output. Am I missing something? Length Class Mode 0 9 htest list 100 9 htest list 200 9 htest list 500 9 htest list 600 9 htest list 700 9 htest list 800 9 htest list 900 9 htest list 1000 9 htest list 1100 9 htest list 1200 9 htest list 1300 9 htest list 1400 9 htest list 1500 9 htest list – Anu Nov 30 '16 at 22:37
To access the comparison of hour 900 to the entire day use `pVsDay[[10]]`; to access the comparison between 2200 and 1300 use `pAllvsAll[[23]][[14]]` – emilliman5 Nov 30 '16 at 22:51
Thanks a lot! Owe you one big time. With advice like this one doesn't get intimidated by programming. – Anu Nov 30 '16 at 23:22
So I should be able to access just the p-values with `pVsDay[[10]]$p.value`, right? Last question: I am struggling to plot the graph. How did you plot it? Can a function be used inside qplot or ggplot? – Anu Nov 30 '16 at 23:27
Correct, `...$p.value` will return just the p-value. It would probably be easiest to extract the p-values to a new object and then plot – emilliman5 Nov 30 '16 at 23:29
For storing the p-values in an object, I am writing the code below. However, I am getting the error "Error in pVsDay[[i]] : attempt to select less than one element in get1index". Can you help me understand where I am going wrong? `dayplist <- NULL; for (i in seq(0,24,1)) { dayplist <- c(dayplist, pVsDay[[i]]$p.value) }` – Anu Dec 01 '16 at 00:11
Indexing starts at 1 not 0. `seq(1, length(pVsDay),1)` – emilliman5 Dec 01 '16 at 00:33
Can you help me learn how you plotted the above graphs? (Sorry if I sound too ignorant.) `for (i in seq(1,length(pAllvsAll),1)) { allplist <- c(allplist, pAllvsAll[[i]]$p.value) }; str(allplist)` This works well with pVsDay. It doesn't work for pAllvsAll; it gives me NULL. – Anu Dec 01 '16 at 01:05
pAllvsAll is a list of lists, you need to iterate through the second list to get the p.value. Try `str(pAllvsAll)` to see what I mean – emilliman5 Dec 01 '16 at 13:58
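
To make the last few comments concrete, here is a minimal sketch of pulling p-values out of both objects. It uses simulated data with only three departure hours, not the 1995 file, and `toy`, `dayplist` and `allpmat` are illustrative names. It looks results up by hour label and not by position, since positions shift whenever an hour has no flights.

```r
# Simulated stand-in for the real data (assumption: not the 1995 file).
set.seed(42)
toy <- data.frame(
  Dep      = rep(c(900, 1300, 2200), each = 100),
  ArrDelay = c(rnorm(100, 5, 15), rnorm(100, 10, 15), rnorm(100, 25, 15))
)

pVsDay <- tapply(toy$ArrDelay, toy$Dep,
                 function(x) t.test(x, toy$ArrDelay, alternative = "less"))
pAllvsAll <- tapply(toy$ArrDelay, toy$Dep,
                    function(x) tapply(toy$ArrDelay, toy$Dep, function(z)
                      t.test(x, z, alternative = "less")))

# Look up by hour label, so there is no need to count positions.
pVsDay[["900"]]$p.value

# seq_along() starts at 1, so the loop never asks for element 0.
dayplist <- numeric(length(pVsDay))
for (i in seq_along(pVsDay)) {
  dayplist[i] <- pVsDay[[i]]$p.value
}
names(dayplist) <- names(pVsDay)

# pAllvsAll is a list of lists: the outer level is x, the inner level is z.
# Two nested sapply() calls give a matrix with one row per z and one column per x.
allpmat <- sapply(pAllvsAll,
                  function(inner) sapply(inner, function(tt) tt$p.value))

# Row "1300", column "2200": tests whether 2200 has less delay than 1300.
allpmat["1300", "2200"]
```

Both `dayplist` and `allpmat` are plain numeric objects, so they can be handed to a plotting function such as base R's `barplot()` or `image()`.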
