I'm very new to R and need some help getting it to do the analysis I want (an analysis that I've been doing in GraphPad Prism but this can't handle large datasets) please. I have a dataframe like this (simplified):
Gene Seq RBP Sites.length
1 Short A 0.26
1 Short B 0.13
1 Short C 0.65
1 Long A 0.00
1 Long B 0.39
1 Long C 1.82
2 Short A 0.13
2 Short B 0.00
2 Short C 0.00
2 Long A 0.89
...etc...
I want to know if there is a significant difference between the mean 'Sites.length' values for the 'Short' and 'Long' Seq for each RBP. I have performed an ANOVA and TukeyHSD:
> aov_migr <- aov (Sites.length ~ Seq * RBP, data = my_data)
> migr_hsd <- TukeyHSD (aov_migr)
The results look like:
> summary(aov_migr)
Df Sum Sq Mean Sq F value Pr(>F)
Seq 1 0.03 0.031 0.434 0.51
RBP 34 206.86 6.084 86.264 <2e-16 ***
Seq:RBP 34 0.84 0.025 0.351 1.00
Residuals 910 64.18 0.071
> migr_hsd
diff lwr upr p adj
Short:A-Long:A -0.012 -0.430 0.406 1
Long:B-Long:A -0.039 -.0.457 0.379 1
Short:B-Long:A -0.043 -0.460 0.375 1
Long:C-Long:A 0.556 0.138 0.974 8.909e-05
...etc...
The problem is that I need the TukeyHSD to compare only the 'Long' and 'Short' of the same RBP (i.e. only Long:A v Short:A, Long:B v Short:B etc), as the other comparisons are meaningless. Comparing them all with each other is increasing the adjusted p values as it is correcting for meaningless comparisons that I don't want it to perform.
Is there a way to tell the TukeyHSD that the data should have the 'Long' and 'Short' for each RBP grouped together/only compare the 'Long' and 'Short' of the same RBP?
Or is there another test that will do this?
Update: I have tried many other tests to see if I can replicate my analysis from Prism Graphpad in R - lme, manova, HSD.test (agricolae), glht (multcomp), factor plot, SidakSD, Sidak, repeated measures ANOVA (with Gene, Seq and RBP set as factors):
RM_aov <- aov(Sites.length ~ Seq * RBP + Error(Gene), data = mydata)
But nothing is replicating the analysis and I need to repeat this analysis with larger data sets that Graphpad Prism can't handle. The closest is the aov/TukeyHSD (except for the problem described above) and a for loop with t.test (described below; but whereas one comparison in the Prism analysis gives a significant result, all comparisons with this are not significant):
# separate long and short into separate data frames where the column headers are the RBP names and each row is the Sites.length for a Gene
result <- data.frame()
for (rbp in colnames (short)) {
test.result <- t.test (long [, rbp] - short [, rbp])
result [rbp, 'p'] <- test.result$p.value
}
result$p.adjusted <- p.adjust (result$p, method="bonferroni")
The Graphpad Prism analysis I have performed is:
Repeated measures two-way ANOVA (telling it that the values for 'long' and 'short' of a given gene are matched)
Sidak's multiple comparison test (but the significant comparison is also significant with Bonferroni and Holm-Sidak multiple comparison tests).
Can anyone help please?