5

I'm very new to R and need some help getting it to do the analysis I want (an analysis that I've been doing in GraphPad Prism but this can't handle large datasets) please. I have a dataframe like this (simplified):

Gene Seq   RBP Sites.length
1    Short A   0.26
1    Short B   0.13
1    Short C   0.65
1    Long  A   0.00
1    Long  B   0.39
1    Long  C   1.82
2    Short A   0.13
2    Short B   0.00
2    Short C   0.00
2    Long  A   0.89
...etc...

I want to know if there is a significant difference between the mean 'Sites.length' values for the 'Short' and 'Long' Seq for each RBP. I have performed an ANOVA and TukeyHSD:

> aov_migr <- aov (Sites.length ~ Seq * RBP, data = my_data)
> migr_hsd <- TukeyHSD (aov_migr)

The results look like:

> summary(aov_migr)
              Df Sum Sq Mean Sq F value Pr(>F)
Seq           1   0.03   0.031   0.434   0.51
RBP          34 206.86   6.084  86.264 <2e-16 ***
Seq:RBP      34   0.84   0.025   0.351   1.00
Residuals   910  64.18   0.071

> migr_hsd
                diff   lwr     upr   p adj
Short:A-Long:A  -0.012 -0.430  0.406 1
Long:B-Long:A   -0.039 -.0.457 0.379 1
Short:B-Long:A  -0.043 -0.460  0.375 1
Long:C-Long:A   0.556  0.138   0.974 8.909e-05
...etc...

The problem is that I need the TukeyHSD to compare only the 'Long' and 'Short' of the same RBP (i.e. only Long:A v Short:A, Long:B v Short:B etc), as the other comparisons are meaningless. Comparing them all with each other is increasing the adjusted p values as it is correcting for meaningless comparisons that I don't want it to perform.

Is there a way to tell the TukeyHSD that the data should have the 'Long' and 'Short' for each RBP grouped together/only compare the 'Long' and 'Short' of the same RBP?

Or is there another test that will do this?

Update: I have tried many other tests to see if I can replicate my analysis from Prism Graphpad in R - lme, manova, HSD.test (agricolae), glht (multcomp), factor plot, SidakSD, Sidak, repeated measures ANOVA (with Gene, Seq and RBP set as factors):

RM_aov <- aov(Sites.length ~ Seq * RBP + Error(Gene), data = mydata)

But nothing is replicating the analysis and I need to repeat this analysis with larger data sets that Graphpad Prism can't handle. The closest is the aov/TukeyHSD (except for the problem described above) and a for loop with t.test (described below; but whereas one comparison in the Prism analysis gives a significant result, all comparisons with this are not significant):

# separate long and short into separate data frames where the column headers are the RBP names and each row is the Sites.length for a Gene
result <- data.frame()
for (rbp in colnames (short)) {
  test.result <- t.test (long [, rbp] - short [, rbp])
  result [rbp, 'p'] <- test.result$p.value
  }
result$p.adjusted <- p.adjust (result$p, method="bonferroni")

The Graphpad Prism analysis I have performed is:

  • Repeated measures two-way ANOVA (telling it that the values for 'long' and 'short' of a given gene are matched)

  • Sidak's multiple comparison test (but the significant comparison is also significant with Bonferroni and Holm-Sidak multiple comparison tests).

Can anyone help please?

ekad
  • 14,436
  • 26
  • 44
  • 46
  • 2
    You could do `pairwise.t.test`s, use one of the correction methods in `p.adjust` and set the `n` manually. – Roland May 16 '14 at 12:43
  • Thanks very much Roland, I got help to write the for loop with t.test mentioned above in my update. But unfortunately it still doesn't replicate the results I had from my Graphpad Prism analysis. – user3643042 May 23 '14 at 06:35
  • If your question is how to do repeated measures ANOVA and Sidak's multiple comparison test in R, you should ask just that. – Roland May 23 '14 at 06:40
  • I don't think it is because as I see it for the SidakSD or sidak commands you have to tell it where the p values are and the repeated measures anova doesn't generate these. What is missing from doing it like that is the comparisons by grouping 'long' and 'short'. The TukeyHSD would work if I could tell it which comparisons I wanted it to make. – user3643042 May 23 '14 at 06:52
  • You won't have success reproducing exact p-values using a different post-hoc test. You need to understand and accept that. For the record, I'm quite sure you can reproduce any method in Prism in R (if you *should* do that is a different question). Good luck. – Roland May 23 '14 at 07:01
  • Yeah I realise that I probably won't replicate the exact p values but that's not the problem - they're very different. That's great that you think I will be able to replicate the method in Prism but I don't know how. Thanks for the luck. – user3643042 May 23 '14 at 07:10

0 Answers0