
We are given a matrix with 2 columns (samples, experiment conditions) and n rows (genes for example), and we aim to identify the genes that have significantly changed (at a specific FDR) between the two samples.

How can this be done in R?

Below is an example from the fdrtool package manual that shows how to compute FDR values from a vector of p-values:

library("fdrtool") 
data(pvalues)
fdr = fdrtool(pvalues, statistic="pvalue") 
fdr$qval # estimated Fdr values 
fdr$lfdr # estimated local fdr
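Without fdrtool installed, base R's `p.adjust()` also produces Benjamini-Hochberg (BH) adjusted p-values from a vector of p-values. Below is a minimal sketch on simulated p-values. The simulation is purely illustrative and is not fdrtool's `pvalues` data set. BH assumes every hypothesis may be null, so it can be more conservative than fdrtool's `qval`, which uses an estimated proportion of null genes.

```r
set.seed(1)
# Hypothetical simulated p-values: 900 "null" genes and 100 genes with a real effect
p_null <- runif(900)
p_alt  <- rbeta(100, shape1 = 0.5, shape2 = 20)   # concentrated near 0
pv <- c(p_null, p_alt)

# Benjamini-Hochberg adjusted p-values (stats package, loaded by default)
q_bh <- p.adjust(pv, method = "BH")

# Indices of genes called significant at an FDR of 5%
sig <- which(q_bh < 0.05)
length(sig)
```

Either way, the input must be one p-value per gene. That is exactly what the two-column matrix below does not provide.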

But the problem is that we have just two vectors of observations here, not the p-values. Any ideas?

Here is some sample data that can be used: foo <- matrix(runif(1000), ncol=2)

I assume we have no replicate information, p-values, etc. But genes whose values differ greatly between the two samples surely have stronger evidence of change. Is there any way to assign an FDR in this situation?

Ali
  • You should add some data to your question to make it reproducible. – agstudy Jun 08 '13 at 12:06
  • @agstudy You are welcome to use `foo <- matrix(runif(1000), ncol=2)` as the data – Ali Jun 08 '13 at 15:02
  • You should add that to the question. Does that adequately represent your data though? Is your data from a microarray or is it next generation data where the outcome is actually a count? The answer to that changes things a little bit. – Dason Jun 08 '13 at 15:33
  • Just because you can get a result without an error message is no guarantee that the result means what you think it should mean. The person whose answer you accepted completely distorted the meaning of his cited web articles ... and you accepted it, despite the fact that he gave you no code. _Caveat_ _f#$%^&-ing_ _emptor_ – IRTFM Jun 02 '16 at 04:47

1 Answer


If you have only one sample for each condition, there is no way to get a p-value, because this is the probability that the difference between samples drawn for one population are statistically different. But if you have no replicates, and therefore no mean or variance for each gene (as I understand it), we can't estimate the sampling error. So a conventional small-sample test such as the t-test has no way to distinguish the difference you see from random variation. Look at these; they may help:

http://en.wikipedia.org/wiki/P-value

http://www-stat.stanford.edu/~tibs/SAM/
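Once each condition has replicates, a per-gene test becomes possible, and its p-values can be passed to an FDR procedure. Below is a minimal sketch that assumes a hypothetical design with 3 replicates per condition and uses simulated data. The Welch t-test here is only a stand-in. Moderated tests such as SAM (linked above) are generally preferred for small samples.

```r
set.seed(42)
n_genes <- 500
n_rep   <- 3   # hypothetical number of replicates per condition

# Simulated expression: columns 1-3 = condition A, columns 4-6 = condition B
expr <- matrix(rnorm(n_genes * 2 * n_rep), nrow = n_genes)
# Give the first 50 genes a true shift in condition B
b_cols <- (n_rep + 1):(2 * n_rep)
expr[1:50, b_cols] <- expr[1:50, b_cols] + 3

group <- rep(c("A", "B"), each = n_rep)

# Per-gene Welch t-test p-values (one test per row)
pvals <- apply(expr, 1, function(x) t.test(x[group == "A"], x[group == "B"])$p.value)

# BH-adjusted values and genes called at an FDR of 5%
qvals  <- p.adjust(pvals, method = "BH")
called <- which(qvals < 0.05)
```

With only one column per condition, as in the question, `t.test()` stops with an error about having too few observations. That is the point above, expressed in code.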

What you can do is make an MA plot

http://en.wikipedia.org/wiki/MA_plot

and use the distribution of your data to see which differences are large, then select those. However, this is not within the statistical framework of a false discovery rate analysis. It may help as an exploratory analysis, but there is no real statistical test behind it. In the microarray literature you will probably find alternatives that make a set of assumptions and allow a hypothesis test, but I don't know of one to recommend; maybe the affy package has one...
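Below is a minimal sketch of the MA-plot approach on the question's `foo` matrix. It assumes all values are strictly positive, because log2 is undefined otherwise. The cutoff of 2 MADs from the median log-ratio is an arbitrary, hypothetical choice for illustration. As noted above, the result is an exploratory ranking, not an FDR.

```r
set.seed(123)
foo <- matrix(runif(1000), ncol = 2)   # sample data from the question

# M = log2 ratio between the two samples, A = average log2 intensity
M <- log2(foo[, 1]) - log2(foo[, 2])
A <- (log2(foo[, 1]) + log2(foo[, 2])) / 2

plot(A, M, pch = 20, xlab = "A (mean log2 intensity)", ylab = "M (log2 ratio)")
abline(h = 0, col = "grey")

# Flag genes whose log-ratio lies far from the bulk (hypothetical cutoff)
cutoff  <- 2 * mad(M)
flagged <- which(abs(M - median(M)) > cutoff)
points(A[flagged], M[flagged], col = "red", pch = 20)

# Flagged genes ranked by the size of their deviation from the median log-ratio
flagged[order(abs(M[flagged] - median(M)), decreasing = TRUE)]
```

With `runif()` data, both columns come from the same distribution. Any flagged genes are therefore noise, which illustrates why this ranking should not be read as a discovery list.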


user1265067
  • "p-values" are NOT the "probability that the difference between samples drawn for one population are statistically different". (Consider that a small p-value is generally considered to be a good result.) They are actually the probability that you would get that value (or one more extreme) if the samples were from the SAME distribution. Please review your basic statistics. Your interpretation is a common misconception, but despite being common, it is very much incorrect. – IRTFM Jun 02 '16 at 04:35
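The definition in the comment above can be checked by simulation. If both samples really come from the same distribution, the p-values are uniform, so about 5% of them fall below 0.05 by chance alone. This is why testing thousands of genes needs an FDR correction. Below is a minimal sketch using normal data and Student's t-test, both illustrative choices.

```r
set.seed(7)
n_tests <- 10000

# Both samples drawn from the SAME distribution, so every null hypothesis is true
p_null <- replicate(n_tests, t.test(rnorm(5), rnorm(5), var.equal = TRUE)$p.value)

mean(p_null < 0.05)   # close to 0.05: the false-positive rate of unadjusted tests

# After BH adjustment, all hypotheses here are null, so this is usually 0
sum(p.adjust(p_null, method = "BH") < 0.05)
```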