I have a dataset of 2000 gene expression variables with 62 observations, plus a class variable (1 = healthy, 2 = has a tumour). I want to regress each of the gene expression variables on the class variable and collect the p-values in matrix form. How would I do this?
You might want to look at this answer: http://stackoverflow.com/a/19743673/321622 – John Nov 12 '13 at 06:31
1 Answer
Your question is rather light on details, so it's difficult to be sure exactly what you're after. Can you add some example data? Here's a start that might be relevant; I've just made up some data (which might not match what you want to do):
Example data for your '2000 gene expression variables with 62 observations'
genes <- matrix(sample(2000 * 62), nrow = 62, ncol = 2000)
Example data for your 'class variable (which is either 1 meaning healthy or 2 meaning has a tumour)'
classvar <- sample(2, 62, replace = TRUE)
Here's how to get a vector of p-values from regressing the class variable on each of the 2000 variables in your dataset (with a single predictor, the overall F-test p-value is the same whichever variable is treated as the response):
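# from http://stackoverflow.com/a/5587781/1036500
# Extract the overall F-test p-value from a fitted lm object
lmp <- function(modelobject) {
  if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
  f <- summary(modelobject)$fstatistic
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL  # drop the name inherited from f
  p
}
# One regression per gene (column); returns a numeric vector of 2000 p-values
sapply(seq_len(ncol(genes)), function(i) lmp(lm(classvar ~ genes[, i])))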
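Since the question asks for each gene regressed on the class variable, with the result in matrix form, here is a minimal sketch of that variant. It is only an illustration: the data are made up with `rnorm`, the names `pvals` and `pmat` are my own, and I'm assuming the class variable should be treated as a two-level factor.

```r
# Made-up data (an assumption, not the asker's real dataset)
set.seed(1)
genes <- matrix(rnorm(62 * 2000), nrow = 62, ncol = 2000)
classvar <- factor(sample(2, 62, replace = TRUE),
                   levels = 1:2, labels = c("healthy", "tumour"))

# Fit gene ~ class for each column and pull out the p-value
# of the class coefficient (row 2, column 4 of the coefficient table)
pvals <- vapply(seq_len(ncol(genes)), function(i) {
  fit <- lm(genes[, i] ~ classvar)
  summary(fit)$coefficients[2, 4]
}, numeric(1))

# Two-column matrix: gene index and its p-value
pmat <- cbind(gene = seq_len(ncol(genes)), p.value = pvals)
head(pmat)
```

With a single two-level predictor, this coefficient p-value matches the overall F-test p-value, so it should agree with what `lmp` returns for the same pair of variables.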
Does that help?

Ben