I have a dataset of 2000 gene expression variables with 62 observations. I want to regress each of the gene expression variables on a class variable (which is either 1, meaning healthy, or 2, meaning has a tumour) and obtain the p-values in matrix form. How would I do this?

Brian Tompsett - 汤莱恩
user2958701

1 Answer

Your question is rather light on details, so it's difficult to be sure exactly what you're after. Can you add some example data? Here's a start that might be relevant; I've just made up some data (which might not match what you want to do):

Example data for your '2000 gene expression variables with 62 observations'

genes <- matrix(sample(2000 * 62), nrow = 62, ncol = 2000)

Example data for your 'class variable (which is either 1 meaning healthy or 2 meaning has a tumour)'

classvar <- sample(2, 62, replace = TRUE)

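Here's what you'd do to get a vector of p-values for the regressions of the class variable on each of the 2000 variables in your dataset. Note that this regresses the class variable on each gene rather than the other way round; with a single predictor, the overall F-test p-value is the same in either direction: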

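# from http://stackoverflow.com/a/5587781/1036500
# Extract the overall F-test p-value from a fitted lm object
lmp <- function(modelobject) {
  # inherits() also works when the object has more than one class
  if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
  f <- summary(modelobject)$fstatistic
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL  # drop the name carried over from f[1]
  return(p)
}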

sapply(seq_len(ncol(genes)), function(i) lmp(lm(classvar ~ genes[, i])))
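If you want the regression in the direction you described (each gene on the class variable) and the result as a matrix, something like the following might be closer. This is only a sketch on made-up data: the seed, the use of `rnorm` for the expression values, and the names `classfac`, `pvals` and `pmat` are my own assumptions, not anything from your dataset.

```r
set.seed(1)  # assumption: fixed seed so the made-up data is reproducible

# Made-up data: 62 observations (rows) by 2000 genes (columns)
genes <- matrix(rnorm(62 * 2000), nrow = 62, ncol = 2000)
classvar <- sample(2, 62, replace = TRUE)

# Treat the class as a factor so lm() fits a two-group comparison
classfac <- factor(classvar, levels = 1:2, labels = c("healthy", "tumour"))

# One regression per gene: expression is the response, class the predictor.
# Row 2 of the coefficient table is the tumour-vs-healthy effect.
pvals <- vapply(seq_len(ncol(genes)), function(i) {
  fit <- lm(genes[, i] ~ classfac)
  summary(fit)$coefficients[2, "Pr(>|t|)"]
}, numeric(1))

# Collect the p-values into a 2000 x 1 matrix, one row per gene
pmat <- matrix(pvals, ncol = 1,
               dimnames = list(paste0("gene", seq_len(ncol(genes))), "p.value"))
```

`head(pmat)` then shows the first few p-values. `vapply` is used rather than `sapply` so that the result is guaranteed to be a numeric vector with one value per gene.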

Does that help?

Ben