
I have a data frame with expression values for 8,000 genes in rows and 40 columns. The 40 columns represent observations from 4 different time points. The first ten columns are from no treatment, the second ten are after one week of treatment, the third ten are after 2 weeks of treatment, and the fourth ten are after 7 weeks of treatment.

This is the idea (edited to a smaller example):

m1 <- matrix(data=1:32, nrow = 8, ncol = 4)
time <- c(1:4)

summaries <- data.frame(nrow=8, ncol=4)
pvalues <- function(x) {
  for (i in (1:8)) {
    raw <- unlist(as.vector(x[i,]))
    lm <- lm(raw ~ time)
    summaries[i, ] <- summary(lm)$coefficients[,4]
    return(summaries)
  }
}
pvalues(m1)

I know it is still overwriting, but I don't know how to fix it.

  • Welcome to SO! It's hard to help, because your example is not [minimal and reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). However, you can try to adapt this to get the p-values: `mod <- lm(dist ~ speed, data=cars); summary(mod)$coefficients[,4]`. – s__ Nov 09 '20 at 18:46
  • Thanks for your help, I tried to make it more generic now. – Julia Bruner Nov 09 '20 at 19:17
  • Your loop overwrites the entire `summaries` object each iteration. Initialize an appropriately sized data structure and fill it in. Something like this, before the loop: `summaries = matrix(nrow = 8000, ncol = 2)` (2 assuming 1 intercept, 1 predictor) and then in the loop `summaries[i, ] <- summary(lm1)$coefficients[, 4]`. – Gregor Thomas Nov 09 '20 at 19:24
  • The example is improved, but it would be great if it would run in a fresh R session - as written, there's no response variable to test with... (A small test with, say, 8 instead of 8000 would be nice to illustrate the problem if you need more help) – Gregor Thomas Nov 09 '20 at 19:26
  • I still don't understand why it's overwriting; hopefully this makes my question clearer. – Julia Bruner Nov 09 '20 at 19:39
  • Much better! Posting an answer now... – Gregor Thomas Nov 09 '20 at 19:41

1 Answer

Okay, I ran your code and got this warning:

Warning message:

In summary.lm(lm) : essentially perfect fit: summary may be unreliable

This is because the sample data was sequential, so every row is perfectly linear in `time`. I changed `m1` to use `rnorm(32)` instead of `1:32`, and that problem is solved. (Though this problem is specific to the example, not the real case, I would assume.)
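As a quick check of that explanation, here is a minimal sketch. The names `perfect_row`, `noisy_row`, and the seed are made up for illustration. It shows that a row of the sequential matrix leaves essentially zero residuals, while a random row gives an ordinary fit with usable p-values:

```r
time <- 1:4

# Row 1 of the sequential matrix is 1, 9, 17, 25: exactly linear in time
perfect_row <- matrix(1:32, nrow = 8, ncol = 4)[1, ]
fit <- lm(perfect_row ~ time)
max_abs_resid <- max(abs(residuals(fit)))  # effectively zero
# Calling summary(fit) here is what triggers the "essentially perfect fit" warning

# A random row has real residual variation, so the summary is meaningful
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
noisy_row <- rnorm(4)
noisy_fit <- lm(noisy_row ~ time)
noisy_p <- summary(noisy_fit)$coefficients[, 4]  # p-values for intercept and slope
```
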

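Next problem: the size of the `summaries` object. You used `data.frame()`, which treats its arguments as column names and values, so `nrow=8, ncol=4` creates a single row with columns named `nrow` and `ncol`. We want to use `matrix()`, which lets you set the number of rows and columns: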

## Bad - 1 row and 2 columns
summaries <- data.frame(nrow=8, ncol=4)
summaries
#   nrow ncol
# 1    8    4

## Good - 8 rows, 2 columns
summaries <- matrix(nrow = 8, ncol = 2)
summaries
#      [,1] [,2]
# [1,]   NA   NA
# [2,]   NA   NA
# [3,]   NA   NA
# [4,]   NA   NA
# [5,]   NA   NA
# [6,]   NA   NA
# [7,]   NA   NA
# [8,]   NA   NA

But even with that fix, running your code still fills only the first row. Why? A function exits as soon as it hits a `return()`, so with `return(summaries)` inside the loop, the function returns during the first iteration. We need to move `return(summaries)` to after the `for` loop, not inside it. Putting it all together:

m1 <- matrix(data = rnorm(32),
             nrow = 8,
             ncol = 4)
time <- c(1:4)

summaries <- matrix(nrow = 8, ncol = 2)
pvalues <- function(x) {
  for (i in (1:8)) {
    raw <- c(x[i, ])
    lm <- lm(raw ~ time)
    summaries[i,] <- summary(lm)$coefficients[, 4]
  } ## End the for loop before the `return()`
  return(summaries)
}
pvalues(m1)
#           [,1]      [,2]
# [1,] 0.6235167 0.5461115
# [2,] 0.4256698 0.3992509
# [3,] 0.3041439 0.2751724
# [4,] 0.8087557 0.8252432
# [5,] 0.8820501 0.1812292
# [6,] 0.4997327 0.5582880
# [7,] 0.5589398 0.8150613
# [8,] 0.6283059 0.8994896
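To see the early-return behaviour in isolation, here is a small sketch with two hypothetical helper functions (`first_only` and `all_rows` are not part of the original code). The only difference between them is whether the result is returned inside or after the loop:

```r
# return() inside the loop: the function exits during the first iteration
first_only <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
    return(out)
  }
}

# Result returned after the loop: every element gets filled
all_rows <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- i^2
  }
  out
}

first_only(4)
# [1] 1 0 0 0
all_rows(4)
# [1]  1  4  9 16
```
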

We can gussy it up a little bit by assigning column names based on the summary from the last iteration:

pvalues <- function(x) {
  for (i in (1:8)) {
    raw <- c(x[i, ])
    lm <- lm(raw ~ time)
    summaries[i,] <- summary(lm)$coefficients[, 4]
  }
  colnames(summaries) = names(summary(lm)$coefficients[, 4])
  return(summaries)
}
pvalues(m1)
#      (Intercept)      time
# [1,]   0.6235167 0.5461115
# [2,]   0.4256698 0.3992509
# [3,]   0.3041439 0.2751724
# [4,]   0.8087557 0.8252432
# [5,]   0.8820501 0.1812292
# [6,]   0.4997327 0.5582880
# [7,]   0.5589398 0.8150613
# [8,]   0.6283059 0.8994896
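
The functions above rely on `summaries` and `time` existing in the global environment and hard-code 8 rows. As one possible variation, not something from the original code, the sketch below passes `time` in as an argument and uses `apply()` so the result is sized automatically. The name `pvalues_by_row`, the simulated `expr` matrix, and coding the untreated group as week 0 are all assumptions made for illustration:

```r
# Fit lm(y ~ time) for each row of x and collect the coefficient p-values
pvalues_by_row <- function(x, time) {
  stopifnot(ncol(x) == length(time))
  res <- apply(x, 1, function(y) {
    fit <- lm(y ~ time)
    summary(fit)$coefficients[, 4]
  })
  t(res)  # apply() returns one column per row of x, so transpose
}

# Simulated stand-in for the real data: 8 genes, 40 observations
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
expr <- matrix(rnorm(8 * 40), nrow = 8, ncol = 40)

# Ten columns per time point; treating "no treatment" as week 0 is an assumption
week <- rep(c(0, 1, 2, 7), each = 10)

p <- pvalues_by_row(expr, week)
dim(p)
# [1] 8 2
colnames(p)
# [1] "(Intercept)" "time"
```

For the real data, `expr` would be the 8,000 x 40 expression matrix (for example via `as.matrix()` on the data frame), with the columns in the same order as `week`.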
Gregor Thomas