
I am trying to do FDR correction for some region-of-interest neuroimaging data. I have run 18 linear mixed effects models in total, and I have made sure that the order of the coefficients in the output is the same in each model.

I have saved the output from each model in the following:

tidy_model1 <- tidy(model1)
tidy_model2 <- tidy(model2)
....
tidy_model18 <- tidy(model18)

I am now trying to make my life easier by creating a loop that goes over a list of the model objects above and creates a vector of p-values for each coefficient, which I will then pass to the p.adjust function to retrieve the adjusted p-values.

So I create a list:

model_list <- list(tidy_model1,
                   tidy_model2, ... tidy_model18)

I have tried the following loops:

for (i in 1:18) {
  model_list[i] %>%
    variable1_pval <- p.value[1]
}

and

for (i in 1:18) {
  variable1_pval <- model_list[i]$p.value[1]
}

So the above should give me a vector of p-values for coefficient 1 of each model.

However, I get a null vector in both cases.

I know I am not providing my data, but any suggestions as to why these loops might not be working are welcome!

Thank you

Monika Grigorova
  • You’re using lists, so you need to use [[]] to access elements in the list, not []. – Limey Jun 30 '20 at 20:46
  • You would also need to create an output vector for the p-values and populate it by index on each iteration. `sapply`/`lapply` do this for you: `sapply(model_list, function(x) x$p.value[1])` or similar – rawr Jun 30 '20 at 20:49
  • Are you sure lme4 gives you p-values? It should be the nlme package you are using, right? – StupidWolf Jun 30 '20 at 20:55
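
Putting the first two comments together, a corrected version of the loop would look roughly like this (a sketch that reuses the question's `model_list` and `variable1_pval` names; the key fixes are pre-allocating the output vector and indexing the list with `[[ ]]`):

# Pre-allocate the output vector, then fill it one model at a time,
# pulling each tidy tibble out of the list with [[i]] rather than [i].
variable1_pval <- numeric(length(model_list))
for (i in seq_along(model_list)) {
  variable1_pval[i] <- model_list[[i]]$p.value[1]
}

# Or, as suggested in the comments, let sapply build the vector directly:
variable1_pval <- sapply(model_list, function(x) x$p.value[1])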

1 Answer


I made up a list of models:

library(nlme)
library(broom)

models <- lapply(1:5, function(i) {
  idx <- sample(nrow(Orthodont), replace = TRUE)
  lme(distance ~ age, random = ~ Sex, data = Orthodont[idx, ])
})

model_list <- lapply(models, tidy, effects = "fixed")

In these models, the useful coefficient is the second:

model_list[[1]]
# A tibble: 2 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)   15.9      1.03       15.5  7.77e-26
2 age            0.739    0.0871      8.48 9.13e-13

You can collect the p-values into a vector like this (for your case, use `p.value[1]` instead of `p.value[2]`):

sapply(model_list, function(x) x$p.value[2])
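
Once you have that vector, the FDR correction from the question is a single call to `p.adjust` (a minimal sketch; `method = "BH"` is the Benjamini-Hochberg procedure, which `method = "fdr"` aliases):

# Collect one p-value per model, then adjust across models
pvals <- sapply(model_list, function(x) x$p.value[2])
p_fdr <- p.adjust(pvals, method = "BH")  # method = "fdr" gives the same result
p_fdr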

A better way to keep track of your models, without populating the environment with many variables, is to use purrr and dplyr:

library(purrr)
library(dplyr)
library(tidyr)   # needed for unnest() further down

models <- tibble(name = 1:5, models = models) %>%
  mutate(tidy_res = map(models, tidy, effects = "fixed"))

models

# A tibble: 5 x 3
   name models tidy_res        
  <int> <list> <list>          
1     1 <lme>  <tibble [2 × 5]>
2     2 <lme>  <tibble [2 × 5]>
3     3 <lme>  <tibble [2 × 5]>
4     4 <lme>  <tibble [2 × 5]>
5     5 <lme>  <tibble [2 × 5]>

models %>% unnest(tidy_res) %>% filter(term == "age")
# A tibble: 5 x 7
   name models term  estimate std.error statistic  p.value
  <int> <list> <chr>    <dbl>     <dbl>     <dbl>    <dbl>
1     1 <lme>  age      0.587    0.0601      9.77 2.44e-15
2     2 <lme>  age      0.677    0.0663     10.2  3.91e-16
3     3 <lme>  age      0.588    0.0603      9.74 3.05e-15
4     4 <lme>  age      0.653    0.0529     12.3  2.74e-20
5     5 <lme>  age      0.638    0.0623     10.2  3.34e-16
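
If you keep everything in this tibble, the FDR adjustment can live in the same pipeline (a sketch continuing from the `models` tibble above, assuming you adjust within each coefficient across the models, as in the question; `p.adj` is just an illustrative column name):

models %>%
  unnest(tidy_res) %>%
  group_by(term) %>%                                    # one correction per coefficient, across models
  mutate(p.adj = p.adjust(p.value, method = "BH")) %>%  # Benjamini-Hochberg FDR
  ungroup()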
StupidWolf
  • Thank you. What is the purpose of `function(x) x` in `sapply(model_list, function(x) x$p.value[2])`? – Monika Grigorova Jul 01 '20 at 05:23
  • When I try to run the `mutate(tidy_res = map(models, tidy, effects = "fixed"))`, or also simply `tidy_res <- tidy(models)`, I get the error `Error: C stack usage 7970256 is too close to the limit` – Monika Grigorova Jul 01 '20 at 07:20
  • If you want the p.value for the 1st element of the list, you do `list[[1]]$p.value[2]`; when you run `sapply` on a list, it iterates through the elements, so `list[[1]]`, `list[[2]]`, etc. become `x` – StupidWolf Jul 01 '20 at 07:43
  • And for the tidy part, are your models or data frames huge? – StupidWolf Jul 01 '20 at 07:45
  • I have 18 models with 8 variables each, and the original data frame on which I ran the models has 162 variables. Do you think the reason might be the size of my dataset, and would it be worth trimming it down to the more relevant variables? – Monika Grigorova Jul 01 '20 at 07:49
  • Also, I managed to make the `sapply(model_list, function(x) x$p.value[2])` method work. Thank you! – Monika Grigorova Jul 01 '20 at 07:52
  • The number of variables is OK. Do they have a lot of observations? If so, then maybe it's not worth putting all of them in a tibble. My point is more that it's better to have a table to organize the `tidy` results and `p.values` – StupidWolf Jul 01 '20 at 07:55
  • What if we did a backwards stepwise variable selection? How do we apply the FDR correction then? – skan Jul 09 '23 at 16:12