R Error in model.frame.default - Variable lengths differ

Question

Very first post here, also, this is only my second time using R, so please be gentle.

I'm working on a project where I in R am trying to make multiple analyses using the lme4 and lmeTest packages via their lmer function. To do so I have list of the names of the variables i want to analyse that I itterate throug using a for loop. Thus it would look something like this :

list <- MyList of IDs
raw <- My Data File

for (i in list) {
  model <- lmer (`i` ~ Time + SecretorStatus + BioRep + TechRep + (1|Random), data = raw)
  .
  .
  .
  Do something..
  .
  .
}

This however generates this error:

Error in model.frame.default(data = raw, drop.unused.levels = TRUE, formula = paste(i) ~  : 
  variable lengths differ (found for 'Time')

I'm quite sure that the issue is related to the lmer statment, maybe to the i as well as everything works perfectly whenever I manually copy past a value into the place of i. However, as I have a lot of values in "list" I need some kind of loop.

I have googled this quite a lot, and found several answers in here attempting to solve the same/similar issues, but for me they don't work. Below is a list of links to some of the better solutions out there, however they didn't solve my problem.

Can anyone offer some insights into this phenomenon?

Edit 1 - @r2evans asked for a reproducible example. This is provided below.

#Packages used
library(Matrix) 
library(lme4)
library(lmerTest)
library(stringr)
library(readr)
#Starting to read data
rootDir <- getwd()
raw <- read.csv(str_c(rootDir, "/P035aForR.csv"), na = c("", "NA", "0"))
list <- read.csv(str_c(rootDir, "/P035aHeaddersForR2.csv"), na = c("", "NA", "0"), header = FALSE)
#Generate dir for output and the dataframe to store the main results
dir.create("Results", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Data", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures/QQPlot", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures/RawResiduals", showWarnings = TRUE, recursive = FALSE, mode = "0777")
dir.create("Results/Pictures/PersonResiduals", showWarnings = TRUE, recursive = FALSE, mode = "0777")
Result <- data.frame("","","","","","")
names(Result)<-c("ID","SecretorStatus","Time","BioRep","TechRep","Shapiro") ### <-- Update the headders as needed
Result <- Result[-c(1),]
#Load variables from raw as factors for the analysis
raw$SecretorStatus <- as.factor(raw$SecretorStatus)
raw$Time <- as.factor(raw$Time)
raw$TechRep <- as.factor(raw$TechRep)
raw$BioRep <- as.factor(raw$BioRep)
raw$Random <- as.factor(raw$Random)
for (i in list) {
  model <- lmer (`i` ~ Time + SecretorStatus + BioRep + TechRep + (1|Random), data = raw) 
  anova <- as.data.frame(anova(model, type = 2),) 
  residuals <- residuals(model, "response")
  pvalue <- round(shapiro.test(residuals)$p.value, digits=4 ) 
  out <- as.data.frame(anova(model, type = 2))
  write.table(out, "Results/Data/i.txt", col.names=T, row.names=T, quote=F, sep=",")
  png(Results/Pictures/QQPlot/i.png)
  qqnorm(residuals, sub=paste("p-value of a Shapiro-Wilks test =", pvalue )); qqline(residuals) 
  dev.off()
  Pearson.Residuals <- residuals(model, "pearson")
  Raw.Residuals <- residuals(model, "response")
  Fitted <- fitted(model)
  png(Results/Pictures/RAWResiduals/i.png)
  plot(Fitted, Raw.Residuals, main=i)
  dev.off()
  png(Results/Pictures/PersonResiduals/i.png)
  plot(Fitted,Pearson.Residuals, main=i)
  dev.off()
  tmp <- data.frame(i, pvalue, t(anova[,"Pr(>F)"]))
  names(tmp) <- c("ID", "Shapiro", row.names(anova))
  Result <- rbind(Result, tmp)
}

At this point the error is generated where it says:

Error in model.frame.default(data = raw, drop.unused.levels = TRUE, formula = i ~  : 
  variable lengths differ (found for 'Time')

Regarding the setup of the data, I cannot provide the full dataset (obviously), however a sample that demonstrates the structure is provided below. This is the data for the raw variable.

SecretorStatus  Time    TechRep BioRep  Random  ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8 ID9 ID10    ID11    ID12    ID13    ID14    ID15    ID16
1   1   1   1   1   23342.99    23342.99    0   0   0   0   0   0   0   0   0   0   102829.8    492252.5    0   924436.3
2   1   2   5   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   529782
2   1   1   6   3   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   506987.7
2   1   2   6   4   0   0   0   0   0   0   0   0   0   0   0   0   0   48786.41    0   618768.5
1   1   2   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   414852.1    354153.5    850788.9
1   1   1   2   2   0   0   0   0   0   0   0   0   0   99551.51    0   0   322185.6    0   361100.2    819073.6
1   1   2   2   3   0   0   0   0   0   90194.2 0   0   0   73646.15    0   0   0   398369.2    277569.9    613257.3
1   1   1   3   4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
1   1   2   3   1   0   0   0   0   0   0   0   0   0   0   0   0   0   265760.8    0   0
2   1   1   4   2   0   0   0   0   0   0   0   0   0   0   0   0   61351.9 554385.9    0   656984.3
2   1   2   4   3   0   0   0   0   0   0   0   0   0   0   0   0   0   622428.4    0   769227.8
2   1   1   5   4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   388584.9
1   2   1   1   1   31454.26    31454.26    0   0   0   0   0   0   0   0   0   0   0   0   0   729234.2
1   2   2   1   2   0   0   0   0   0   0   0   0   0   0   0   0   0   333620.4    0   933046.3
1   2   1   2   3   0   0   0   0   0   0   0   0   0   0   0   0   0   834145.3    0   0
1   2   2   2   4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   157152.7
1   2   1   3   1   0   0   0   0   0   0   0   0   0   0   0   0   0   178179.3    0   812282.9
1   2   2   3   2   0   0   0   0   0   86782.91    0   0   0   0   0   0   0   191167  0   663968.9
2   2   1   4   3   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   610315.3
2   2   2   4   4   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   339407.1
2   2   1   5   1   0   0   0   0   0   0   0   0   0   213881.1    0   0   0   0   0   298894.5
2   2   2   5   2   0   0   0   0   0   0   0   0   0   81122.63    0   0   0   0   0   170576.6
2   2   1   6   3   0   0   0   0   0   0   0   0   0   53790.86    0   0   0   0   0   205826
2   2   2   6   4   0   0   0   0   0   37900.34    0   0   0   0   0   0   0   0   315754  232529.7

For the data in list, the data structure for the example data above would thus be.

ID1 ID2 ID3 ID4 ID5 ID6 ID7 ID8 ID9 ID10    ID11    ID12    ID13    ID14    ID15    ID16

The two files with data for raw and list are both csv files. Throughout this entire thing only one error is produced (at least in my end). However, if you execute the code twice, it will complain that the folders already exists. However, apart from that it only generates one error. Another detail that I have noted is that the variable i is stuck at ID1 when this error happens, thus it doesn’t browse through the entire list, it fails at the first object. I hope this helps for the clarification. Please let me know if there are any more details that you want/need to reproduce the error.

Welcome to SO, MNTsnowman! Please make this question *reproducible*. This includes sample code you've attempted (including listing non-base R packages, and any errors/warnings received), sample *unambiguous* data (e.g., `data.frame(x=...,y=...)` or the output from `dput(head(x))`), and intended output given that input. Refs: https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. (FYI, it's generally bad practice to name objects after common base functions such as `list` and `raw`. While R often does the right thing, it can make reading others' code harder.) — r2evans, Jan 07 '22 at 15:25
Please provide enough code so others can better understand or reproduce the problem. — Community, Jan 14 '22 at 16:41
Hi @r2evans, a reproducible example has been added as an edit to the orriginal post. I hope this helps. plase let me know if you need anything else. — MNTsnowman, Jan 15 '22 at 22:32
This is **not** minimal (none of the plotting, residuals, directory creation, etc., are relevant to the problem AFAICS) ... — Ben Bolker, Jan 15 '22 at 22:51

Ben Bolker · Answer 1 · 2022-01-15T22:54:05.913

1

It's a little hard to tell from what you've given us, but I think this should do it:

pred_vars <- c("Time", "SecretorStatus", "BioRep", "TechRep", "(1|Random)")
list_of_IDs <- names(raw)[startsWith(names(raw), "ID")]
for (i in list_of_IDs) {
  f <- reformulate(pred_vars, response = i)
  model <- lmer (f, data = raw)
  ## ...
}

you don't really need reformulate, you can also use paste or sprintf or any other string-manipulation machinery to put your formula together as a string and then apply as.formula(), but reformulate is a little bit nicer.

A slightly more efficient solution would use refit():

for (i in list_of_IDs) {
   if (i == list_of_IDs[1]) {
      model <- lmer (ID1 ~ Time + SecretorStatus + BioRep + TechRep + (1|Random), 
                     data = raw)
   } else {
      model <- refit(model, newresp = raw[[i]])
   }
   ## ...

}

As stated in ?refit, "the ‘refit()’ approach should be faster because it bypasses the creation of the model representation and goes directly to the optimization step."

edited Jan 15 '22 at 22:54

answered Jan 09 '22 at 22:16

Ben Bolker

211,554
25
370
453

Hi @Ben Bolker, thanks a lot for the reply. An edit has been added to the orriginal post with some more information, i hope this clarifies what i'm trying to do. Regarding your two suggestions, the first one provides this error "Error: grouping factors must have > 1 sampled level", could this have something to do with the data structure? (Answer 1 of 2) – MNTsnowman Jan 15 '22 at 22:46
The second suggestion provides this error "Error in eval(predvars, data, env) : object 'first_variable' not found In addition: Warning message: In if (i == list) { : the condition has length > 1 and only the first element will be used" when i then insert the name of the first variable in the place of first_variable, the error produced is this "Error: grouping factors must have > 1 sampled level In addition: Warning message: In if (i == list2) { : the condition has length > 1 and only the first element will be used" (Answer 2 of 2) – MNTsnowman Jan 15 '22 at 22:46
Both of my examples above run with your example data above (I do get warnings but they're presumably due to the small subset of the data given). If you want more help you're going to need to narrow down the source of the errors (possibly ask a new question); I don't really want to dig through all of your code myself ... – Ben Bolker Jan 15 '22 at 22:56
PS the `first_variable` error is because you're supposed to fill that in with whatever the name of your first ID variable is. I've done that now. Can't help you with the first error (note that it's different from the error you originally asked for help with ...) – Ben Bolker Jan 15 '22 at 22:57

R Error in model.frame.default - Variable lengths differ

1 Answers1