
When running a glm model, I encountered an error message that I don't understand:

Error in model.frame.default(formula = case ~ MENSTRUALSTATUS + PARITY + : variable lengths differ (found for 'PD')

when I run the following code:

lr.PD <- glm(case ~ MENSTRUALSTATUS + PARITY + k_BMI + PD, family = "binomial",
             data = teData.volpara)

The data frame teData.volpara has no NAs in any of its entries. I used the following command to remove them:

teData.volpara <- teData[complete.cases(teData),]

I found a similar question here: "Error in model.frame.default ...... variable lengths differ", but I can't find any NAs that might be causing the problem.
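One way the reported error can arise even with NA-free data is sketched below. This is an assumption, not something the question confirms: it supposes that PD is not a column of teData.volpara but a separate vector in the workspace, computed for the full, unfiltered teData. The values are made up, and running the snippet overwrites any objects named teData, teData.volpara and PD.

```r
# Made-up stand-in for teData (row 3 contains an NA)
teData <- data.frame(case   = c(0, 1, 0, 1, 1, 0),
                     PARITY = c(1, 2, NA, 3, 1, 0))

# Assumption: PD is a free-standing vector computed for the unfiltered data;
# it is NOT a column of teData
PD <- c(0.21, 0.35, 0.18, 0.40, 0.27, 0.30)

teData.volpara <- teData[complete.cases(teData), ]

# complete.cases() shrank the data frame, but not the free-standing vector
nrow(teData.volpara)  # 5
length(PD)            # 6

# Formula variables that are not columns of the data frame are looked up in
# the formula's environment (here the global environment) instead
setdiff(all.vars(case ~ PARITY + PD), names(teData.volpara))  # "PD"
```

When a name shows up in that setdiff() result and its length differs from nrow() of the data frame, model.frame() reports it as "variable lengths differ (found for '...')".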

Edited by Community, asked by abra
  • You need to make your question [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Use `dput` to show us some of the data which returns the same error you face. Otherwise, we can only guess what might be wrong. – LyzandeR Mar 23 '15 at 22:19
  • Are all your variables defined in the `teData.volpara` dataset? If not, the model will look in the global environment for them. An example: `newvar` is not defined in `mtcars` and has a different length from the other variables in the model; run `newvar <- 1:10` and then `glm(am ~ wt + mpg + newvar, mtcars, family="binomial")`. Can you edit your question with the output of `dput(head(teData.volpara))` and `str(teData.volpara)`? Thanks – user20650 Mar 24 '15 at 00:10
  • Voting to close as not reproducible and the OP is unresponsive – user20650 Mar 24 '15 at 19:48
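The experiment suggested in the comments can be run as a self-contained snippet (mtcars ships with base R and has 32 rows). tryCatch() is used only so the error message is captured instead of stopping the script. The second half illustrates that once newvar is a column of the data frame, with one value per row, the column takes precedence over the global vector and the model fits. The names newvar and mtcars2 are illustrative.

```r
# newvar is deliberately NOT a column of mtcars: length 10 vs 32 rows
newvar <- 1:10

# Capture the error message instead of halting
res <- tryCatch(
  glm(am ~ wt + mpg + newvar, data = mtcars, family = "binomial"),
  error = function(e) conditionMessage(e)
)
print(res)  # mentions "variable lengths differ (found for 'newvar')"

# Give the data frame its own newvar column with one value per row;
# columns of `data` are found before objects in the global environment
mtcars2 <- transform(mtcars, newvar = seq_len(nrow(mtcars)))
fit <- glm(am ~ wt + mpg + newvar, data = mtcars2, family = "binomial")
nrow(model.frame(fit))  # 32
```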

1 Answer


While this question was asked a long time ago, it might still be relevant. In my opinion glm() has a bug: SOMETIMES it accepts a formula whose first term is a column name (i.e. it finds that column in the data frame and uses its data as the response), and at other times (perhaps with larger datasets?) it does not. In those cases it fails to find the column name in the data frame, even though it is there, and instead treats the name as a data vector of length one, hence the error. One workaround is to always pass the actual data (a vector defined in the environment) as the first argument, e.g. outcomeData <- c(lots of numbers); glm(outcomeData ~
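To check the behaviour described above on your own data, one option is to look at the length each formula variable resolves to before calling glm(). The helper below is a hypothetical sketch, not a base R function: it searches the data frame first and then the formula's environment, roughly as model.frame() does. Any variable whose length differs from nrow(data), for example one of length one, is the one the "variable lengths differ" error will name. If a name cannot be found in either place, eval() itself stops with an "object not found" error.

```r
# Hypothetical helper (not part of base R): for each variable named in a
# formula, return the length of the object it resolves to, searching `data`
# first and then the formula's environment
resolved_lengths <- function(formula, data) {
  vars <- all.vars(formula)
  sapply(vars, function(v) {
    length(eval(as.name(v), envir = data, enclos = environment(formula)))
  })
}

# Usage with made-up data: z is not a column, so the global vector is used
toy <- data.frame(case = c(0, 1, 0, 1), x = c(1.2, 3.4, 2.2, 0.5))
z <- 1:3
resolved_lengths(case ~ x + z, toy)  # case and x have length 4, z has length 3
```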

Answered by Abiologist
  • As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Mar 06 '22 at 17:03