I am trying to run a binary logistic regression inside nested for loops in R. My code is as follows:

mydata5<-read.table(file.choose(),header=T,sep=",")
colnames(mydata5)
 Class <- 1:16   
Countries  <- 1:5
Months  <- 1:7
DayDiff  <- 1:28
mydata5$CT <- factor(mydata5$CT)
mydata5$CC <- factor(mydata5$CC)
mydata5$C <- factor(mydata5$C)
mydata5$DD <- factor(mydata5$DD)
mydata5$UM <- factor(mydata5$UM)
for(i in seq(along=Class))
   {
     mydata5$C=mydata5$C[i];

for(i2 in seq(along=Countries))
{
  mydata5$CC=mydata5$CC[i2];

for(i3 in seq(along=Months))
{
  mydata5$UM=mydata5$UM[i3];

for(i4 in seq(along=DayDiff))
{
  mydata5$DD=mydata5$DD[i4];

  lrfit5 <- glm(CT ~ C+CC+UM+DD, family = binomial(link = "logit"),data=mydata5)
  summary(lrfit5)
  library(lattice) 
  in_frame<-data.frame(C="mydata5$C[i]",CC="mydata5$CC[i2]",UM="mydata5$UM[i3]",DD="mydata5$DD[i4]")
  predict(lrfit5,in_frame, type="response",se.fit=FALSE)
}
}
}
}

However, I'm getting the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels

Why is the error occurring? Also, the dataset "mydata5" has around 50000 rows. Please help.

Thanks in advance.

  • We don't have your data set so we can't run this. You don't tell us on which line the error happened (guess: the `glm`). Why don't you print i, i2, i3 and i4 in the loop to find out whether it happens on the first iteration or on a specific one? Is it getting as far as the obvious problem with the construction of `in_frame`, where you've put things in quotes? Please improve this question. – Spacedman Sep 05 '14 at 07:07
  • Reading the error messages and guessing again, it looks like the error you would get if you tried to fit a model with data that had categorical variables with only one category - like trying to fit an effect for "sex" with only "male" data points. But that's a guess because we don't have your datafile, or a sample, to try it on. – Spacedman Sep 05 '14 at 07:09
  • What is the intention of all these loops?!? It looks like you are horribly disfiguring your data with all these bizarre reassignments. As @Spacedman said, you should include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) otherwise we can't test possible solutions or even see where the error is really happening. I'm guessing that in the end, `mydata` has all identical covariate values after all those loops thanks to vector recycling. – MrFlick Sep 05 '14 at 07:11
  • @Spacedman: yes, the error happened in the glm line. The categorical variables have the desired number of levels. I checked that using lapply(mydata5[c("C", "CC", "UM", "DD")], unique). Basically, my intention behind using the 4 for loops was that I want to run various permutations and combinations of the 4 categorical variables and determine their individual probabilities. – Srinivasan Ramanujam Sep 05 '14 at 08:15

1 Answer

You have tried to fit a regression with a factor that has only one level. Since you haven't given us your data, we can't reproduce your analysis, but I can easily reproduce your error message:

> d = data.frame(x=runif(10),y=factor("M",levels=c("M","F")))
> d
            x y
1  0.07104688 M
2  0.11948466 M
3  0.20807068 M
4  0.24049508 M
5  0.44251492 M
6  0.69775646 M
7  0.44479983 M
8  0.64814971 M
9  0.75151207 M
10 0.38810621 M
> glm(x~y,data=d)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

By setting one of the factor values to "F" I don't get the error message:

> d$y[5]="F"
> glm(x~y,data=d)

Call:  glm(formula = x ~ y, data = d)

Coefficients:
(Intercept)           yF  
    0.39660      0.04591  

Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
Null Deviance:      0.5269 
Residual Deviance: 0.525    AIC: 4.91
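
A quick way to look for this situation in your own data is to count the values each factor column actually contains at the point where the model is fitted. The sketch below reuses the toy data frame from above; `observed_levels` is a hypothetical helper name, not a base R function. Note that `nlevels()` alone would not catch the problem here, because `y` declares both "M" and "F" even when every row is "M".

```r
# Hypothetical helper: number of distinct non-missing values per factor column.
observed_levels <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  vapply(df[is_fac],
         function(col) length(unique(col[!is.na(col)])),
         integer(1))
}

d <- data.frame(x = runif(10), y = factor("M", levels = c("M", "F")))
observed_levels(d)   # y has one observed value: glm(x ~ y) fails as shown above

d$y[5] <- "F"
observed_levels(d)   # y has two observed values: the model can be fitted
```

Running the same check on mydata5 inside the innermost loop, just before the `glm` call, should show which column has dropped to a single value.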

So somewhere in your loops (which we cannot run because we don't have your data) you are doing this.
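
My guess at where it happens, sketched on made-up data: an assignment such as `mydata5$C=mydata5$C[i]` takes a single element and recycles it over the whole column, so every row ends up with the same value. The data frame `toy` below is a hypothetical stand-in that only borrows the column names CT, C and CC from the question; its values are random. The second half shows one possible alternative, assuming the goal is a predicted probability for each combination of factor levels: fit the model once on the untouched data and predict over a grid built with `expand.grid`.

```r
set.seed(1)

# Hypothetical stand-in for mydata5 (random values, fewer columns).
toy <- data.frame(
  CT = factor(sample(0:1, 200, replace = TRUE)),
  C  = factor(sample(1:3, 200, replace = TRUE)),
  CC = factor(sample(1:2, 200, replace = TRUE))
)

# What the loop body does: one element is recycled over the whole column.
collapsed <- toy
collapsed$C <- collapsed$C[1]
length(unique(collapsed$C))   # every row now holds the same value

# Alternative: fit once on the original data ...
fit <- glm(CT ~ C + CC, family = binomial(link = "logit"), data = toy)

# ... then predict for every combination of the factor levels.
grid <- expand.grid(C = levels(toy$C), CC = levels(toy$CC))
grid$prob <- predict(fit, newdata = grid, type = "response")
grid
```

This also avoids the quoted strings in `in_frame`: the grid holds real level values, so `predict` can match them against the levels the model was fitted with.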

Spacedman