1

I am trying to fit cumulative link mixed models with the ordinal package but there is something I do not understand about obtaining the prediction probabilities. I use the following example from the ordinal package:

   library(ordinal)
data(soup)
## More manageable data set:
dat <- subset(soup, as.numeric(as.character(RESP)) <=  24)
dat$RESP <- dat$RESP[drop=TRUE]
m1 <- clmm2(SURENESS ~ PROD, random = RESP, data = dat, link="logistic",  Hess = TRUE,doFit=T)
summary(m1)
str(dat)

Now I am trying to get predictions of probabilities for a new dataset

newdata1=data.frame(PROD=factor(c("Ref", "Ref")), SURENESS=factor(c("6","6")))

with

predict(m1, newdata=newdata1)

but I am getting the following error

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Why am I getting this error? Is there something in the syntax of predict.clmm2() wrong? Generally which probabilities does does predict.clmm2() output? The Pr(J<j) or Pr(J=j)? Could someone point me to information (site, books) material regarding fitting categorical (ordinal) ordinal mixed models specifically with R. From my search in the literature and net, most researchers fit these kind of models with SAS.

ECII
  • 10,297
  • 18
  • 80
  • 121
  • You *probably* need to do something like `newdata1=data.frame(PROD=factor(c("Ref","Ref") , levels = c("Ref","Somethingelse"), ... )` - the error states you can't predict something with less than 2 factor levels (which you have). – Simon O'Hanlon Jul 05 '13 at 14:46
  • (Disclaimer: I don't know anything about CLMMs) In your model formula, `SURENESS` appears to be your response variable, but you use it in your newdata instead of SOUPTYPE. Also, you leave PROD out of your original formula but include it in your newdata. Was that intentional? In any event, When I run the code, regardless whether I use SOUPTYPE or SURENESS in newdata, R tells me the other variable is missing (i.e. I'm getting a different error from you, R 2.15.0) – David Marx Jul 05 '13 at 15:04
  • Thanks. I corrected it but still spits the same error. – ECII Jul 05 '13 at 15:45
  • @DavidMarx: `predict.clmm2` requires that the response variable be in the newdata argument, as well as requiring that the factor levels match the original data. – IRTFM Jul 05 '13 at 16:21

1 Answers1

3

You did not say what you corrected, but when I use this, I get no error:

newdata1=data.frame(PROD=factor(c("Test", "Test"), levels=levels(dat$PROD)), 
                    SURENESS=factor(c("1","1")) )
predict(m1, newdata=newdata1)

The output from predict.clmm2 with a newdata argument will not make much sense unless you get all the factor levels aligned so they are in the agreement with the input data:

> newdata1=data.frame(
                PROD=factor(c("Ref", "Test"), levels=levels(dat$PROD)), 
                SURENESS=factor(c("1","1")) )
> predict(m1, newdata=newdata1)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1

Not very interesting. The prediction is for an outcome with only one level to have a probability of 1 of being in that level. (A vacuous prediction.) But recreating the structure of the original ordered outcomes is more meaningful:

> newdata1=data.frame(
             PROD=factor(c("Ref", "Test"), levels=levels(dat$PROD)), 
             SURENESS=factor(c("1","1"), levels=levels(dat$SURENESS)) , )
> predict(m1, newdata=newdata1)
[1] 0.20336975 0.03875713

You can answer the question in the comments by assembling all the predictions for various levels:

> sapply(as.character(1:6), function(x){ newdata1=data.frame(PROD=factor(c("Ref", "Test"), levels=levels(dat$PROD)), SURENESS=factor(c(x,x), levels=levels(dat$SURENESS))  );predict(m1, newdata=newdata1)})
              1          2          3          4         5         6
[1,] 0.20336975 0.24282083 0.10997039 0.07010327 0.1553313 0.2184045
[2,] 0.03875713 0.07412618 0.05232823 0.04405965 0.1518367 0.6388921
> out <- .Last.value
> rowSums(out)
[1] 1 1

The probabilities are Pr(J=j|X=x & Random=all).

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • Thanks. I guess I missed the fact that its important to spacify matcing values **and** labels for the categorical regressors. Is this specific to `predict.clmm2()` ? Do you also happen to know what kind of probabilities are in the output of `predict.clmm2()`? Are they Pr(J – ECII Jul 05 '13 at 17:11
  • 1
    Not just the regressors, but also the outcomes. – IRTFM Jul 05 '13 at 17:42
  • Thanks a lot. Just to check, the fits are of the model `log(odds)=a+bx` right? I am asking because other programs tend to fit `log(odds)=a-bx` – ECII Jul 06 '13 at 07:37
  • You should study the vignette: http://cran.r-project.org/web/packages/ordinal/vignettes/clm_intro.pdf . It looks like the clm package uses follows the convention you attribute to "other programs". I think it lets the probabilities sum to unity. – IRTFM Jul 08 '13 at 21:38
  • @42- Thanks for you instructive example. I was wondering whether it is possible to include the posterior means of the random effects into the predictions. When including the second level `RESP` into the data frame: `newdata1=data.frame( PROD=factor(c("Ref", "Test"), levels=levels(dat$PROD)), RESP=factor(c("1", "3"), levels=levels(dat$RESP)), SURENESS=factor(c("1","1"), levels=levels(dat$SURENESS)))` and feeding it into the prediction sequence for various RESP level, probabilities remain the same which implies the prediction only evaluates the fixed part? – Mark Verhagen Apr 15 '20 at 09:16
  • I wrote a long response that just barely fit in the comment space, but then realized I probably did not really understand the question. I suspect it is more appropriate for CrossValidated.com anyway. (I don't consider myself an expert in mixed models. I was not surprised that ME-predictions would only be affected by variations in FE variables, but I was not at all confident that my speculations about your options would be valid.) – IRTFM Apr 15 '20 at 15:42