There's no predict() function for clogitLasso(), but I was overthinking this: you can do the matrix multiplication of the data by the coefficients yourself.
For instance:
First we'll simulate some data: 360 observations in 180 case/control pairs. The outcome, case, is coded 1/0, and set numbers the 180 pairs. There are two covariates: e1 is noise, and x1 is associated with the outcome, case.
library("clogitLasso")
set.seed(0)
N <- 360
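# alternating case (1) / control (0) rows
mm <- data.frame(case=rep(c(1, 0), times=N/2))
# each consecutive pair of rows forms one matched set (stratum)
mm$set <- rep(1:(N/2), each=2)
# e1: pure noise, unrelated to case
mm$e1 <- rnorm(n=N, mean=5, sd=10)
# x1: shifted up by 10 for cases, so it is associated with case
mm$x1 <- mm$case*10 + rnorm(n=N, mean=0, sd=3)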
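Before fitting anything, it can help to confirm that the simulated design looks as intended. A minimal base-R check, which re-runs the same simulation so that it stands on its own: each set should contain exactly one case and one control, and only x1 should differ noticeably between cases and controls.

```r
# Re-create the simulated data from above (same seed and settings)
set.seed(0)
N <- 360
mm <- data.frame(case = rep(c(1, 0), times = N/2))
mm$set <- rep(1:(N/2), each = 2)
mm$e1 <- rnorm(n = N, mean = 5, sd = 10)
mm$x1 <- mm$case * 10 + rnorm(n = N, mean = 0, sd = 3)

# Each matched set should contain exactly one case and two rows in total
casesPerSet <- tapply(mm$case, mm$set, sum)
rowsPerSet <- tapply(mm$case, mm$set, length)
stopifnot(all(casesPerSet == 1), all(rowsPerSet == 2))

# Covariate means by outcome: x1 should be about 10 units higher in cases,
# while e1 should be similar in both groups
aggregate(cbind(e1, x1) ~ case, data = mm, FUN = mean)
```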
To get predictions from clogitLasso we need to normalize the covariates (mean = 0, sd = 1) ourselves, before putting the data into the model. (Otherwise clogitLasso translates the coefficients back to the "original scale", which is useless here.)
mm[, c("e1", "x1")] <- scale(mm[, c("e1", "x1")], center=TRUE, scale=TRUE)
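scale() returns a matrix whose "scaled:center" and "scaled:scale" attributes hold the original column means and standard deviations; assigning the result straight into mm, as above, keeps only the scaled values. A sketch of holding on to them, assuming the same simulated data. Because the linear predictor is beta * (x - mean) / sd, a coefficient estimated on the standardized scale corresponds to beta / sd per original unit (the shift by the mean is constant within a stratum, so it cancels in the conditional likelihood). The betaStd values below are hypothetical, for illustration only.

```r
set.seed(0)
N <- 360
mm <- data.frame(case = rep(c(1, 0), times = N/2))
mm$set <- rep(1:(N/2), each = 2)
mm$e1 <- rnorm(n = N, mean = 5, sd = 10)
mm$x1 <- mm$case * 10 + rnorm(n = N, mean = 0, sd = 3)

# Standardize, but keep the matrix so its attributes remain available
scaled <- scale(mm[, c("e1", "x1")], center = TRUE, scale = TRUE)
centers <- attr(scaled, "scaled:center")  # original column means
sds <- attr(scaled, "scaled:scale")       # original column standard deviations

# Each standardized column now has mean 0 and sd 1
round(colMeans(scaled), 10)
apply(scaled, 2, sd)

# Hypothetical coefficients on the standardized scale, re-expressed
# per original unit of each covariate
betaStd <- c(e1 = 0.05, x1 = 2.5)
betaPerOriginalUnit <- betaStd / sds
```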
Then build the model:
# standardize=FALSE because the covariates were already standardized above
model <- clogitLasso(X=as.matrix(mm[, c("e1", "x1")]), y=as.matrix(mm$case),
                     strata=mm$set, standardize=FALSE)
Next we choose the value of the penalty weight to test the predictions for. Each row of model$beta holds the coefficients for one penalty value; here we'll use the 10th, chosen arbitrarily.
Then we multiply the (standardized) input data by those coefficients ("betas") to try to predict the original outcome, case:
# linear predictor for the 10th penalty value (10th row of model$beta)
handMadePredictions <- as.matrix(mm[, c("e1", "x1")]) %*% model$beta[10, ]
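This is the linear predictor. We treat it as log-odds and transform it to the probability scale (conditional logistic regression estimates no intercept, so this implicitly assumes an intercept of zero):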
# inverse logit: maps log-odds to probabilities
logistic <- function(logOdds) {
  return(exp(logOdds) / (exp(logOdds) + 1))
}
handMadePredictions <- logistic(handMadePredictions)
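The two steps above are just a matrix product followed by the inverse-logit function. A self-contained sketch with a small hypothetical design matrix and coefficient vector (stand-ins for the standardized mm columns and model$beta[10, ]) shows that %*% computes the per-row weighted sum of covariates, and that the hand-written logistic() agrees with base R's plogis() except at very large log-odds, where exp() overflows.

```r
# Hypothetical standardized covariates (3 rows) and coefficients
X <- matrix(c( 0.5, -1.2,
               1.0,  0.3,
              -0.7,  2.0),
            ncol = 2, byrow = TRUE, dimnames = list(NULL, c("e1", "x1")))
beta <- c(e1 = 0.1, x1 = 1.5)

# %*% treats the plain vector beta as a column: one linear predictor per row
eta <- X %*% beta
etaByHand <- X[, "e1"] * beta[["e1"]] + X[, "x1"] * beta[["x1"]]

logistic <- function(logOdds) {
  return(exp(logOdds) / (exp(logOdds) + 1))
}

# For moderate log-odds both give the same probabilities
pHand <- logistic(eta)
pBase <- plogis(eta)

# For very large log-odds exp() overflows to Inf, and Inf / Inf is NaN;
# plogis() returns 1 instead
logistic(1000)
plogis(1000)
```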
The original outcome, case, was a series of alternating ones and zeros. This model predicted those outcomes from the same inputs quite well, as you can see either by inspecting round(handMadePredictions) or with a confusion matrix:
table("predicted"=round(handMadePredictions), "Case/control"=mm$case)
            Case/control
predicted   0   1
        0 172  12
        1   8 168
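The counts in that table can be summarized directly. A small sketch that types the reported counts into a matrix (rather than re-running the model) and computes overall accuracy, sensitivity (cases predicted as cases) and specificity (controls predicted as controls):

```r
# Counts as reported in the confusion matrix above
confusion <- matrix(c(172,  12,
                        8, 168),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(predicted = c("0", "1"),
                                    "Case/control" = c("0", "1")))

accuracy <- sum(diag(confusion)) / sum(confusion)          # (172 + 168) / 360
sensitivity <- confusion["1", "1"] / sum(confusion[, "1"])  # 168 / 180
specificity <- confusion["0", "0"] / sum(confusion[, "0"])  # 172 / 180

# accuracy 0.944, sensitivity 0.933, specificity 0.956
round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 3)
```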
Note that in this toy example there are no stratum effects: the association between x1 and case is the same no matter which set the data points are in. In this simplified situation there is no need for conditional logistic regression; regular logistic regression would work just fine. But I haven't been able to get plausible prediction results from clogitLasso() when there are stratum effects, which is a whole other question.
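For what it's worth, the conditional likelihood behind conditional logistic regression is defined within each stratum: in a set with exactly one case, the probability that row i is the case is exp(eta_i) divided by the sum of exp(eta_j) over the rows j in that set, so any shift shared by the whole stratum cancels out (for a matched pair this reduces to plogis(eta_i - eta_j)). A sketch of turning linear predictors into such within-set probabilities, assuming one case per set; this has not been checked against clogitLasso's own output, and the withinStratumProb() helper and the numbers are hypothetical.

```r
# Hypothetical helper: within-stratum softmax of the linear predictor.
# Assumes exactly one case per stratum (e.g. 1:1 or 1:M matching).
withinStratumProb <- function(eta, strata) {
  eta <- as.vector(eta)
  # Subtract each stratum's maximum first so exp() cannot overflow
  etaMax <- ave(eta, strata, FUN = max)
  w <- exp(eta - etaMax)
  w / ave(w, strata, FUN = sum)
}

# Hypothetical linear predictors for three matched pairs
eta <- c(2.0, -1.0, 0.5, 0.5, 3.0, 1.0)
strata <- c(1, 1, 2, 2, 3, 3)
p <- withinStratumProb(eta, strata)

# Adding a constant to every row of a stratum leaves p unchanged
pShifted <- withinStratumProb(eta + c(5, 5, -3, -3, 0, 0), strata)

# Predicted case: the row with the highest probability in its stratum
# (ties, as in stratum 2 here, flag both rows)
predictedCase <- as.integer(p == ave(p, strata, FUN = max))
```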