I am trying to produce PMML from a regression model trained in caret with method='glm'. Example model:

library('caret')

data('GermanCredit')

set.seed(123)

train_rows <- createDataPartition(GermanCredit$Class, p=0.6, list=FALSE)

train_x <- GermanCredit[train_rows, c('Age','ForeignWorker','Housing.Own',
                                      'Property.RealEstate','CreditHistory.Critical') ]
train_y <- as.integer( GermanCredit[train_rows, 'Class'] == 'Good' )

some_glm <- train( train_x, train_y, method='glm', family='binomial', 
                   trControl = trainControl(method='none') )

summary(some_glm$finalModel)

An unaccepted answer to this related question (for type='rf') suggests that this is not possible with the matrix interface.

So I'm unable to get pmml using either the matrix or the formula syntax (which I'm pretty sure produce identical finalModels anyway):

library('pmml')

pmml(some_glm$finalModel) 
# Error in if (model$call[[1]] == "glm") { : argument is of length zero

# Same problem if I try:
some_glm2 <- train( Class ~ Age + ForeignWorker + Housing.Own + 
                      Property.RealEstate + CreditHistory.Critical, 
                    data=GermanCredit[train_rows, ], family="binomial", 
                    method='glm',
                    trControl = trainControl(method='none') )
pmml(some_glm2$finalModel)

It does work in base glm with the formula interface:

some_glm_base <- glm(Class ~ Age + ForeignWorker + Housing.Own + 
                     Property.RealEstate + CreditHistory.Critical, 
                     data=GermanCredit[train_rows, ], family="binomial")
pmml(some_glm_base) # works

For interoperability, I would like to continue to use caret. Is there a way to convert some_glm produced in caret back to a format that pmml() will accept? Or am I forced to use the glm() construction if I want pmml functionality?

  • Did you read the warning message? "Warning message: In train.default(train_x, train_y, method = "glm", family = "binomial", : You are trying to do regression and your outcome only has two possible values Are you trying to do classification? If so, use a 2 level factor as your outcome column." – IRTFM Dec 29 '15 at 16:15
  • After looking at the `pmml.glm` code and the structure of the `some_glm` object, this effort appears hopeless unless you can find a function that maps the contents of the caret-glm object to the structure of a `stats::glm` object – IRTFM Dec 29 '15 at 16:26
  • @42- I saw the warning, but I'm not doing classification, just regression on a binary outcome. (It's pretty much the same thing, but I just want a probability of the target class, not a class prediction.) – C8H10N4O2 Dec 29 '15 at 16:27
  • There is an r2pmml package. I installed version 0.4.3 from source off GitHub, along with the required dependencies (but it has no help pages). The attempt at conversion fails: `r2pmml(some_glm, "train-rf.pmml")` gives `Error in .jnew("org/jpmml/rexp/Main") : java.lang.UnsupportedClassVersionError: org/jpmml/rexp/Main : Unsupported major.minor version 51.0`. I'm on a Mac (with up-to-date Java), so users of other OS's may want to give it a shot. – IRTFM Dec 29 '15 at 16:42
  • The problem is that `caret::train()` converts "formula interface" invocations to "matrix interface" invocations before doing any work (probably for performance reasons). – user1808924 Dec 29 '15 at 18:17
  • @42- thanks for the suggestion, I'll look into `r2pmml` – C8H10N4O2 Dec 30 '15 at 12:30

1 Answer

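The error comes from the check model$call[[1]] == "glm" inside pmml(), which returns a zero-length result for caret's finalModel. If you set model$call[[1]] to "glm" yourself, the pmml function should get past that check.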

So in your case you would want to:

library('pmml')

some_glm$finalModel$call[[1]] <- "glm"
pmml(some_glm$finalModel)
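
The mechanism can be reproduced in base R without caret or pmml. This is only a sketch: it assumes that the finalModel has no call element at all (which is what the "argument is of length zero" error suggests), it simulates that state on a plain glm fit to the built-in mtcars data, and `restore_glm_call` is a hypothetical helper name, not part of either package.

```r
# Fit an ordinary logistic regression on a built-in data set.
fit <- glm(am ~ wt + hp, data = mtcars, family = "binomial")

# Simulate a fitted model whose call has been removed
# (assumption: this mirrors the state of some_glm$finalModel).
fit$call <- NULL
is.null(fit$call)  # the element pmml() inspects is now missing

# Hypothetical helper: only patch the call when it is missing,
# so a model that still has a real call is left untouched.
restore_glm_call <- function(model) {
  if (is.null(model$call)) {
    model$call[[1]] <- "glm"
  }
  model
}

patched <- restore_glm_call(fit)

# The comparison that pmml() performs now yields a single TRUE/FALSE value.
patched$call[[1]] == "glm"
```

With caret and pmml loaded, the same helper could presumably be applied as `pmml(restore_glm_call(some_glm$finalModel))`. Note that this only addresses the check on the call; it does not change the model's data or column types.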
  • I tried your solution but I got the following error: `Error in .pmmlDataDictionary(field, weights = weights, transformed = transforms) : character class is not supported for features. Supported classes: numeric, logical, factor.` – hshihab Aug 17 '17 at 14:03