
How do I convert a Python XGBoost model to PMML?

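from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Base estimator; the grid search below overrides max_depth, gamma and n_estimators.
reg = XGBRegressor(learning_rate=0.1, n_estimators=30, max_depth=4, min_child_weight=4, gamma=0.1,
                   subsample=0.9, colsample_bytree=0.8, objective='binary:logistic', reg_alpha=1,
                   scale_pos_weight=1, seed=27)
param_test = [{
    'max_depth': list(range(1, 3)),
    'gamma': [i / 10.0 for i in range(0, 10)],
    'n_estimators': list(range(2, 14, 2)),
}]
# iid=False is kept from the original post; the argument was removed in scikit-learn 0.24.
gsearch = GridSearchCV(reg, param_grid=param_test, scoring='neg_mean_squared_error', n_jobs=4, iid=False, cv=5)
gsearch.fit(x_train, y_train)  # x_train, y_train: the asker's training data (not shown)
best_model = gsearch.best_estimator_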
tkrishtop
Qiaoli Zhang

1 Answer


See the sklearn2pmml package: https://github.com/jpmml/sklearn2pmml

First, define a new PMML pipeline and insert your XGBRegressor into it. Then fit that pipeline through GridSearchCV, passing the pipeline (not the bare regressor) as the estimator. Finally, export GridSearchCV.best_estimator_, which is the optimized PMML pipeline, to the PMML data format using the sklearn2pmml.sklearn2pmml function:

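from sklearn.model_selection import GridSearchCV
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline
from xgboost import XGBRegressor

pmml_pipeline = PMMLPipeline([
  ("regressor", XGBRegressor())
])
# Grid keys must carry the step name as a prefix, e.g. regressor__max_depth.
tuner = GridSearchCV(pmml_pipeline, ...)
tuner.fit(X, y)
sklearn2pmml(tuner.best_estimator_, "xgbregressor-pipeline.pmml")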

Also see slide #26 of the following presentation: https://www.slideshare.net/VilluRuusmann/converting-scikitlearn-to-pmml
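Because the grid search now tunes a pipeline instead of a bare regressor, scikit-learn expects each parameter name to be written as `<step name>__<parameter>`. The sketch below is one way to rewrite an existing grid; `prefix_param_grid` is a hypothetical helper written for this illustration, not part of scikit-learn or sklearn2pmml, and the grid values are the ones from the question.

```python
def prefix_param_grid(step_name, param_grid):
    """Return a copy of param_grid with every key rewritten as '<step_name>__<key>'."""
    return {f"{step_name}__{name}": values for name, values in param_grid.items()}


# Grid from the question, written against the bare XGBRegressor.
param_test = {
    "max_depth": list(range(1, 3)),
    "gamma": [i / 10.0 for i in range(0, 10)],
    "n_estimators": list(range(2, 14, 2)),
}

# "regressor" is the step name used in the PMMLPipeline above.
pipeline_grid = prefix_param_grid("regressor", param_test)
print(sorted(pipeline_grid))
```

The resulting `pipeline_grid` would then be passed as `param_grid` to `GridSearchCV(pmml_pipeline, ...)` in place of the unprefixed grid.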

user1808924
  • I tried it this way, but it fails with an error: Invalid parameter gamma for estimator PMMLPipeline(steps=[('regressor', XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bytree=0.8, gamma=0.1, learning_rate=0.1, max_delta_step=0, max_depth=4, min_child_weight=4, missing=nan, n_estimators=30, n_jobs=1, nthread=None, objective='binary:logistic', random_state=0, reg_alpha=1, reg_lambda=1, scale_pos_weight=1, seed=27, silent=True, subsample=0.9))]). Check the list of available parameters with `estimator.get_params().keys()`. – Qiaoli Zhang Jan 11 '19 at 01:36
  • The error message "Invalid parameter gamma for estimator (PMML)Pipeline" means that you must prepend the step identifier to all your grid search parameter names. For example, `max_depth` should become `regressor__max_depth`, and so on. See https://github.com/scikit-learn/scikit-learn/issues/9944 – user1808924 Jan 11 '19 at 08:20
  • Thank you. I have another question: when I tried to convert to PMML, I got this error: RuntimeError: The JPMML-SkLearn conversion application has failed. The Java executable should have printed more information about the failure into its standard output and/or standard error streams. How can I fix this? – Qiaoli Zhang Jan 14 '19 at 01:58
  • The error is: `java.lang.UnsupportedClassVersionError: org/jpmml/sklearn/Main : Unsupported major.minor version 52.0`. It means that your `java.exe` is older than Java 8 and cannot load Java (1.)8 class files. Uninstall the current `java.exe`, and install the latest one. – user1808924 Jan 14 '19 at 08:20
  • More info about this error is here: https://stackoverflow.com/questions/22489398/unsupported-major-minor-version-52-0 – user1808924 Jan 14 '19 at 08:21
  • Thank you. I updated my Java version to JDK 1.8, and now I get another error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 2: invalid continuation byte. Can you tell me how to fix it? – Qiaoli Zhang Jan 14 '19 at 08:34
  • The `UnicodeDecodeError` happens because your `java.exe` is printing log messages using a charset that is not UTF-8. See here for more discussion: https://github.com/jpmml/sklearn2pmml/issues/122 The fix is to pass the `java_encoding` argument to the `sklearn2pmml` function call. Something like `sklearn2pmml(tuner.best_estimator_, "xgbregressor-pipeline.pmml", java_encoding = ...)` – user1808924 Jan 14 '19 at 10:17
  • Thank you. I checked `sklearn2pmml/__init__.py`; it is the new version and matches the code you pointed to on GitHub. I also tried adding `java_encoding = 'UTF-8'` (`sklearn2pmml(tuner.best_estimator_, 'xgbregressor_pipeline.pmml', java_encoding="UTF-8")`), but I still get the same error and have no idea how to fix it. I read the data like this: `data = pd.read_csv('D:\\xgboost\\ivceshi.csv', encoding = 'ANSI')`. Could the problem be caused by `encoding = 'ANSI'`? – Qiaoli Zhang Jan 15 '19 at 01:38
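Passing `java_encoding="UTF-8"` names the same codec that already failed, so it would not be expected to change the outcome. The sketch below illustrates the idea using only the Python standard library. It assumes, purely for illustration, that the Java output is in a legacy Windows code page such as GBK; the thread does not confirm which charset is actually in use. The sample bytes are made up, and `first_working_encoding` is a hypothetical helper, not part of sklearn2pmml.

```python
def first_working_encoding(data, candidates):
    """Return the first codec name in candidates that decodes data without error, else None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Made-up sample: "ab" followed by two bytes that form one GBK character
# but are not a valid UTF-8 sequence (0xd4 sits at position 2).
sample = b"ab\xd4\xcb"

utf8_error = None
try:
    sample.decode("utf-8")
except UnicodeDecodeError as err:
    utf8_error = err
    print(err.start, err.reason)

# Assumption: GBK is only a guess at the console charset, tried after UTF-8.
chosen = first_working_encoding(sample, ["utf-8", "gbk"])
print(chosen)
```

If a codec other than UTF-8 is the one that decodes the real Java output, that codec name is the natural candidate to try as the `java_encoding` value.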