I want to build models (specifically, decision trees) using Spark, and later apply them in a pure Python (not PySpark) application.

It appears that PMML export is the intended method, but it's not yet supported for tree models, and I didn't find a PMML library for Python that appears to be in active development.

Ophir Yoktan

1 Answer

Augustus, at https://code.google.com/p/augustus/ , is a PMML library for Python, but it is no longer under development. More recently there has been work to support scikit-learn PMML import/export at https://github.com/alex-pirozhenko/sklearn-pmml , which could be an option.

As noted by @zero323, PMML export is only available for certain models. For other models, if you are targeting a specific serving platform, you can write your own custom export code or your own parsing code (decision trees are written out in a custom Parquet format).
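To give a rough idea of what the "write your own export and parsing code" route could look like on the Python side, here is a minimal sketch. It assumes the tree has already been dumped from Spark into a nested JSON structure; the field names (`feature`, `threshold`, `left`, `right`, `prediction`) are hypothetical and are not the layout Spark itself writes. It also assumes continuous splits only, with values at or below the threshold going left.

```python
import json

# Hypothetical export format (an assumption, not Spark's own layout):
# internal nodes carry "feature", "threshold", "left" and "right";
# leaves carry only "prediction".
TREE_JSON = """
{
  "feature": 0, "threshold": 2.5,
  "left": {"prediction": 0.0},
  "right": {
    "feature": 1, "threshold": 1.0,
    "left": {"prediction": 1.0},
    "right": {"prediction": 2.0}
  }
}
"""


def predict(node, features):
    """Walk the tree from the root to a leaf and return the leaf's prediction."""
    while "prediction" not in node:
        # Assumed convention: value <= threshold goes to the left child.
        if features[node["feature"]] <= node["threshold"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["prediction"]


tree = json.loads(TREE_JSON)
```

With the tree loaded once, scoring is just `predict(tree, [3.0, 0.5])` per feature vector, with no Spark dependency at serving time. Categorical splits would need an extra node type, which this sketch leaves out.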

Holden
  • There is no PMML writer for DecisionTreeModel so it won't work here. – zero323 Sep 15 '15 at 06:10
  • Ah yes, that's a good point (this just answered the part of the question about where to get PMML support in Python). – Holden Sep 15 '15 at 06:12
  • Yep, since decision trees are relatively simple, something like this should be sufficient: http://stackoverflow.com/a/31975050/1560062 – zero323 Sep 15 '15 at 06:18
  • Currently, alex-pirozhenko/sklearn-pmml only exports Python models to PMML (and not vice versa). So it's probably not very helpful if you're still missing a PMML consumer for Python. – user1808924 Sep 15 '15 at 06:45
  • Ah, good catch, so that really only leaves Augustus, which is not all that active as far as development goes. – Holden Sep 15 '15 at 06:49