
I generated a decision tree model of the iris dataset using bigml.com. I downloaded this model as PMML and want to use it for prediction on my local computer.

PMML model from BigML:

<?xml version="1.0" encoding="utf-8"?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Header description="Generated by BigML"/>
    <DataDictionary>
        <DataField dataType="double" displayName="Sepal length" name="000001" optype="continuous"/>
        <DataField dataType="double" displayName="Sepal width" name="000002" optype="continuous"/>
        <DataField dataType="double" displayName="Petal length" name="000003" optype="continuous"/>
        <DataField dataType="double" displayName="Petal width" name="000004" optype="continuous"/>
        <DataField dataType="string" displayName="Species" name="000005" optype="categorical">
            <Value value="Iris-setosa"/>
            <Value value="Iris-versicolor"/>
            <Value value="Iris-virginica"/>
        </DataField>
    </DataDictionary>
    <TreeModel algorithmName="mtree" functionName="classification" modelName="">
        <MiningSchema>
            <MiningField name="000001"/>
            <MiningField name="000002"/>
            <MiningField name="000003"/>
            <MiningField name="000004"/>
            <MiningField name="000005" usageType="target"/>
        </MiningSchema>
        <Node recordCount="150" score="Iris-setosa">
            <True/>
            <ScoreDistribution recordCount="50" value="Iris-setosa"/>
            <ScoreDistribution recordCount="50" value="Iris-versicolor"/>
            <ScoreDistribution recordCount="50" value="Iris-virginica"/>
            <Node recordCount="100" score="Iris-versicolor">
                <SimplePredicate field="000003" operator="greaterThan" value="2.45"/>
                <ScoreDistribution recordCount="50" value="Iris-versicolor"/>
                <ScoreDistribution recordCount="50" value="Iris-virginica"/>
                <Node recordCount="46" score="Iris-virginica">
                    <SimplePredicate field="000004" operator="greaterThan" value="1.75"/>
                    <ScoreDistribution recordCount="45" value="Iris-virginica"/>
                    <ScoreDistribution recordCount="1" value="Iris-versicolor"/>
                    <Node recordCount="43" score="Iris-virginica">
                        <SimplePredicate field="000003" operator="greaterThan" value="4.85"/>
                        <ScoreDistribution recordCount="43" value="Iris-virginica"/>
                    </Node>
                    <Node recordCount="3" score="Iris-virginica">
                        <SimplePredicate field="000003" operator="lessOrEqual" value="4.85"/>
                        <ScoreDistribution recordCount="2" value="Iris-virginica"/>
                        <ScoreDistribution recordCount="1" value="Iris-versicolor"/>
                        <Node recordCount="1" score="Iris-versicolor">
                            <SimplePredicate field="000002" operator="greaterThan" value="3.1"/>
                            <ScoreDistribution recordCount="1" value="Iris-versicolor"/>
                        </Node>
                        <Node recordCount="2" score="Iris-virginica">
                            <SimplePredicate field="000002" operator="lessOrEqual" value="3.1"/>
                            <ScoreDistribution recordCount="2" value="Iris-virginica"/>
                        </Node>
                    </Node>
                </Node>
                <Node recordCount="54" score="Iris-versicolor">
                    <SimplePredicate field="000004" operator="lessOrEqual" value="1.75"/>
                    <ScoreDistribution recordCount="49" value="Iris-versicolor"/>
                    <ScoreDistribution recordCount="5" value="Iris-virginica"/>
                    <Node recordCount="6" score="Iris-virginica">
                        <SimplePredicate field="000003" operator="greaterThan" value="4.95"/>
                        <ScoreDistribution recordCount="4" value="Iris-virginica"/>
                        <ScoreDistribution recordCount="2" value="Iris-versicolor"/>
                        <Node recordCount="3" score="Iris-versicolor">
                            <SimplePredicate field="000004" operator="greaterThan" value="1.55"/>
                            <ScoreDistribution recordCount="2" value="Iris-versicolor"/>
                            <ScoreDistribution recordCount="1" value="Iris-virginica"/>
                            <Node recordCount="1" score="Iris-virginica">
                                <SimplePredicate field="000003" operator="greaterThan" value="5.45"/>
                                <ScoreDistribution recordCount="1" value="Iris-virginica"/>
                            </Node>
                            <Node recordCount="2" score="Iris-versicolor">
                                <SimplePredicate field="000003" operator="lessOrEqual" value="5.45"/>
                                <ScoreDistribution recordCount="2" value="Iris-versicolor"/>
                            </Node>
                        </Node>
                        <Node recordCount="3" score="Iris-virginica">
                            <SimplePredicate field="000004" operator="lessOrEqual" value="1.55"/>
                            <ScoreDistribution recordCount="3" value="Iris-virginica"/>
                        </Node>
                    </Node>
                    <Node recordCount="48" score="Iris-versicolor">
                        <SimplePredicate field="000003" operator="lessOrEqual" value="4.95"/>
                        <ScoreDistribution recordCount="47" value="Iris-versicolor"/>
                        <ScoreDistribution recordCount="1" value="Iris-virginica"/>
                        <Node recordCount="1" score="Iris-virginica">
                            <SimplePredicate field="000004" operator="greaterThan" value="1.65"/>
                            <ScoreDistribution recordCount="1" value="Iris-virginica"/>
                        </Node>
                        <Node recordCount="47" score="Iris-versicolor">
                            <SimplePredicate field="000004" operator="lessOrEqual" value="1.65"/>
                            <ScoreDistribution recordCount="47" value="Iris-versicolor"/>
                        </Node>
                    </Node>
                </Node>
            </Node>
            <Node recordCount="50" score="Iris-setosa">
                <SimplePredicate field="000003" operator="lessOrEqual" value="2.45"/>
                <ScoreDistribution recordCount="50" value="Iris-setosa"/>
            </Node>
        </Node>
    </TreeModel>
</PMML>

I generally use R for machine learning and want to load this model and use it for prediction on my system. R has a pmml package, but it does not seem to support prediction. Is there another way to use this PMML model for prediction in R? If not, can it be used from another language or tool such as Python or Weka? If so, how (code required)?

Python model from BigML:

def predict_species(sepal_width=None,
                    petal_length=None,
                    petal_width=None):
    """ Predictor for Species from 

        This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic 
        in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes 
        of 50 instances each, where each class refers to a type of iris plant.
        Source
        Iris Data Set[*]
        Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository[*]. Irvine, CA: University of California, School of Information and Computer Science.

        [*]Iris Data Set: http://archive.ics.uci.edu/ml/datasets/Iris
        [*]UCI Machine Learning Repository: http://archive.ics.uci.edu/ml
    """
    if (petal_length is None):
        return u'Iris-setosa'
    if (petal_length > 2.45):
        if (petal_width is None):
            return u'Iris-versicolor'
        if (petal_width > 1.75):
            if (petal_length > 4.85):
                return u'Iris-virginica'
            if (petal_length <= 4.85):
                if (sepal_width is None):
                    return u'Iris-virginica'
                if (sepal_width > 3.1):
                    return u'Iris-versicolor'
                if (sepal_width <= 3.1):
                    return u'Iris-virginica'
        if (petal_width <= 1.75):
            if (petal_length > 4.95):
                if (petal_width > 1.55):
                    if (petal_length > 5.45):
                        return u'Iris-virginica'
                    if (petal_length <= 5.45):
                        return u'Iris-versicolor'
                if (petal_width <= 1.55):
                    return u'Iris-virginica'
            if (petal_length <= 4.95):
                if (petal_width > 1.65):
                    return u'Iris-virginica'
                if (petal_width <= 1.65):
                    return u'Iris-versicolor'
    if (petal_length <= 2.45):
        return u'Iris-setosa'

1 Answer


The easiest way to perform local predictions with BigML is to download the model (ensemble, cluster, anomaly detector, etc.) directly via an API call.

For example, using BigML's Python Bindings for a classification or regression model, you'll do something like:

from bigml.model import Model
model = Model('model/570f4b6e84622c5ed10095a9')
model.predict({'feature_1': 1, 'feature_2': 2})

To use a local cluster to find the closest centroid:

from bigml.cluster import Cluster
cluster = Cluster('cluster/572500b849c4a15c9d00451f')
cluster.centroid({'feature_1': 1, 'feature_2': 2})

To use a local anomaly detector to score a new data point:

from bigml.anomaly import Anomaly
anomaly_detector = Anomaly('anomaly/570f4c333bbd21090101e79f')
anomaly_detector.anomaly_score({'feature_1': 1, 'feature_2': 2})

The classes above (Model, Cluster, and Anomaly) download the JSON PML code that defines each model and reify it into a local function (in this case, in Python). Since you probably don't want to use R to implement a real-world application, it's better to perform the predictions in the language you'll use for your application: Python, Node.js, Java, etc. BigML offers open-source bindings for all of them.
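
If you would rather score the PMML file from the question directly, without the BigML bindings, a small tree walker built on Python's standard library is enough for a tree this simple. The following is a minimal sketch, not a full PMML evaluator. It assumes the model only uses True and SimplePredicate predicates with numeric comparisons, and that inputs are keyed by the PMML field ids (000003 is petal length, 000004 is petal width). When an input is missing it returns the score of the last node reached, as the generated Python function in the question does, which is not necessarily what the PMML specification prescribes. The names PMML_EXCERPT, matches, and predict are made up for this example, and the embedded XML is a trimmed copy of the top levels of the tree.

```python
import operator
import xml.etree.ElementTree as ET

# Namespace used by the PMML 4.2 file in the question.
NS = {"p": "http://www.dmg.org/PMML-4_2"}

# Trimmed excerpt of the question's tree (top levels only), for illustration.
PMML_EXCERPT = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <TreeModel functionName="classification">
    <Node score="Iris-setosa">
      <True/>
      <Node score="Iris-versicolor">
        <SimplePredicate field="000003" operator="greaterThan" value="2.45"/>
        <Node score="Iris-virginica">
          <SimplePredicate field="000004" operator="greaterThan" value="1.75"/>
        </Node>
        <Node score="Iris-versicolor">
          <SimplePredicate field="000004" operator="lessOrEqual" value="1.75"/>
        </Node>
      </Node>
      <Node score="Iris-setosa">
        <SimplePredicate field="000003" operator="lessOrEqual" value="2.45"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>"""

# PMML comparison operators mapped to Python functions.
OPS = {
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
}


def matches(node, record):
    """Return True if the node's own predicate holds for the record."""
    if node.find("p:True", NS) is not None:
        return True
    pred = node.find("p:SimplePredicate", NS)
    if pred is None:
        return False  # unsupported predicate type in this sketch
    value = record.get(pred.get("field"))
    if value is None:
        return False  # missing input: do not descend into this branch
    return OPS[pred.get("operator")](value, float(pred.get("value")))


def predict(node, record):
    """Follow the first matching child at each level; return the last score."""
    for child in node.findall("p:Node", NS):
        if matches(child, record):
            return predict(child, record)
    return node.get("score")


root = ET.fromstring(PMML_EXCERPT)
tree_root = root.find("p:TreeModel/p:Node", NS)  # root node carries <True/>

print(predict(tree_root, {"000003": 1.4, "000004": 0.2}))
print(predict(tree_root, {"000003": 5.5, "000004": 2.0}))
```

To score the full model, parse the downloaded file instead of the excerpt, for example with ET.parse("iris.pmml").getroot() (the file name is hypothetical), and pass the sepal width as field 000002 as well.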