
One of the issues I've run into with Apache Spark is visualizing decision trees.

I can produce a tree using DecisionTree.trainClassifier, and I can get some rudimentary output using:

print(model.toDebugString())

But ideally, the current output:

If (feature 0 <= -35.0)
  If (feature 24 <= 176.0)
    Predict: 2.1
  If (feature 24 = 176.0)
    Predict: 4.2
  Else (feature 24 > 176.0)
    Predict: 6.3
Else (feature 0 > -35.0)
  If (feature 24 <= 11.0)
    Predict: 4.5
  Else (feature 24 > 11.0)
    Predict: 10.2

could be output as JSON, or something else parseable, so that we could layer a D3 visualization on top of it. Using the example above:

{
    "node": [
        {
            "name": "node1",
            "rule": "feature 0 <= -35.0",
            "children": [
                {
                    "name": "node2",
                    "rule": "feature 24 <= 176.0",
                    "children": [
                        {
                            "name": "node4",
                            "rule": "feature 20 < 116.0",
                            "predict": 2.1
                        },
                        {
                            "name": "node5",
                            "rule": "feature 20 = 116.0",
                            "predict": 4.2
                        },
                        {
                            "name": "node6",
                            "rule": "feature 20 > 116.0",
                            "predict": 6.3
                        }
                    ]
                },
                {
                    "name": "node3",
                    "rule": "feature 0 > -35.0",
                    "children": [
                        {
                            "name": "node7",
                            "rule": "feature 3 <= 11.0",
                            "predict": 4.5
                        },
                        {
                            "name": "node8",
                            "rule": "feature 3 > 11.0",
                            "predict": 10.2
                        }
                    ]
                }
            ]
        }
    ]
}
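
For what it's worth, one possible workaround is to parse the indentation of the debug string directly. The following is only a rough sketch, not a built-in Spark feature: the function name `debug_string_to_tree`, the `SAMPLE` text, and the sequential `nodeN` naming are all made up for illustration, and it assumes every line is either an `If (...)`/`Else (...)` rule or a `Predict: <number>` leaf, with nesting expressed purely through leading whitespace. Note that it nests siblings by indentation, so the layout differs slightly from the hand-written JSON above.

```python
import json
import re

# Assumed line shapes, taken from the debug output shown in the question.
RULE_RE = re.compile(r"^(?:If|Else)\s*\((.*)\)\s*$")
PREDICT_RE = re.compile(r"^Predict:\s*(\S+)\s*$")


def debug_string_to_tree(debug_string):
    """Turn indented If/Else/Predict lines into nested dicts (hypothetical helper)."""
    root = {"name": "root", "children": []}
    # Stack of (indent, node); the sentinel indent -1 keeps the root at the bottom.
    stack = [(-1, root)]
    counter = 0
    for raw in debug_string.splitlines():
        line = raw.strip()
        if not line:
            continue
        indent = len(raw) - len(raw.lstrip())
        rule = RULE_RE.match(line)
        leaf = PREDICT_RE.match(line)
        if not rule and not leaf:
            continue  # skip anything unrecognised, e.g. a header line
        # Drop siblings and deeper nodes so the top of the stack is the parent.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if rule:
            counter += 1
            node = {"name": "node%d" % counter, "rule": rule.group(1)}
            parent.setdefault("children", []).append(node)
            stack.append((indent, node))
        else:
            # A prediction belongs to the rule line directly above it.
            parent["predict"] = float(leaf.group(1))
    return root


# The debug output from the question, with consistent two-space indentation.
SAMPLE = """\
If (feature 0 <= -35.0)
  If (feature 24 <= 176.0)
    Predict: 2.1
  If (feature 24 = 176.0)
    Predict: 4.2
  Else (feature 24 > 176.0)
    Predict: 6.3
Else (feature 0 > -35.0)
  If (feature 24 <= 11.0)
    Predict: 4.5
  Else (feature 24 > 11.0)
    Predict: 10.2
"""

tree = debug_string_to_tree(SAMPLE)
print(json.dumps(tree, indent=2))
```

With a real model, `model.toDebugString()` would be passed in place of `SAMPLE`. Because nesting is inferred from relative indentation only, the exact indent width should not matter, but this has not been checked against every Spark version's output format.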

Jim Murphy
  • I am not aware of any direct method of getting decision rules out of the model but you can save it and read data files to get an easy to handle representation. You can find an example in my answer [here](http://stackoverflow.com/a/31975050/1560062). – zero323 Aug 26 '15 at 17:58
  • You can also start from model.rootNode, cast it to InternalNode and access leftChild, rightChild and so on. From there you could generate JSON similar to yours. – pzecevic Aug 29 '15 at 19:21
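
The recursive walk suggested in the second comment might look roughly like the sketch below. It is written in Python with stand-in classes, since the real node objects are not used here: `Leaf`, `Internal`, their field names, and `node_to_dict` are all hypothetical and only mimic the idea of an internal node with a left and right child. It also assumes continuous splits of the form `feature <= threshold`.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Hypothetical stand-in for a leaf node holding a prediction."""
    prediction: float


@dataclass
class Internal:
    """Hypothetical stand-in for an internal node with a continuous split."""
    feature: int
    threshold: float
    left_child: "Node"
    right_child: "Node"


Node = Union[Leaf, Internal]


def node_to_dict(node, rule=None):
    """Recursively convert a node and its children into JSON-ready dicts."""
    out = {}
    if rule is not None:
        out["rule"] = rule  # the condition that leads from the parent to this node
    if isinstance(node, Leaf):
        out["predict"] = node.prediction
    else:
        split = (node.feature, node.threshold)
        out["children"] = [
            node_to_dict(node.left_child, "feature %d <= %s" % split),
            node_to_dict(node.right_child, "feature %d > %s" % split),
        ]
    return out


# A small hand-built tree, loosely based on the numbers in the question.
example_root = Internal(
    feature=0,
    threshold=-35.0,
    left_child=Leaf(2.1),
    right_child=Internal(24, 11.0, Leaf(4.5), Leaf(10.2)),
)

print(json.dumps(node_to_dict(example_root), indent=2))
```

Against a real model the same recursion would start at the root node and read the split and children from whatever fields the node classes actually expose, which would need checking for the Spark version in use.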
