Feedback: Visualization for Apache Spark Decision Trees

Question

One of the issues I've run into with Apache Spark is visualizing decision trees.

I can produce a tree using DecisionTree.trainClassifier, and I can get some rudimentary output using:

print(model.toDebugString())

But ideally, the current output:

If (feature 0 <= -35.0)
  If (feature 24 <= 176.0)
    Predict: 2.1
  If (feature 24 = 176.0)
    Predict: 4.2
  Else (feature 24 > 176.0)
    Predict: 6.3
Else (feature 0 > -35.0)
  If (feature 24 <= 11.0)
    Predict: 4.5
  Else (feature 24 > 11.0)
    Predict: 10.2

could be output as JSON, or some other parseable format, so that we could layer a D3 visualization on top. Using the example above:

{
"node": [
    {
        "name":"node1",
        "rule":"feature 0 <= -35.0",
            "children":[
                {
                  "name":"node2",
                  "rule":"feature 24 <= 176.0",
                  "children":[
                      {
                      "name":"node4",
                      "rule":"feature 20 < 116.0",
                      "predict":  2.1
                      },
                      {
                      "name":"node5",
                      "rule":"feature 20 = 116.0",
                      "predict": 4.2
                      },
                      {
                      "name":"node6",
                      "rule":"feature 20 > 116.0",
                      "predict": 6.3
                      }
                  ]                    
                },
                {
                "name":"node3",
                "rule":"feature 0 > -35.0",
                  "children":[
                      {
                      "name":"node7",
                      "rule":"feature 3 <= 11.0",
                      "predict": 4.5
                      },
                      {
                      "name":"node8",
                      "rule":"feature 3 > 11.0",
                      "predict": 10.2
                      }
                  ]                                        
                }

            ]
    }
]

}

I am not aware of any direct method of getting decision rules out of the model but you can save it and read data files to get an easy to handle representation. You can find an example in my answer [here](http://stackoverflow.com/a/31975050/1560062). — zero323, Aug 26 '15 at 17:58
You can also start from model.rootNode, cast it to InternalNode and access leftChild, rightChild and so on. From there you could generate JSON similar to yours. — pzecevic, Aug 29 '15 at 19:21
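
Neither comment includes code. As a rough sketch of a third route, the text returned by model.toDebugString() can be parsed by indentation into nested dictionaries and dumped as JSON. This assumes the output keeps the If / Else / Predict layout shown in the question. The function name debug_string_to_tree, the SAMPLE text and the generated "nodeN" labels are hypothetical and not part of Spark. Note that sibling If / Else branches become children of the same parent, which differs slightly from the hand-written JSON in the question.

```python
import json
import re

# Matches "If (...)" / "Else (...)" lines and captures the rule text.
RULE = re.compile(r"^(?:If|Else)\s*\((.*)\)$")
# Matches "Predict: <value>" lines and captures the value.
PREDICT = re.compile(r"^Predict:\s*(\S+)$")


def debug_string_to_tree(debug_string):
    """Turn indentation-based toDebugString() text into nested dicts.

    Hypothetical helper: it relies only on the text layout, not on any
    Spark API, so it breaks if that layout changes.
    """
    root = {"name": "root", "children": []}
    stack = [(-1, root)]  # (indent, node) pairs for the branch being built
    counter = 0
    for line in debug_string.splitlines():
        text = line.strip()
        rule = RULE.match(text)
        predict = PREDICT.match(text)
        if not rule and not predict:
            continue  # blank lines and the "DecisionTreeModel ..." header
        indent = len(line) - len(line.lstrip())
        # Close every branch that is at this depth or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if rule:
            counter += 1
            node = {"name": "node%d" % counter, "rule": rule.group(1)}
            parent.setdefault("children", []).append(node)
            stack.append((indent, node))
        else:
            # A prediction belongs to the rule directly above it.
            parent["predict"] = float(predict.group(1))
    return root


# Sample text copied from the question; with a trained model you would
# pass model.toDebugString() instead.
SAMPLE = """\
If (feature 0 <= -35.0)
  If (feature 24 <= 176.0)
    Predict: 2.1
  If (feature 24 = 176.0)
    Predict: 4.2
  Else (feature 24 > 176.0)
    Predict: 6.3
Else (feature 0 > -35.0)
  If (feature 24 <= 11.0)
    Predict: 4.5
  Else (feature 24 > 11.0)
    Predict: 10.2
"""

tree = debug_string_to_tree(SAMPLE)
print(json.dumps(tree, indent=2))
```

The resulting "name" / "children" nesting is the shape D3's hierarchy layouts usually expect, so the JSON can be fed to a tree layout with little extra work.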
