From the question pyspark-mllib-random-forest-feature-importances, I see that there is a method called featureImportances that returns a SparseVector.

The output is something like this:

SparseVector(2, {0: 0.6, 1: 0.4})

My question is: how can I map the indices in this vector back to the original column names? Is there a way to extract the column names from the RandomForestClassifier object?

EDIT: The model is the second stage of a pipeline. The first stage is a VectorAssembler object used to define the input columns for the model.
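For illustration, here is a minimal sketch of one way the mapping could be done in that setup. It assumes the fitted PipelineModel has the VectorAssembler at stage index 0 and the fitted random forest model at stage index 1, and that every assembled input column is a plain numeric scalar (if an input column is itself a vector, the indices no longer line up one-to-one with the column names). The helper name importances_by_name and its parameters are hypothetical, not part of PySpark.

```python
def importances_by_name(pipeline_model, assembler_idx=0, model_idx=1):
    """Pair each VectorAssembler input column with its feature importance.

    Hypothetical helper: assumes the stage positions given by the two index
    arguments and that each input column contributes exactly one feature.
    """
    assembler = pipeline_model.stages[assembler_idx]
    rf_model = pipeline_model.stages[model_idx]

    # Column names in the order the assembler wrote them into the vector.
    names = assembler.getInputCols()
    # featureImportances is a Vector; toArray() gives a dense array of scores.
    scores = rf_model.featureImportances.toArray()

    if len(names) != len(scores):
        # Happens when some input columns are vectors rather than scalars.
        raise ValueError(
            "got %d column names but %d importances" % (len(names), len(scores))
        )

    pairs = [(name, float(score)) for name, score in zip(names, scores)]
    # Most important feature first.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

With a fitted pipeline this would be called as importances_by_name(pipeline_model), returning a list of (column name, importance) tuples sorted from most to least important.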

paolof89
  • A random forest takes two input columns, a label column and a features column of type Vector, so what do you mean by "column names"? – chlebek Oct 18 '19 at 21:09
  • That's exactly the problem, I lose the feature names in the previous step of the pipeline when I use the VectorAssembler object. I'll edit the question – paolof89 Oct 21 '19 at 07:32

0 Answers