From this question, pyspark-mllib-random-forest-feature-importances, I see that there is a method called featureImportances
that returns a SparseVector.
The output is something like this:
SparseVector(2, {0: 0.6, 1: 0.4})
My question is: how can I associate these importance values with the original column names? Is there a way to extract the column names from the RandomForestClassifier object?
EDIT: The model is the second stage of a pipeline. The first stage is a VectorAssembler object used to define the input columns for the model.
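To make the goal concrete, here is a minimal sketch in plain Python of the mapping I am after. The column names "age" and "income" are hypothetical, and a plain dict stands in for the SparseVector so the snippet runs without Spark. It assumes that each input column is a single numeric value, so that index i of the vector corresponds to the i-th input column of the VectorAssembler. With the real objects, I would expect the names to come from something like the assembler's getInputCols() and the values from featureImportances.toArray(), but that is exactly the part I am unsure about.

```python
# Hypothetical column names, standing in for the VectorAssembler's input columns.
input_cols = ["age", "income"]

# Plain-Python stand-in for SparseVector(2, {0: 0.6, 1: 0.4}).
vector_size = 2
sparse_values = {0: 0.6, 1: 0.4}


def name_importances(cols, values, size):
    """Pair each column name with its importance.

    Indices that are absent from the sparse mapping count as 0.0.
    Assumes one vector slot per input column (an assumption, see above).
    """
    if len(cols) != size:
        raise ValueError("number of columns does not match the vector size")
    return {col: values.get(i, 0.0) for i, col in enumerate(cols)}


named = name_importances(input_cols, sparse_values, vector_size)
print(named)
```

The length check is there because the one-slot-per-column assumption would break if any input column were itself a vector or were expanded by an earlier stage.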