I'm trying to plot variable importance for my random forest classifier. From the trained model I can get an importance score for each of my 105 feature variables in each of my 7 classes. I want to plot these as a simple bar graph: each bar represents a Feature (character variable) and the height of the bar is its Importance score (numeric). I then want to facet by Class to see how the features rank by importance within each class.

I want each Class facet to rank the features from high to low importance, and I think this is where my code goes wrong. I don't think reorder() works once I facet the plot. Is there any way to fix this?

Once I have that graph, for simplicity I might want a second graph that shows only the top 20 features in each facet.

I think the core problem is in my data: feature A might be very important for Class A but not so much for Class B, so does that confuse R? A short intuitive explanation would be much appreciated.
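
To make that suspicion concrete, here is a minimal sketch in base R. The `toy` data frame and its values are made up for illustration and are not the real data. It shows that `reorder()` builds one factor with a single set of levels, scored over all rows (the mean per Feature by default), so every facet shares the same global order:

```r
# Toy data (made up for illustration): 3 features scored in 2 classes
toy <- data.frame(
  Class      = rep(c("A", "B"), each = 3),
  Feature    = rep(c("f1", "f2", "f3"), times = 2),
  Importance = c(0.9, 0.5, 0.1,   # class A ranks f1 > f2 > f3
                 0.2, 0.1, 0.8)   # class B ranks f3 > f1 > f2
)

# reorder() summarises -Importance per Feature across ALL classes
# (FUN = mean by default) and returns one factor with one level order.
f <- reorder(toy$Feature, -toy$Importance)

levels(f)
# "f1" "f3" "f2"  (a global order that matches neither class's own ranking)
```

A factor can only hold one level order, so no single ordering of Feature can be right for every class at once.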

The real data (sorry, the list is too long to fit in the limit, so I put it on Google Drive):

https://drive.google.com/file/d/18-1CWvALU-ILAHIFWsz8jjcvsXnXD0WQ/view?usp=sharing

The code:

ggplot(importance_tbl,aes(x=reorder(Feature,-Importance) ,y=Importance)) +
  geom_col()+
  facet_wrap(.~Class)+
  coord_flip() +
  scale_fill_grey() +
  scale_y_discrete(guide = guide_axis(n.dodge=3))+
  theme_bw()

In the output, the Feature axis labels are not ordered by importance within each facet but appear scattered.
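
One possible direction, as a rough sketch rather than a confirmed fix: give every Class/Feature pair its own factor level, order those levels by importance within each class, and let each facet use a free axis. The simulated `importance_tbl`, the `"___"` separator, and the names `top_n`, `sorted`, `top_tbl`, and `key` are assumptions for illustration; only the column names Feature, Class, and Importance come from the code above. Horizontal bars are drawn by mapping Importance to x, which assumes ggplot2 3.3.0 or later.

```r
# Simulated stand-in for the real table (assumption: same three columns)
set.seed(1)
importance_tbl <- expand.grid(
  Feature = paste0("feat", 1:30),
  Class   = paste0("class", 1:3),
  stringsAsFactors = FALSE
)
importance_tbl$Importance <- runif(nrow(importance_tbl))

top_n <- 20  # number of features to keep per class

# Sort by class, then by descending importance within each class
ord    <- order(importance_tbl$Class, -importance_tbl$Importance)
sorted <- importance_tbl[ord, ]

# Position within each class (rows are already sorted, so 1 = most important)
sorted$rank <- ave(sorted$Importance, sorted$Class, FUN = seq_along)
top_tbl <- sorted[sorted$rank <= top_n, ]

# One unique level per Class/Feature pair; reversed so that the most
# important feature ends up at the top of each facet's axis
top_tbl$key <- paste(top_tbl$Feature, top_tbl$Class, sep = "___")
top_tbl$key <- factor(top_tbl$key, levels = rev(top_tbl$key))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top_tbl, aes(x = Importance, y = key)) +
    geom_col() +
    facet_wrap(~ Class, scales = "free_y") +
    # Strip the class suffix so the labels show only the feature name
    scale_y_discrete(labels = function(k) sub("___.*$", "", k)) +
    labs(y = "Feature") +
    theme_bw()
  print(p)
}
```

Setting `top_n` to the number of features gives the full plot, and 20 gives the shorter one. The suffix trick would mislabel any feature whose name already contains `"___"`. If adding a dependency is acceptable, the tidytext package offers `reorder_within()` together with `scale_y_reordered()` for the same idea.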

