I have the following Pandas
DataFrame (abbreviated here):
df = pd.DataFrame([
("Distal Lung AT2", 0.4269588779192778, 20),
("Lung Ciliated epithelial cells", 0.28642167657082035, 20),
("Distal Lung AT2",0.4488207834077291,15),
("Lung Ciliated epithelial cells", 0.27546336897259094, 15),
("Distal Lung AT2", 0.45502553604960105, 10),
("Lung Ciliated epithelial cells", 0.29080413886147555, 10),
("Distal Lung AT2", 0.48481604554028446, 5),
("Lung Ciliated epithelial cells", 0.3178232409599174, 5)],
columns = ["features", "importance", "num_features"])
I'd like to create a stacked bar plot where the x-axis represents the num_features
(so rows with the same num_features
should be grouped together), the y axis represents importance
, and each bar in the bar plot has blocks colored by features
I tried using plotnine
for this, as follows:
plot = (
ggplot(df, aes(x="num_features", y="importance", fill="features"))
+ geom_bar(stat="identity")
+ xlab("Number of Features")
+ ylab("")
)
However, when I try to save the plot so I can view it ggsave(plot, os.path.join(figure_path, "stacked_feature_importances.png"))
, I get:
Traceback (most recent call last):
File "/home/mdanb/plot_top_features_iteratively.py", line 94, in <module>
plot_stacked_bar_plots(backwards_elim_dirs)
File "/home/mdanb/plot_top_features_iteratively.py", line 87, in plot_stacked_bar_plots
ggsave(plot, os.path.join(figure_path, "stacked_feature_importances.png"))
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 736, in ggsave
return plot.save(*arg, **kwargs)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 724, in save
fig, p = self.draw(return_ggplot=True)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 203, in draw
self._build()
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 311, in _build
layers.compute_position(layout)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/layer.py", line 79, in compute_position
l.compute_position(layout)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/layer.py", line 393, in compute_position
data = self.position.compute_layer(data, params, layout)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/positions/position.py", line 56, in compute_layer
return groupby_apply(data, 'PANEL', fn)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/utils.py", line 638, in groupby_apply
lst.append(func(d, *args, **kwargs))
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/positions/position.py", line 54, in fn
return cls.compute_panel(pdata, scales, params)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/positions/position_stack.py", line 85, in compute_panel
trans = scales.y.trans
AttributeError: 'scale_y_discrete' object has no attribute 'trans'
I also looked into trying directly to use Pandas
without plotnine
, based on this post. However, it doesn't quite address my issue because the bar plot is stacked based on counts, whereas I specifically want to stack it based on values of a column (importance
)