I'm trying to set up a clustering scenario with KMeans.

This is what my dataset looks like: df

With this dataset, I apply FacetGrid as follows:

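import matplotlib.pyplot as plt
import seaborn as sns

# Iterating a DataFrame yields its column names: one grid per column,
# one facet per cluster. plt.hist plots raw counts.
for c in data:
    grid = sns.FacetGrid(data, col='Clusters')
    grid.map(plt.hist, c)
    grid.set_xticklabels(rotation=90)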

Output: output (one such grid for each feature).

This works, but the FacetGrid only shows feature value vs. count for each cluster. That is not very useful to me, since the clusters have different sizes (different `len`).

E.g. the Customer Age bars for Cluster 1 are much taller than those for Cluster 0, simply because Cluster 1 has more elements.

What I need: a way to show each column (bar) of the plot relative to its cluster's total. E.g. img

I'd like to see:

Expected Result

For each cluster and each feature.

Is it possible?

Thank you.

Marcin Orlowski
  • instead of using `plt.hist`, write your own function: https://seaborn.pydata.org/tutorial/axis_grids.html#using-custom-functions – Paul H Nov 26 '22 at 21:04
  • You can use [`sns.distplot`](https://seaborn.pydata.org/generated/seaborn.distplot.html) or [`sns.histplot`](https://seaborn.pydata.org/generated/seaborn.histplot.html) with `stat='percent'`. This [answer](https://stackoverflow.com/a/59040003/7758804) shows how to add a horizontal line to a FacetGrid (distplot). The question is asking for too many things, and lacks focus. The question should ask one thing. – Trenton McKinney Nov 26 '22 at 21:24

0 Answers