Best fit to a histogram plot (Iris)

Question

I want to plot a best-fit line for every Iris class in each per-feature histogram plot. I have tried the solutions from these examples: 1 and 2, but I don't get the result I want.

This is what the histograms look like now, and how I want them to look, but with a best-fit line per class.

Here is the code that I have used to achieve this.

import matplotlib.pyplot as plt
import pandas as pd

def load_data(path):
    data = pd.read_csv(path, sep=',')
    return data

# Loaded separately per class because I needed the data in this form for something else.
# path_setosa, path_versicolour, path_virginica, iris_names and features are defined elsewhere (not shown).
tot_data = load_data('Iris.csv')
setosa = load_data(path_setosa)
versicolor = load_data(path_versicolour)
virginica = load_data(path_virginica)
split_data_array = [setosa, versicolor, virginica]
fig, axes = plt.subplots(nrows=2, ncols=2, sharex='col', sharey='row')  # basis for subplots
colors = ['blue', 'red', 'green', 'black']  # colors for histogram

for i, ax in enumerate(axes.flat):  # loop through every feature
    for label, color in zip(range(len(iris_names)), colors):  # loop through every class
        # plot the histogram of feature i for this class
        _, bins, _ = ax.hist(split_data_array[label][features[i]], label=iris_names[label], color=color, stacked=True, alpha=0.5)

    ax.set(xlabel='Measured [cm]', ylabel='Number of samples')  # sets label name
    ax.label_outer()  # makes the label only be on the outer part of the plots
    ax.legend(prop={'size': 7})  # change size of legend
    ax.set_title(f'Feature {i+1}: {features[i]}')  # set title for each plot
    # ax.grid('on')  # grid on or off

# plt.savefig('histogram_rap.png', dpi=200)

plt.show()

What do you mean by "best fit line"? A vertical line for the mean value of each class? Or the median? Or something completely different? — JohanC, Apr 28 '21 at 14:26
@JohanC I would like the line in example 1 where he has plotted the histograms but in one figure and with the legends in every subplot — vegiv, Apr 28 '21 at 15:20

JohanC · Accepted Answer · 2021-04-28T16:50:02.193

With seaborn you can add a KDE (kernel density estimate) curve via sns.histplot(..., kde=True). Here is an example:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

sns.set()
iris = sns.load_dataset('iris')
# make the 'species' column categorical to fix the order
iris['species'] = pd.Categorical(iris['species'])

fig, axs = plt.subplots(2, 2, figsize=(12, 6))
for col, ax in zip(iris.columns[:4], axs.flat):
    sns.histplot(data=iris, x=col, kde=True, hue='species', common_norm=False, legend=ax==axs[0,0], ax=ax)
plt.tight_layout()
plt.show()
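
If a parametric curve is preferred over a KDE, one possible reading of "best fit line" is a normal distribution fitted to each class. The linked examples are not reproduced here, so that reading is an assumption. The sketch below fits with `scipy.stats.norm.fit` and scales the density by the sample count times the bin width so that it lines up with a count histogram. The helper name `hist_with_normal_fit` and the synthetic stand-in data are hypothetical, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm


def hist_with_normal_fit(ax, values, label, color, bins=10):
    """Draw a count histogram and overlay a fitted normal curve (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    counts, edges, _ = ax.hist(values, bins=bins, color=color, alpha=0.5, label=label)
    mu, sigma = norm.fit(values)  # maximum-likelihood mean and standard deviation
    x = np.linspace(edges[0], edges[-1], 200)
    bin_width = edges[1] - edges[0]
    # A density integrates to 1; multiply by n * bin_width to match bar heights in counts.
    curve = norm.pdf(x, mu, sigma) * len(values) * bin_width
    ax.plot(x, curve, color=color)
    return mu, sigma, x, curve


# Synthetic stand-in for one feature of three classes (not the real Iris measurements).
rng = np.random.default_rng(0)
classes = {
    "setosa": rng.normal(5.0, 0.35, 50),
    "versicolor": rng.normal(5.9, 0.50, 50),
    "virginica": rng.normal(6.6, 0.60, 50),
}

fig, ax = plt.subplots()
fits = {}
for (name, values), color in zip(classes.items(), ["blue", "red", "green"]):
    fits[name] = hist_with_normal_fit(ax, values, name, color)
ax.set(xlabel="Measured [cm]", ylabel="Number of samples")
ax.legend(prop={"size": 7})
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the figure. In the question's loop, the helper could presumably be called in place of the bare `ax.hist(...)` call, once per class and feature.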

Some parameters of sns.histplot():

common_norm=: when True (the default), each curve (or histogram) is scaled down in proportion to the number of rows belonging to its hue value
stat=: one of "count", "frequency", "density", "probability"; determines how the y-axis gets scaled
multiple=: "layer" (the default): all drawn on the same spot; "dodge": bars next to each other; "stack": bars and/or curves stacked; "fill": for each x-value the bars (and/or curves) are stacked to sum to 1.

Just use the default, or change `legend=ax==axs[0,0]` to `legend=True`. I removed them to avoid cluttering the plot with repeated information. — JohanC, Apr 29 '21 at 10:57
