I am plotting my data as histograms, but I see odd colors that I did not assign. Is there hidden data behind the bars?

import matplotlib.pyplot as plt

def plot_train_test_hist(train_df, val_df, test_df, bins=1000, colname='exp_mean'):
    '''Check the distribution of train/val/test scores as a sanity check that it's not skewed.'''
    plt.hist(train_df[colname].values,bins=bins,label='train',alpha=0.7)
    plt.hist(val_df[colname].values,bins=bins,label='val',alpha=0.5)
    plt.hist(test_df[colname].values,bins=bins,label='test',alpha=0.4)
    plt.legend()
    plt.xlabel(colname,fontsize=14)
    plt.ylabel("Count",fontsize=14)
    plt.suptitle("Histogram of dataset")
    plt.title('w/o masking in Log',size= 'small')
    plt.show()

Luis Mendo
Jin_soo

1 Answer

You are drawing the three histograms on top of each other with partial transparency (alpha), so the colors blend wherever the bars overlap; there is no hidden data. Where all three overlap you are effectively mixing 0.7*blue + 0.5*tan + 0.4*green, which gives a moss green; 0.7*blue + 0.4*green gives a turquoise; and 0.7*blue + 0.5*tan gives a light brown.
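To make the blending concrete, here is a rough sketch of the arithmetic. It assumes matplotlib's default color cycle (blue, orange, green; the question does not set colors, and the second default is closer to orange than tan), a white axes background, and the standard "over" compositing rule. The helper names `over` and `to_hex` are made up for this illustration.

```python
# Default matplotlib cycle colors as RGB in [0, 1] (assumed, not set in the question).
BLUE = (0x1f / 255, 0x77 / 255, 0xb4 / 255)    # train, alpha=0.7
ORANGE = (0xff / 255, 0x7f / 255, 0x0e / 255)  # val, alpha=0.5
GREEN = (0x2c / 255, 0xa0 / 255, 0x2c / 255)   # test, alpha=0.4
WHITE = (1.0, 1.0, 1.0)                        # assumed axes background


def over(src, alpha, dst):
    """Composite a translucent color `src` over an opaque color `dst`."""
    return tuple(alpha * s + (1 - alpha) * d for s, d in zip(src, dst))


def to_hex(rgb):
    """Format an RGB tuple in [0, 1] as a #rrggbb string."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


# Layers are painted in call order: train first, then val, then test.
train_only = over(BLUE, 0.7, WHITE)
train_val = over(ORANGE, 0.5, train_only)
all_three = over(GREEN, 0.4, train_val)
train_test = over(GREEN, 0.4, train_only)

for name, rgb in [
    ("train", train_only),
    ("train + val", train_val),
    ("train + test", train_test),
    ("train + val + test", all_three),
]:
    print(f"{name:20s} {to_hex(rgb)}")
```

Each overlap region prints a different color, none of which was assigned explicitly, which is what shows up in the plot.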

If you prefer, you can show the bars side by side instead, as shown in this popular question on how to plot two histograms on one chart.
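A minimal sketch of the side-by-side version, assuming you pass all three arrays to a single `hist` call. The random data, the bin count of 30, and the variable names below are placeholders, not your actual DataFrames.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder data standing in for train_df[colname].values etc.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 1000)
val = rng.normal(0.0, 1.0, 200)
test = rng.normal(0.0, 1.0, 200)

fig, ax = plt.subplots()
# Passing a list of arrays makes hist share the bin edges and draw
# the bars of each dataset next to each other within every bin.
counts, edges, _ = ax.hist(
    [train, val, test],
    bins=30,
    label=["train", "val", "test"],
)
ax.legend()
ax.set_xlabel("exp_mean", fontsize=14)
ax.set_ylabel("Count", fontsize=14)
plt.close(fig)
```

Since no bars overlap, no alpha is needed and each dataset keeps its own color. With 1000 bins the bars become very thin, so fewer bins (or `histtype='step'` outlines) will likely read better. To view the figure interactively, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of `plt.close(fig)`.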

Mikael Öhman