
I want to replace the deprecated sns.distplot with the "new" sns.histplot to visualize a predicted vs. measured quality target as two overlapping histograms. The "old way":

import matplotlib.pyplot as plt
import seaborn as sns

x1 = y_predict
y1 = y_test_asarray
plt.figure(figsize=(10,10))
plt.ylabel('Probability')
plt.xlabel('Air perm.')
plt.title('Air perm. measured vs. predicted')
sns.distplot(x1, 60, color='red', label='pred')   # second positional argument is bins
sns.distplot(y1, 60, color='blue', label='measured')
plt.legend()

showing this:

[Image: plot produced by the distplot code]

Changing the code to sns.histplot works, but the color argument isn't applied. I'm not able to get the color coding working, so both histograms are the same color. Any recommendations for getting the recoloring to work?

import matplotlib.pyplot as plt
import seaborn as sns

x1 = y_predict
y1 = y_test_asarray
plt.figure(figsize=(10,10))
plt.ylabel('Probability')
plt.xlabel('Air perm.')
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in matplotlib >= 3.6
plt.title('Air perm. measured vs. predicted')
sns.histplot(x1, bins=60, color='red', kde=True, label='pred')
sns.histplot(y1, bins=60, color='blue', kde=True, label='measured')
plt.legend()

[Image: plot produced by the histplot code, with both histograms in the same color]

JohanC
  • See also [Emulating deprecated seaborn distplots](https://stackoverflow.com/questions/67638590/emulating-deprecated-seaborn-distplots) – JohanC Jun 19 '21 at 09:32
  • Are you using seaborn 0.11.1? Your problem doesn't seem to be reproducible. You can also create the plot in one go, setting the labels via a dictionary. In that case the bin boundaries would be shared. E.g. `sns.histplot({'pred': np.random.randn(200), 'measured': np.random.randn(500)}, bins=60, palette=['red', 'blue'], kde=True, stat='density')` – JohanC Jun 19 '21 at 09:43
  • Yes, I'm using 0.11.1. While looking at the array in your code, I think I have found the cause of the error. My array is created from a time-series DataFrame (the timespan is the index), so I think the array is 2-D and that is causing this. The array looks like array([[341.5 ], [323.96], [162.97], ..., [354.83], [179.51], [236.49]]) – Florian Mühlbauer Jun 19 '21 at 12:08
  • You could use `sns.histplot(np.ravel(x1) , bins=60, color='red', ....)` to make the input 1D. The index will normally be ignored for the histogram. Note that your y-label is wrong: the y-values of the default `histplot` are the counts in each bin. With a discrete domain and individual bins, you could show a probability. With a continuous distribution, you could show a probability density. – JohanC Jun 19 '21 at 13:25

2 Answers


np.ravel and the dictionary approach work pretty well, thanks for the help!

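import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# flatten the (n, 1) arrays to 1-D
x1 = np.ravel(y_predict)
y1 = np.ravel(y_test_asarray)
plt.figure(figsize=(10, 10))
plt.ylabel('Density')  # stat='density' plots a probability density, not a probability
plt.xlabel('Air perm.')
plt.title('Air perm. measured vs. predicted')
sns.histplot({'pred': x1, 'measured': y1}, bins=60, palette=['red', 'blue'],
             kde=True, stat='density')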

[Image: resulting plot]


@JohanC astutely diagnosed the problem with the dimensionality of your arrays, but while updating your code, why not take advantage of the features that make histplot better than distplot?

import numpy as np
import seaborn as sns

predicted = np.random.randn(400, 1)
actual = np.random.randn(400, 1)
# squeeze() drops the length-1 axis, giving 1-D arrays
data = {"pred": predicted.squeeze(), "measured": actual.squeeze()}
ax = sns.histplot(data, bins=60, kde=True)
ax.set(xlabel="Air perm.")
ax.figure.set_size_inches(10, 10)

[Image: resulting plot]

mwaskom