Seaborn's distplot is deprecated and will be removed in a future version. The suggested alternatives are histplot (or displot as a figure-level function). However, the defaults differ between distplot and histplot:

from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]
df = pd.DataFrame({"X": x_list, "Y": range(len(x_list))})

f, (ax_dist, ax_hist) = plt.subplots(2, sharex=True)

sns.distplot(df["X"], ax=ax_dist)
ax_dist.set_title("old distplot")
sns.histplot(data=df, x="X", ax=ax_hist)
ax_hist.set_title("new histplot")

plt.show()

[Figure: old distplot vs. new histplot]

So, how should histplot be configured to replicate the output of the deprecated distplot?

Mr. T

2 Answers


Since I spent some time on this, I thought I would share it so that others can easily adapt this approach:

from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns
import numpy as np

x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]
df = pd.DataFrame({"X": x_list, "Y": range(len(x_list))})

f, (ax_dist, ax_hist) = plt.subplots(2, sharex=True)

sns.distplot(df["X"], ax=ax_dist)
ax_dist.set_title("old distplot")
_, FD_bins = np.histogram(x_list, bins="fd")
bin_nr = min(len(FD_bins)-1, 50)
sns.histplot(data=df, x="X", ax=ax_hist, bins=bin_nr, stat="density", alpha=0.4, kde=True, kde_kws={"cut": 3})
ax_hist.set_title("new histplot")

plt.show()

Sample output: [figure comparing the old distplot and the new histplot]

The main changes are

  • bins=bin_nr - determine the number of histogram bins with the Freedman-Diaconis estimator and cap it at 50
  • stat="density" - show density instead of count in the histogram
  • alpha=0.4 - match the transparency of the distplot bars
  • kde=True - add a kernel density estimate curve
  • kde_kws={"cut": 3} - extend the kernel density curve beyond the histogram limits

Regarding the bin estimation with bins="fd", I am not sure that this is indeed the method used by distplot. Comments and corrections are more than welcome.
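
As a minimal sketch of the bin logic, assuming distplot's default amounts to the Freedman-Diaconis bin count capped at 50, the computation could be wrapped in a small helper. The function name `distplot_like_bin_count` and its `max_bins` parameter are hypothetical and not part of seaborn or numpy:

```python
import numpy as np


def distplot_like_bin_count(values, max_bins=50):
    """Freedman-Diaconis bin count, capped at max_bins.

    Assumption: this mirrors the default binning of the old distplot.
    """
    # np.histogram_bin_edges returns the bin edges; there is one bin fewer than edges
    edges = np.histogram_bin_edges(values, bins="fd")
    return min(len(edges) - 1, max_bins)


x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]
bin_nr = distplot_like_bin_count(x_list)
```

The resulting `bin_nr` can then be passed as `bins=bin_nr` to `sns.histplot`, as in the code above.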

I removed **{"linewidth": 0} because, as pointed out by @mwaskom in a comment, distplot does draw edges around the histogram bars; with the matplotlib defaults, the bar edgecolor is simply set to the face color, so the edges do not stand out. You will have to adjust this according to your style preferences.

Mr. T
  • You could also add `alpha=0.4` to the `histplot` – JohanC May 21 '21 at 15:00
  • Well spotted. I had the impression that the color is slightly off but didn't follow through with this thought. – Mr. T May 21 '21 at 15:28
  • "Regarding the bin estimation with bins="fd", I am not sure that this is indeed the method used by distplot. Comments and corrections are more than welcome." This is basically correct; `histplot` uses numpy's `"auto"` mode, which takes the max of the FD and Sturges estimators. The only thing that will be tricky to fully replicate is that `distplot` used `min(FD_bins, 50)` by default. So if you really want *exactly* the same behavior, you'll need to do that externally. – mwaskom May 21 '21 at 15:37
  • Oh also `linewidth=0` is wrong; `distplot` bars have visible edges, but with the matplotlib defaults the bar edgecolor is set to `"face"`. You'll see the difference if you activate one of the seaborn themes. – mwaskom May 21 '21 at 15:39
  • @mwaskom Thanks for your input, I was hoping for your comments. I assumed that `auto` was the preset for distplot because the documentation mentions something about an optimized approach. However, this was obviously wrong. Not sure though, how to implement the different edgecolor settings but I guess people can figure this out based on their specific stylesheets. – Mr. T May 21 '21 at 17:08
  • @Mr.T `distplot` uses *an* "optimized" approach (relative to a fixed number of bins), but there are lots of different reference rules that work better or worse for different sorts of data ... numpy's `"auto"` tries to balance the two of the most common ones. Actually when `distplot` was written, numpy didn't have any of the reference rules implemented, so `distplot` implements the FD rule internally. It became possible to pass `bins="auto"` once numpy added it and matplotlib hooked into numpy for computation, but the default remained to use the internal FD computation, with the upper limit. – mwaskom May 21 '21 at 18:22
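
To illustrate the reference rules discussed in the comments, the following sketch counts the bins that numpy's `"fd"`, `"sturges"`, and `"auto"` rules produce for the sample data from the question. It only uses `np.histogram_bin_edges`; the variable name `bin_counts` is chosen here for illustration:

```python
import numpy as np

x_list = [1, 2, 3, 4, 6, 7, 9, 9, 9, 10]

# Number of bins each reference rule yields for the sample data
bin_counts = {
    rule: len(np.histogram_bin_edges(x_list, bins=rule)) - 1
    for rule in ("fd", "sturges", "auto")
}
print(bin_counts)
```

If the `"auto"` count is larger than the `"fd"` count for your data, the default `histplot` bars will be narrower than the ones `distplot` drew, which is why the bin number is set explicitly in the answer above.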

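# Use histplot(); histplot works on univariate data

import seaborn as sns
import matplotlib.pyplot as plt

# `data` is your DataFrame; replace each 'variable name' with the relevant column name
fig = sns.FacetGrid(data=data, col='variable name', hue='variable name', height=9, palette='Set1')

# Draw one histogram with a KDE curve per facet, then add the hue legend
fig = fig.map(sns.histplot, 'variable name', kde=True).add_legend()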
toyota Supra