
How can I retrieve arrays containing the bin ranges and counts from `sns.displot()`? I found older posts about the now-deprecated `distplot()`, but they no longer seem to apply.

DataJanitor
mrLimpio
    You can use `np.histogram(..., bins='auto')` to get bin ranges and corresponding counts. – JohanC Mar 13 '23 at 15:43
    [seaborn: Can I access the results of seaborn’s statistical transformations?](https://seaborn.pydata.org/faq.html#statistical-inquiries) – Trenton McKinney Mar 13 '23 at 15:49
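Following the `np.histogram` suggestion in the first comment, here is a minimal numpy-only sketch. It uses synthetic data as a stand-in for `flipper_length_mm` (an assumption, so it runs offline without `sns.load_dataset`). String bin rules such as `bins='auto'` need a finite data range and fail on NaN, so missing values are dropped first:

```python
import numpy as np

# Synthetic stand-in for penguins["flipper_length_mm"], including a missing value
rng = np.random.default_rng(0)
values = np.append(rng.normal(200.0, 14.0, size=340), np.nan)

# String bin rules such as 'auto' need a finite data range, so drop NaN first
finite = values[~np.isnan(values)]
counts, bin_edges = np.histogram(finite, bins="auto")

# counts[i] is the number of values in [bin_edges[i], bin_edges[i + 1]);
# the last bin also includes its right edge
```

`bin_edges` always has one more element than `counts`, which is the same layout the answer below reconstructs from the plotted bars.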

1 Answer


For a `displot` with `kind='kde'`, the plotted data is stored in the `.lines` attribute of each Axes (one `Line2D` per curve).

For a `displot` with `kind='hist'`, it is stored in the `.patches` attribute of each Axes (one `Rectangle` per bar).

```python
import seaborn as sns
import matplotlib.pyplot as plt

penguins = sns.load_dataset("penguins")

g = sns.displot(data=penguins, x="flipper_length_mm", kind="kde", lw=8)
ax = g.axes[0, 0]  # loop if multiple axes
line = ax.lines[0]  # loop if multiple lines (e.g. when using hue)
x, y = line.get_data()  # x: evaluation grid, y: estimated density

# Confirm by plotting over
ax.plot(x, y, "r.")

plt.show()
```

(Image: resulting overlay plot)
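If you need the curve's values rather than the plot, you can also compute a KDE yourself. Below is a numpy-only sketch of a Gaussian KDE with Scott's bandwidth rule; the grid size of 200 and `cut=3` are assumptions chosen to mirror seaborn's documented `kdeplot` defaults (`gridsize=200`, `cut=3`, `bw_method='scott'`). It is not seaborn's implementation, so small differences from the plotted line are possible, and the helper name `gaussian_kde_curve` is hypothetical. In practice, `scipy.stats.gaussian_kde` does the same job.

```python
import numpy as np

def gaussian_kde_curve(data, gridsize=200, cut=3.0):
    """Evaluate a 1-D Gaussian KDE on an evenly spaced grid.

    Bandwidth follows Scott's rule (n ** -0.2 times the sample standard
    deviation), and the grid extends `cut` bandwidths beyond the data.
    Hypothetical helper mirroring seaborn's documented defaults; it is
    not seaborn's actual code.
    """
    data = np.asarray(data, dtype=float)
    data = data[~np.isnan(data)]  # drop missing values
    n = data.size
    bw = n ** -0.2 * data.std(ddof=1)  # kernel standard deviation
    grid = np.linspace(data.min() - cut * bw, data.max() + cut * bw, gridsize)
    # Average one normal density per observation, evaluated at every grid point
    z = (grid[:, None] - data[None, :]) / bw
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))
    return grid, density

# Synthetic stand-in data (assumption) so the sketch runs offline
rng = np.random.default_rng(1)
x, y = gaussian_kde_curve(rng.normal(200.0, 14.0, size=340))
```

Because `y` is a density, its area over the grid should be close to 1.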


You can also do this for `kind='hist'`, but, as noted in the comments, there is little point in extracting values from a plot: if you want values, use functions that return them.

Still, below is an example of how to do it. Do not rely on it; instead, get the values directly from `numpy.histogram`:

```python
import numpy as np
import seaborn as sns

penguins = sns.load_dataset("penguins")

column = "flipper_length_mm"
x = penguins[column]

# With numpy: 10 equal-width bins spanning the data range
# (nanmin/nanmax skip missing values)
bin_edges = np.histogram_bin_edges(
    x,
    range=(np.nanmin(x), np.nanmax(x)),
    bins=10,
)
histogram, bin_edges = np.histogram(x, bins=bin_edges)

# With seaborn, using the same bin edges
g = sns.displot(data=penguins, x=column, kind="hist", bins=bin_edges)
ax = g.axes[0, 0]  # loop if multiple axes
rectangles = ax.patches  # one Rectangle per bar

rectangles_x = np.array([r.get_x() for r in rectangles])
rectangles_width = np.array([r.get_width() for r in rectangles])
rectangles_height = np.array([r.get_height() for r in rectangles])

# Left edge of every bar plus the right edge of the last one. This assumes
# contiguous bars (seaborn's default element="bars", shrink=1) and may not
# hold in every case.
bin_edges_retrieved = np.array([*rectangles_x, rectangles_x[-1] + rectangles_width[-1]])

assert np.allclose(bin_edges, bin_edges_retrieved)
assert np.allclose(histogram, rectangles_height)
```
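If you do read bars back from a plot, wrapping the reconstruction in a small helper makes its assumption explicit. This is a minimal sketch, assuming the Axes contains only contiguous histogram bars; the helper name `bars_to_histogram` is hypothetical. To stay self-contained and offline, it draws the bars with matplotlib's `Axes.hist` on synthetic data rather than with `sns.displot`; under the same assumption, it should also apply to `g.axes[0, 0]` from the example above.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

def bars_to_histogram(ax):
    """Rebuild (counts, bin_edges) from the Rectangle patches on `ax`.

    Hypothetical helper. Assumes the only patches are contiguous histogram
    bars, e.g. seaborn's default element="bars" with shrink=1.
    """
    bars = sorted(ax.patches, key=lambda r: r.get_x())  # left to right
    lefts = np.array([r.get_x() for r in bars])
    widths = np.array([r.get_width() for r in bars])
    counts = np.array([r.get_height() for r in bars])
    # Left edge of every bar plus the right edge of the last one
    bin_edges = np.append(lefts, lefts[-1] + widths[-1])
    return counts, bin_edges

# Synthetic stand-in data (assumption) and the reference result from numpy
rng = np.random.default_rng(2)
data = rng.normal(200.0, 14.0, size=340)
expected_counts, expected_edges = np.histogram(data, bins=10)

fig, ax = plt.subplots()
ax.hist(data, bins=expected_edges)  # draws one Rectangle per bin
counts, bin_edges = bars_to_histogram(ax)
plt.close(fig)
```

Sorting by `get_x()` guards against patches not being stored left to right. Note that with a seaborn `stat` other than the default `"count"` (for example `"density"` or `"probability"`), the bar heights are no longer counts.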
paime