How can I retrieve arrays containing bin ranges and counts from a `sns.displot()`? I found older posts relating to the previous `distplot()`, which do not seem to be applicable anymore.

DataJanitor

mrLimpio
- You can use `np.histogram(..., bins='auto')` to get bin ranges and corresponding counts. – JohanC Mar 13 '23 at 15:43
- [seaborn: Can I access the results of seaborn’s statistical transformations?](https://seaborn.pydata.org/faq.html#statistical-inquiries) – Trenton McKinney Mar 13 '23 at 15:49
1 Answer
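For a displot with `kind='kde'`, you will find the data under each ax's `.lines` attribute.
For a displot with `kind='hist'`, you will find the data under each ax's `.patches` attribute. Each patch is a matplotlib `Rectangle`, which has no `get_data()`; read it with `get_x()`, `get_width()`, and `get_height()` instead.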
import seaborn as sns
import matplotlib.pyplot as plt
penguins = sns.load_dataset("penguins")
g = sns.displot(data=penguins, x="flipper_length_mm", kind="kde", lw=8)
ax = g.axes[0, 0] # loop if multiple axes
line = ax.lines[0] # loop if multiple lines
x, y = line.get_data()
# Confirm by plotting over
ax.plot(x, y, "r.")
plt.show()
You can also do it for `kind='hist'`, but as said in the comments, there is no point in getting values from the plot; if you want values, use functions that return values.
Still, below is an example of how to do it, but do not rely on it; instead, get the values directly from `numpy.histogram`:
import numpy as np
import seaborn as sns
penguins = sns.load_dataset("penguins")
column = "flipper_length_mm"
x = penguins[column]
# With numpy
bin_edges = np.histogram_bin_edges(
    x,
    range=(np.nanmin(x), np.nanmax(x)),
    bins=10,
)
histogram, bin_edges = np.histogram(x, bins=bin_edges)
# With seaborn
g = sns.displot(data=penguins, x=column, kind="hist", bins=bin_edges)
ax = g.axes[0, 0] # loop if multiple axes
rectangles = ax.patches
rectangles_x = np.array([r.get_x() for r in rectangles])
rectangles_width = np.array([r.get_width() for r in rectangles])
rectangles_height = np.array([r.get_height() for r in rectangles])
# Caution: there may be cases where this formula is not correct
bin_edges_retrieved = np.array([*rectangles_x, rectangles_x[-1] + rectangles_width[-1]])
assert np.allclose(bin_edges, bin_edges_retrieved)
assert np.allclose(histogram, rectangles_height)
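If all you need is the arrays, here is a minimal sketch of the route suggested in the comments, using `np.histogram` with `bins='auto'` and no plot at all. The helper name `histogram_table` and the synthetic sample are assumptions made for illustration (the sample stands in for a column such as `penguins["flipper_length_mm"]`), and the edges are not guaranteed to match the bins seaborn would draw by default.

```python
import numpy as np


def histogram_table(values, bins="auto"):
    """Return (left_edges, right_edges, counts) for a 1-D sample.

    Hypothetical helper for illustration; not part of seaborn or numpy.
    """
    values = np.asarray(values, dtype=float)
    # Drop NaN first: np.histogram cannot infer a finite range from NaN values.
    values = values[~np.isnan(values)]
    counts, edges = np.histogram(values, bins=bins)
    # edges has len(counts) + 1 entries; split it into per-bin ranges.
    return edges[:-1], edges[1:], counts


# Synthetic data with one missing value, standing in for a real column.
rng = np.random.default_rng(0)
sample = np.append(rng.normal(200.0, 14.0, size=300), np.nan)

left, right, counts = histogram_table(sample)

# Bins are half-open [left, right), except the last one, which includes its right edge.
for lo, hi, n in zip(left, right, counts):
    print(f"{lo:.1f} to {hi:.1f}: {n}")
```

Passing an explicit array of edges as `bins` (as in the example above with `bin_edges`) gives you full control over the ranges instead of relying on the automatic rule.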

paime
- I am using your code with `kind="hist"` and `.patches` instead of `.lines`. But I am getting "AttributeError: 'Rectangle' object has no attribute 'get_data'" – mrLimpio Mar 13 '23 at 15:08
- This is also why we don’t answer questions without a complete [mre]. – Trenton McKinney Mar 13 '23 at 15:31
- I am not looking for individual data points as in the proposed solution above. I am looking for the bin ranges and counts in a histogram using `displot()`. That should be possible? – mrLimpio Mar 13 '23 at 15:50
- @mrClean it is not, as per the link in the comment to your question. – Trenton McKinney Mar 13 '23 at 16:14
- Actually it is still possible, see edit, but not reliable, and as @TrentonMcKinney said, you should use other functions to get values. – paime Mar 13 '23 at 18:23