Numpy & Pandas: Return histogram values from pandas histogram plot?

Question

I know that I can plot a histogram with pandas:

import numpy as np
import pandas as pd
df4 = pd.DataFrame({'a': np.random.randn(1000) + 1})
df4['a'].hist()

But how can I retrieve the histogram counts from such a plot?

I know I can get them with NumPy (as in the question "Histogram values of a Pandas Series"):

count, division = np.histogram(df4['a'])

But computing the counts this way after calling df.hist() feels very redundant. Is it possible to get the frequency values directly from pandas?
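One way to read the counts back from the plot itself, sketched under the assumption that matplotlib (which pandas uses for .hist()) is installed: Series.hist returns a matplotlib Axes, and each bar it draws is a patch whose height is that bin's count. Alternatively, matplotlib's own plt.hist returns the counts and bin edges together with the patches.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df4 = pd.DataFrame({'a': np.random.randn(1000) + 1})

# Option 1: Series.hist returns an Axes; each bar's height is its bin count
ax = df4['a'].hist()
counts_from_patches = np.array([patch.get_height() for patch in ax.patches])

# Option 2: plt.hist returns (counts, bin_edges, patches) in one call
plt.figure()
count, division, _ = plt.hist(df4['a'], bins=10)

# For comparison: NumPy on the same data with the same 10 bins
np_count, np_division = np.histogram(df4['a'])
print(count)
```

Option 2 is the closest to the `count, division = df4['a'].hist()` style wished for in the comments below, at the cost of calling matplotlib directly.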

score 24 · Answer 1 · edited Mar 19 '19 at 13:38


The quick answer is:

pd.cut(df4['a'], 10).value_counts().sort_index()

From the Series.hist documentation:

bins: integer, default 10
Number of histogram bins to be used

So look at pd.cut(df4['a'], 10).value_counts().sort_index(), which uses the same 10 equal-width bins.

You will see that the counts are the same as those from np.histogram.
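A quick check of that claim, as a sketch that assumes a fixed random seed for reproducibility. The bins are closed on different sides (pd.cut uses right-closed intervals and extends the range slightly so the minimum is included, while np.histogram uses left-closed bins with a closed last bin), so the counts could only differ for values lying exactly on an interior edge.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df4 = pd.DataFrame({'a': rng.standard_normal(1000) + 1})

# pandas: 10 equal-width intervals; value_counts sorts by frequency,
# so sort_index puts the bins back in left-to-right order
cut_counts = pd.cut(df4['a'], 10).value_counts().sort_index()

# numpy: the same 10 equal-width bins over [min, max]
np_counts, edges = np.histogram(df4['a'], bins=10)

print(cut_counts.to_numpy())
print(np_counts)
```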

answered Jul 19 '16 at 07:05 by piRSquared, edited Mar 19 '19 at 13:38 by Jealie

@cqcn1991 I get that you prefer numpy. But did this answer your question? – piRSquared Jul 19 '16 at 08:45

Sort of. I think it would be great if we could do something like `count, division = df4['a'].hist()`. This would be more convenient and require no additional code. – ZK Zhao Jul 19 '16 at 10:08
And how does one index the resulting values? Which is the domain and which is the range? – lesolorzanov Jun 12 '20 at 15:23
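A sketch addressing the indexing question in the comment above: the index of the pd.cut result holds pandas Interval objects (the bins, i.e. the domain) and the values are the counts (the range). Each Interval exposes .left, .right and .mid. The interval labels are rounded for display (see pd.cut's precision parameter), so when exact boundaries matter, pass retbins=True to also get the bin edges as an array.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.random.default_rng(1).standard_normal(1000) + 1)

# retbins=True also returns the exact bin edges as a NumPy array
cats, edges = pd.cut(s, 10, retbins=True)
counts = cats.value_counts().sort_index()

# Index entries are pd.Interval objects; values are the counts per bin
first_bin = counts.index[0]
print(first_bin.left, first_bin.right, counts.iloc[0])

# Look up the count for the bin that contains a given value
value = 1.0
bin_for_value = next(iv for iv in counts.index if value in iv)
print(bin_for_value, counts[bin_for_value])

# Bin midpoints, e.g. as x coordinates for a custom plot
midpoints = np.array([iv.mid for iv in counts.index])
```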

score 0 · Answer 2 · answered Oct 24 '18 at 06:54

This is another way to calculate a histogram in pandas. It is more complicated, but IMO better, since you avoid the awkward interval bin labels that pd.cut returns, which show up as strings and wreck any plot. You will also get style points for using .pipe():

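import numpy as np
import pandas as pd
(df4['a']
 # np.histogram returns (counts, bin_edges); wrap the pair in a Series to keep chaining
 .pipe(lambda s: pd.Series(np.histogram(s, bins=20)))
 # index the counts by the left edge of each bin
 .pipe(lambda s: pd.Series(s[0], index=s[1][:-1]))
)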

You can then pipe more steps onto the end, like:

.pipe(lambda s: s/s.sum())

which divides each count by the total, giving the fraction of samples in each bin (a distribution that sums to 1).
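The same idea as the pipe chain, wrapped in a small reusable function. `histogram_series` is a hypothetical helper name, not a pandas API; it indexes the counts by the left bin edge, as in the answer, and optionally divides by the total to get per-bin fractions.

```python
import numpy as np
import pandas as pd

def histogram_series(s, bins=10, normalize=False, **kwargs):
    """Hypothetical helper: histogram of `s` as a Series indexed by left bin edge.

    Extra keyword arguments (e.g. range=(0, 100)) are passed to np.histogram.
    """
    # drop NaNs, since np.histogram cannot autodetect a finite range with them
    counts, edges = np.histogram(s.dropna(), bins=bins, **kwargs)
    result = pd.Series(counts, index=edges[:-1], name=s.name)
    if normalize:
        # fraction of the total count in each bin, summing to 1
        result = result / result.sum()
    return result

df4 = pd.DataFrame({'a': np.random.default_rng(2).standard_normal(1000) + 1})
counts = histogram_series(df4['a'], bins=20)
fractions = histogram_series(df4['a'], bins=20, normalize=True)
print(fractions.head())
```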

Ideally, there would be a sensible density option in Series.hist that did this for you. pandas does accept a density keyword (default False, passed through to matplotlib), but it's nonsensical. I've read explanations a thousand times, like this one, but I've never understood it, nor who would actually use it. 99.9% of the time when you see fractions on a histogram, you think "distribution" (the fraction of samples in each bin), not a density normalized so that np.sum(density * np.diff(bins)) == 1, which is what density=True actually calculates. Makes you want to weep.
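To make the distinction concrete, a sketch using np.histogram's density flag (matplotlib's density=True applies the same normalization): the density values have total area 1, so they only match the per-bin fractions after multiplying by the bin widths.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1000) + 1

counts, edges = np.histogram(x, bins=20)
density, _ = np.histogram(x, bins=20, density=True)
widths = np.diff(edges)

# What most readers expect from "fractions": counts / total, summing to 1
fractions = counts / counts.sum()

# What density=True returns: counts / (total * bin width), whose area is 1
print(np.sum(density * widths))  # 1.0 up to rounding
print(fractions.sum())           # 1.0 up to rounding

# The two agree once density is multiplied by the bin width
print(np.allclose(density * widths, fractions))  # True
```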
