This is another way to calculate a histogram in pandas. It is more complicated, but IMO better, since you avoid the weird interval bins that pd.cut returns, which show up as strings like (0, 5] and wreck any plot. You will also get style points for using .pipe():
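(df['a']
    # np.histogram returns (counts, bin_edges); wrapping the tuple in a Series keeps .pipe() available
    .pipe(lambda s: pd.Series(np.histogram(s, range=(0, 100), bins=20)))
    # s[0] holds the 20 counts, s[1] the 21 edges; index each count by its bin's left edge
    .pipe(lambda s: pd.Series(s[0], index=s[1][:-1]))
)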
You can then pipe on more things at the end, like:
    .pipe(lambda s: s/s.sum())
which will give you a distribution, i.e. the fraction of observations in each bin, summing to 1.
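For anyone who wants to run this end to end, here is a minimal sketch of the same idea. The sample data and the helper name histogram_series (with its value_range parameter) are made up for illustration; the helper simply folds the two lambdas above into one function that is passed to .pipe():

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 1,000 values drawn uniformly from [0, 100).
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.uniform(0, 100, size=1000)})


def histogram_series(s, bins=20, value_range=(0, 100)):
    """Return bin counts as a Series indexed by each bin's left edge."""
    counts, edges = np.histogram(s, bins=bins, range=value_range)
    return pd.Series(counts, index=edges[:-1])


# Counts per bin, with a numeric index that plots cleanly.
counts = df["a"].pipe(histogram_series, bins=20, value_range=(0, 100))

# Fraction of observations per bin (sums to 1).
fractions = counts.pipe(lambda s: s / s.sum())

print(fractions.head())
```

Because the index is numeric (the left bin edges) rather than interval labels, something like fractions.plot.bar() or fractions.plot(drawstyle="steps-post") should behave without any label cleanup.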
Ideally, there'd be a sensible density option in Series.hist that could do this for you. Pandas does accept a density keyword (False by default), but it's nonsensical. I've read explanations a thousand times, like this one, but I've never understood it, nor understood who would actually use it. 99.9% of the time when you see fractions on a histogram, you think "distribution", i.e. bar heights that sum to 1. What density=True actually calculates is a pdf scaled so that np.sum(pdf * np.diff(bins)) equals 1, so the heights themselves only sum to 1 when the bins have unit width. Makes you want to weep.
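To make the difference concrete, here is a small sketch comparing the two normalizations with np.histogram directly (the sample data is made up for illustration; np.histogram's density flag follows the same normalization):

```python
import numpy as np

# Hypothetical sample data: 1,000 values drawn uniformly from [0, 100).
rng = np.random.default_rng(0)
values = rng.uniform(0, 100, size=1000)

counts, edges = np.histogram(values, range=(0, 100), bins=20)
density, _ = np.histogram(values, range=(0, 100), bins=20, density=True)

widths = np.diff(edges)            # every bin is 5 wide here
fractions = counts / counts.sum()  # what most people expect to see

print(fractions.sum())             # fractions add up to 1
print(density.sum())               # density heights add up to 1 / width = 0.2
print(np.sum(density * widths))    # only the area under the bars is 1
```

In other words, density is fractions divided by the bin width, so you can always recover the fractions with density * np.diff(edges).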