Superimposition of histogram and density in Pandas/Matplotlib in Python

Question

I've got a Pandas dataframe named clean which contains a column v for which I would like to draw a histogram and superimpose a density plot. I know I can plot one under the other this way:

import pandas as pd
import matplotlib.pyplot as plt

Maxv=200

plt.subplot(211)
plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")

plt.subplot(212)
ax=clean['v'].plot(kind='density')
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)

But when I try to superimpose them, the y scales don't match (and I lose the y-axis ticks and labels):

yhist, xhist, _hist = plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")

ax=clean['v'].plot(kind='density') #I would like to insert here a normalization to max(yhist)/max(ax)
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)

Any hints? (Additional question: how can I change the width of the density smoothing?)

[this answer should help](http://stackoverflow.com/a/39987117/2336654) — piRSquared, Jan 05 '17 at 09:50
Yes it does, thank you. I just have to find a way of setting the x range and hiding the second y-axis... Thank you! — Matt, Jan 05 '17 at 09:54
Why don't you use [`seaborn`](http://seaborn.pydata.org/tutorial/distributions.html#plotting-univariate-distributions)? — IanS, Jan 05 '17 at 09:59
Now I tried this: `ax = clean.v.plot(kind='hist', bins=40, range=(0, Maxv))` followed by `clean.v.plot(kind='kde', ax=ax, secondary_y=True)`. But the range part doesn't work, and there's still the left y-axis problem — Matt, Jan 05 '17 at 10:02
@Matt without the data it's hard to say, but yes, seaborn is meant to make difficult things easy ;) — IanS, Jan 05 '17 at 10:04
Additional answer: in seaborn you can change density smoothing with the parameter `bw`. Just sayin' — IanS, Jan 05 '17 at 10:17
@IanS I'm looking at your seaborn link, it may be a very good option too. Thanks! — Matt, Jan 05 '17 at 10:18
Seaborn has a top-level function that does exactly this: http://seaborn.pydata.org/examples/distplot_options.html — Paul H, Jan 06 '17 at 17:36

score 5 · Answer 1 · answered Jan 05 '17 at 10:12

Based on your code, this should work:

# density=True normalizes the histogram (older Matplotlib spelled this normed=True)
ax = clean.v.plot(kind='hist', bins=40, density=True)
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
ax.set(xlim=[0, Maxv])

You might not even need the secondary_y anymore.
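
On the additional question about the smoothing width: pandas' `kind='kde'` plot accepts a `bw_method` argument, which it hands to `scipy.stats.gaussian_kde`. Here is a minimal sketch of that, not a definitive recipe. The data is synthetic (the real `clean` isn't shown in the question), and the two bandwidth values are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's dataframe (assumption: real data not available).
rng = np.random.default_rng(0)
clean = pd.DataFrame({"v": rng.gamma(shape=4.0, scale=20.0, size=500)})
Maxv = 200

fig, ax = plt.subplots()

# Normalized histogram, so the bars and the KDE curves share one density scale.
ax.hist(clean["v"], bins=40, range=(0, Maxv), density=True, color="g", alpha=0.5)

# A smaller bw_method gives a wigglier curve, a larger one a smoother curve.
for bw in (0.1, 0.5):
    clean["v"].plot(kind="kde", ax=ax, bw_method=bw, label=f"bw_method={bw}")

ax.set_xlim(0, Maxv)  # set the range after plotting; the KDE call widens it
ax.set_xlabel("Orbital velocity (km/s)")
ax.legend()
```

Leaving `bw_method` out falls back to SciPy's default rule (Scott's rule), which is what the plain `kind='kde'` call above uses.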

IanS


Very clean. I've actually got rid of the `secondary_y`. The only thing is that I've lost the real count (from hist) on the y-axis, which is now normalized, but I guess that's fine too. – Matt Jan 05 '17 at 10:22
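
If the real counts matter, one option is to leave the histogram unnormalized and rescale the density curve instead. A density integrates to 1, so multiplying it by the number of points times the bin width puts it on the same scale as the bar heights. The sketch below assumes synthetic data and calls `scipy.stats.gaussian_kde` directly; the names `nbins`, `bin_width`, `kde` and `scaled` are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Synthetic stand-in for the asker's dataframe (assumption: real data not available).
rng = np.random.default_rng(0)
clean = pd.DataFrame({"v": rng.gamma(shape=4.0, scale=20.0, size=500)})
Maxv = 200
nbins = 40

fig, ax = plt.subplots()

# Plain histogram: the y-axis keeps the real counts.
yhist, xhist, _ = ax.hist(clean["v"], bins=nbins, range=(0, Maxv), color="g")

# Evaluate the KDE on a grid and rescale it from density to counts per bin.
bin_width = xhist[1] - xhist[0]
kde = gaussian_kde(clean["v"].to_numpy())
x = np.linspace(0, Maxv, 400)
scaled = kde(x) * len(clean) * bin_width
ax.plot(x, scaled, color="k")

ax.set_xlim(0, Maxv)
ax.set_xlabel("Orbital velocity (km/s)")
ax.set_ylabel("Number")
```

This keeps a single y-axis with its ticks and labels, so no `secondary_y` is involved. `gaussian_kde` also takes a `bw_method` argument if the curve needs more or less smoothing.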

score 1 · Accepted Answer · answered Jan 05 '17 at 10:05

Now I tried this:

ax = clean.v.plot(kind='hist', bins=40, range=(0, Maxv))
clean.v.plot(kind='kde', ax=ax, secondary_y=True)

But the range part doesn't work, and there's still the left y-axis problem.

Matt


Try setting the range after plotting: `ax.set(xlim=[0, Maxv])` – IanS Jan 05 '17 at 10:08
For the left y-axis see [this answer](http://stackoverflow.com/a/17877159/5276797). – IanS Jan 05 '17 at 10:11
@IanS: thank you very much, it works for the range. :) I haven't succeeded with y-axis, though, but I guess it is less important. – Matt Jan 05 '17 at 10:15
Yes it did (I had replied to your answer :) ), but as I said, I've lost the real histogram count of objects on the y-axis. – Matt Jan 05 '17 at 10:31

score 1 · Answer 3 · answered May 18 '20 at 03:13

Seaborn makes this easy:

import seaborn as sns
sns.distplot(df['numeric_column'],bins=25)

nanogoats


Very true indeed. I was very much a beginner when I asked this question. Seaborn is very useful! :) – Matt May 19 '20 at 12:36
