I've got a Pandas dataframe named clean which contains a column v for which I would like to draw a histogram and superimpose a density plot. I know I can plot one under the other this way:

import pandas as pd
import matplotlib.pyplot as plt

Maxv=200

plt.subplot(211)
plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")

plt.subplot(212)
ax=clean['v'].plot(kind='density')
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)

But when I try to superimpose them, the y scales don't match (and I lose the y-axis ticks and labels):

yhist, xhist, _hist = plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")

ax=clean['v'].plot(kind='density') #I would like to insert here a normalization to max(yhist)/max(ax)
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)

Any hints? (Additional question: how can I change the width of the density smoothing?)

JohnE
Matt
  • [This answer should help](http://stackoverflow.com/a/39987117/2336654) – piRSquared Jan 05 '17 at 09:50
  • Yes it does, thank you. I just have to find a way of setting the x range and hiding the second y-axis... Thank you! – Matt Jan 05 '17 at 09:54
  • Why don't you use [`seaborn`](http://seaborn.pydata.org/tutorial/distributions.html#plotting-univariate-distributions)? – IanS Jan 05 '17 at 09:59
  • Is it easier in seaborn? – Matt Jan 05 '17 at 10:01
  • Now I've tried this: `ax = clean.v.plot(kind='hist', bins=40, range=(0, Maxv))` followed by `clean.v.plot(kind='kde', ax=ax, secondary_y=True)`. But the range part doesn't work, and there's still the left y-axis problem – Matt Jan 05 '17 at 10:02
  • @Matt without the data it's hard to say, but yes, seaborn is meant to make difficult things easy ;) – IanS Jan 05 '17 at 10:04
  • Additional answer: in seaborn you can change density smoothing with the parameter `bw`. Just sayin' – IanS Jan 05 '17 at 10:17
  • @IanS I'm looking at your seaborn link, it may be a very good option too. Thanks! – Matt Jan 05 '17 at 10:18
  • Seaborn has a top-level function that does exactly this: http://seaborn.pydata.org/examples/distplot_options.html – Paul H Jan 06 '17 at 17:36

3 Answers

Based on your code, this should work:

ax = clean.v.plot(kind='hist', bins=40, density=True)
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
ax.set(xlim=[0, Maxv])

Since the histogram and the density now share the same scale, you might not even need the secondary_y anymore.
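
If the raw counts on the y axis matter, one option is to leave the histogram alone and scale the density curve up to it instead. The following is only a minimal sketch, assuming SciPy is available; the synthetic data standing in for `clean`, the `bw_method=0.3` value, and the output file name are illustrative assumptions, not part of the original question. A density integrates to 1, so multiplying it by (number of samples × bin width) expresses it in counts per bin. The `bw_method` argument is what controls the smoothing width.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Synthetic stand-in for the question's data (assumption).
rng = np.random.default_rng(0)
clean = pd.DataFrame({"v": rng.normal(100, 25, size=1000)})

Maxv = 200
bins = 40
bin_width = Maxv / bins

fig, ax = plt.subplots()
# Plain count histogram, as in the question.
counts, edges, _ = ax.hist(clean["v"], bins=bins, range=(0, Maxv), color="g")

# Smaller bw_method values follow the data more closely; larger values smooth more.
kde = gaussian_kde(clean["v"].to_numpy(), bw_method=0.3)
x = np.linspace(0, Maxv, 400)

# Rescale the density (area 1) to counts per bin (area = N * bin_width).
scaled = kde(x) * counts.sum() * bin_width
ax.plot(x, scaled, color="k")

ax.set_xlim(0, Maxv)
ax.set_xlabel("Orbital velocity (km/s)")
ax.set_ylabel("Number")
fig.savefig("hist_kde_counts.png")
```

With pandas alone, `clean['v'].plot(kind='kde', bw_method=0.3)` accepts the same `bw_method` argument, but the curve then stays on the density scale.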

IanS
  • Very clean. I've actually got rid of the `secondary_y` . The only thing is that I've lost the real count (from hist) in y, which is now normalized, but I guess that's fine too. – Matt Jan 05 '17 at 10:22

Now I've tried this:

ax = clean.v.plot(kind='hist', bins=40, range=(0, Maxv))
clean.v.plot(kind='kde', ax=ax, secondary_y=True)

But the range part doesn't work, and there's still the left y-axis problem.

Matt
  • Try setting the range after plotting: `ax.set(xlim=[0, Maxv])` – IanS Jan 05 '17 at 10:08
  • For the left y-axis see [this answer](http://stackoverflow.com/a/17877159/5276797). – IanS Jan 05 '17 at 10:11
  • @IanS: thank you very much, it works for the range. :) I haven't succeeded with y-axis, though, but I guess it is less important. – Matt Jan 05 '17 at 10:15
  • Yes it did (I had replied to your answer :) ), but as I've said, I've lost the real histogram count of objects in y. – Matt Jan 05 '17 at 10:31

Seaborn makes this easy:

import seaborn as sns
sns.distplot(df['numeric_column'],bins=25)

nanogoats
  • Very true indeed. I was very much a beginner when I asked this question. Seaborn is very useful! :) – Matt May 19 '20 at 12:36