Is there an idiomatic way to plot the histogram of a feature for two classes? In pandas, I basically want

df.feature[df["class"] == 0].hist()
df.feature[df["class"] == 1].hist()

to be in the same plot. I could do

df.feature.hist(by=df["class"])

but that gives me two separate plots.

This seems to be a common task, so I would imagine there is an idiomatic way to do it. Of course, I could manipulate the histograms manually to fit next to each other, but usually pandas does that quite nicely.

Basically I want this matplotlib example in one line of pandas: http://matplotlib.org/examples/pylab_examples/barchart_demo.html

I thought I was missing something, but maybe it is not possible (yet).

Andreas Mueller

1 Answer

How about df.groupby("class").feature.hist()? To see overlapping distributions you'll probably need to pass alpha=0.4 to hist(). Alternatively, I'd be tempted to use a kernel density estimate instead of a histogram with df.groupby("class").feature.plot(kind='kde').

As an example, I plotted the iris dataset's classes using:

iris.groupby("Name").PetalWidth.plot(kind='kde', ax=axs[1])
iris.groupby("Name").PetalWidth.hist(alpha=0.4, ax=axs[0])
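For anyone who wants to try the histogram half of this without the iris file, here is a minimal self-contained sketch. The data is synthetic (made-up values that only reuse the `Name` and `PetalWidth` column names), and the `Agg` backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the iris data (hypothetical values, not the real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Name": np.repeat(["setosa", "versicolor"], 50),
    "PetalWidth": np.concatenate([
        rng.normal(0.25, 0.1, 50),
        rng.normal(1.3, 0.2, 50),
    ]),
})

fig, ax = plt.subplots()
# hist() is called once per group, and every call draws on the same axes;
# alpha keeps overlapping bars visible.
df.groupby("Name").PetalWidth.hist(alpha=0.4, ax=ax)
# fig.savefig("petal_width_hist.png")  # or plt.show() in an interactive session
```

Each group is binned independently here, so the bar widths can differ between classes.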

jmz
  • What is the `axs` list supposed to refer to? – willwest Oct 09 '14 at 16:35
  • A list of matplotlib axes generated with something like `fig, axs = plt.subplots(ncols=2)`. See the docs: http://matplotlib.org/users/recipes.html#easily-creating-subplots – jmz Oct 09 '14 at 16:40
  • To get consistent spacing for the histogram, one needs to specify `range=[df.PetalWidth.min(), df.PetalWidth.max()]`. – Piotr Migdal Dec 18 '14 at 16:36
  • Thanks! How can we get a legend for the colors in the boxplot? – nealmcb Nov 08 '18 at 14:53
  • Add one to the axis object. For example, `axs[1].legend()` is probably what was used in the example there. – jmz Nov 09 '18 at 16:34
  • `axs[1].legend()` complains about no handles if used on the histogram. For the KDE plot you can pass `legend=True` – Danielle Madeley Jun 15 '20 at 05:46
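Pulling the last few comments together, one possible way to get both consistent bin spacing and a histogram legend is to loop over the groups and call matplotlib's `hist` directly with a shared `range` and a `label` per group. This is only a sketch on synthetic data (hypothetical values reusing the `Name` and `PetalWidth` column names), not necessarily how the figure in the answer was made.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the iris data (hypothetical values, not the real dataset).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "Name": np.repeat(["setosa", "versicolor"], 50),
    "PetalWidth": np.concatenate([
        rng.normal(0.25, 0.1, 50),
        rng.normal(1.3, 0.2, 50),
    ]),
})

# Use the same range for every group so all bars share the same bin edges.
shared_range = (df.PetalWidth.min(), df.PetalWidth.max())

fig, ax = plt.subplots()
for name, values in df.groupby("Name").PetalWidth:
    # label= gives legend() a handle for each group.
    ax.hist(values, bins=10, range=shared_range, alpha=0.4, label=name)
ax.legend()

legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
```

The explicit loop trades the one-liner for control over labels; the pandas `groupby(...).hist()` call does not pass a per-group label, which is consistent with the "no handles" complaint above.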