Plotting multiple overlapping histograms with pandas

Question

I have two different dataframes with 19 variables each, and I'm plotting a grid of histograms, one per variable, like this:

fig, ax = plt.subplots(figsize=(19,10), dpi=50)
dataframe1.hist(ax=ax, layout=(3,7), alpha=0.5)

fig, ax = plt.subplots(figsize=(19,10), dpi=50)
dataframe2.hist(ax=ax, layout=(3,7), alpha=0.5)

This produces two images with 19 histograms each. What I want is a single image in which each subplot shows the histograms of the same variable from both dataframes.

I tried this:

fig, ax = plt.subplots(figsize=(19,10), dpi=50)
dataframe1.hist(ax=ax, layout=(3,7), alpha=0.5, label='x')
dataframe2.hist(ax=ax, layout=(3,7), alpha=0.5, label='y', color='red')

But it's only painting the last one. A similar example is "Plot two histograms at the same time with matplotlib", but how could I apply it to my 19 subplots?

Any ideas will be welcomed, thanks in advance!

P.S: I'm currently using Jupyter Notebooks with the %matplotlib notebook option

Am I understanding correctly that you want to show _nineteen_ histograms on the same set of axes? – asongtoruin, Mar 28 '19 at 11:40
I just added one of the images. I want each subplot to have 2 histograms instead of 1. Thanks for trying to understand! – Sergiodiaz53, Mar 28 '19 at 11:54

Thomas Kühn · Answer 1 · 2019-03-28T17:41:05.520

Your problem is that you create only one Axes object in your plt.subplots call, when you actually need 21 (3x7). Because the number of subplots provided does not match the number of subplots requested, pandas creates new subplots. Because this happens in both calls, you only see the second set of histograms.
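
The mismatch can be checked directly. What follows is a minimal sketch, not part of the original answer: the frame `df`, its three columns and the 2x2 layout are made-up illustration values. It passes a single Axes to `hist`, as in the question, and then inspects what pandas returned.

```python
import warnings

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical data: 3 numeric columns, so pandas needs 3 subplots.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["a", "b", "c"])

fig, ax = plt.subplots()  # a single Axes, as in the question

# Record warnings so that any message pandas emits can be printed below.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    axes = df.hist(ax=ax, layout=(2, 2), alpha=0.5)

print(axes.shape)      # shape of the grid of subplots pandas returned
print(ax in fig.axes)  # whether the Axes passed in is still in the figure
for w in caught:
    print(w.message)
```

If the second print shows `False`, the Axes that was passed in has been dropped from the figure, which would explain why a second `hist` call with the same `ax` replaces the first set of histograms instead of overlaying it.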

You can leave out the call to subplots altogether and let pandas do all the work. The call to hist returns all the subplots needed and this can then be used in the second call to hist.

EDIT:

I realised that, if the number of desired plots is not actually equal to the number of grid cells (in this case 3x7=21), you must pass exactly the number of subplots that you actually want to plot on (in this case 19). However, the call to df.hist returns a subplot for each grid cell (i.e. 21) and apparently hides the unused ones. Hence you have to pass only a subset of all returned subplots to the second call to hist. This is most easily done by converting the 2d array of subplots into a 1d array and then slicing this array, for instance with `axes.ravel()[:19]`. I edited the code accordingly:

import numpy as np
from matplotlib import pyplot as plt
import pandas as pd

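length = 19  # number of variables (columns) in each dataframe

# First dataframe: 100 samples per column, random mean and spread per column
loc = np.random.randint(0, 50, size=length)
scale = np.random.rand(length) * 10
dist = np.random.normal(loc=loc, scale=scale, size=(100, length))
df1 = pd.DataFrame(data=dist)

# Let pandas create the 3x7 grid; hist returns a 2d array with all 21 subplots
axes = df1.hist(layout=(3, 7), alpha=0.5, label='x')

# Second dataframe, generated the same way
loc = np.random.randint(0, 50, size=length)
scale = np.random.rand(length) * 10
dist = np.random.normal(loc=loc, scale=scale, size=(100, length))
df2 = pd.DataFrame(data=dist)

# Reuse only the first 19 subplots so the count matches the number of columns
df2.hist(ax=axes.ravel()[:length], layout=(3, 7), alpha=0.5, label='y', color='r')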

plt.show()

This produces output like this:

edited Mar 28 '19 at 17:41

answered Mar 28 '19 at 12:16


I think you mean `ax=axes` rather than `ax=res` – asongtoruin Mar 28 '19 at 14:33
@asongtoruin You are right. Thanks for the help, I'll fix it in the code. – Thomas Kühn Mar 28 '19 at 14:42
Thanks! This is exactly what I need, but with the x-labels in 45 or 90 degrees. I've tried things like `plt.set_xticklabels(rotation=90)` but no success. Do you know how to make this change? – Bruno Ambrozio May 13 '20 at 17:31
@BrunoAmbrozio `pandas.hist` has inbuilt keywords for this. For instance `xrot=45` rotates all xlabels 45 degrees counter-clockwise. Note that rotating the tick labels may make them overlap with the neighbouring subplots, so you might have to add an additional `plt.gcf().tight_layout()` at the end of the script (but before `plt.show()`). – Thomas Kühn May 14 '20 at 06:09
@BrunoAmbrozio if you don't want to use `pandas.hist` functionality, you have to set the rotation angle for each subplot separately. See for instance [this post](https://stackoverflow.com/a/56139690/2454357) (the part titled 'Object-Oriented') how to do this. – Thomas Kühn May 14 '20 at 06:11
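
The two rotation options from the comments above can be sketched side by side. This is an illustration under assumptions rather than code from the thread: the frame `df` and the 2x2 layout are invented, and `tick_params(axis="x", labelrotation=...)` is one per-Axes way of doing it, not necessarily the one used in the linked post.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical data with four numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))

# Option 1: let pandas rotate the x tick labels through the xrot keyword.
axes_a = df.hist(layout=(2, 2), alpha=0.5, xrot=45)
fig_a = axes_a.ravel()[0].get_figure()
fig_a.tight_layout()  # reduce overlap with neighbouring subplots

# Option 2: set the rotation on each Axes separately.
axes_b = df.hist(layout=(2, 2), alpha=0.5)
for ax in axes_b.ravel():
    ax.tick_params(axis="x", labelrotation=90)
fig_b = axes_b.ravel()[0].get_figure()
fig_b.tight_layout()
```

In a script, `plt.show()` would go at the end; it is left out here so that the sketch also runs without a display.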
In `df2.hist` the `layout` argument is not necessary. Also, in `df1.hist`, the argument `sharey=True` could be added seeing as this improves readability (fewer labels) and comparability for cases like this one where all the variables have the same (or a similar) number of values. – Patrick FitzGerald Dec 31 '20 at 09:01
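
A small sketch of the variant suggested in the comment above, with `sharey=True` on the first call and no `layout` on the second. The five-column frames and the 2x3 grid are made-up values chosen only to keep the figure small.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Hypothetical data: two frames with 5 columns each.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.normal(loc=0, size=(100, 5)))
df2 = pd.DataFrame(rng.normal(loc=1, size=(100, 5)))

# pandas builds the 2x3 grid; sharey=True is given here because this is
# the call that creates the subplots.
axes = df1.hist(layout=(2, 3), alpha=0.5, sharey=True)

# Reuse exactly as many subplots as df2 has columns; no layout needed.
used_axes = axes.ravel()[:df2.shape[1]]
df2.hist(ax=used_axes, alpha=0.5, color="r")
```

With a shared y axis the bar heights of the two frames can be compared across subplots, at the cost of small histograms looking flat when one variable has much taller bins than the others.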

asongtoruin · Answer 2 · 2019-03-28T15:01:05.750

When you call subplots, you can specify the number of rows and columns that you want. In your case, you want 3 rows and 7 columns. However, .hist will complain about there being 21 axes but only 19 columns to plot from your dataframe. So instead, we'll flatten the array of axes and convert it to a list, which allows us to remove the last two from both the figure and the set of axes simultaneously through .pop()

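# Create the full 3x7 grid up front
fig, axes = plt.subplots(figsize=(19,10), dpi=50, nrows=3, ncols=7)
# Flatten the 2d array of axes into a plain list
flat_axes = list(axes.reshape(-1))
# Drop the last two axes from both the list and the figure (21 - 2 = 19)
fig.delaxes(flat_axes.pop(-1))
fig.delaxes(flat_axes.pop(-1))

# Draw both dataframes onto the same 19 axes
dataframe1.hist(ax=flat_axes, alpha=0.5, label='x')
dataframe2.hist(ax=flat_axes, alpha=0.5, label='y', color='r')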
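
For a version that runs on its own, here is a hedged variant of the same idea with synthetic data. `dataframe1` and `dataframe2` are random stand-ins, and, unlike the snippet above, the axes are kept in a NumPy array and sliced instead of being popped from a list. That is an assumption on my part that array input is the safer choice across pandas versions; the answer itself uses a list.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

length = 19  # number of variables, as in the question

# Hypothetical stand-ins for the two dataframes.
rng = np.random.default_rng(0)
dataframe1 = pd.DataFrame(rng.normal(loc=0, scale=1, size=(100, length)))
dataframe2 = pd.DataFrame(rng.normal(loc=1, scale=2, size=(100, length)))

fig, axes = plt.subplots(figsize=(19, 10), dpi=50, nrows=3, ncols=7)
flat_axes = axes.ravel()

# Remove the unused grid cells from the figure and keep the first 19 axes.
for extra in flat_axes[length:]:
    fig.delaxes(extra)
used_axes = flat_axes[:length]

dataframe1.hist(ax=used_axes, alpha=0.5, label='x')
dataframe2.hist(ax=used_axes, alpha=0.5, label='y', color='r')
```

In a script, add `plt.show()` or `fig.savefig(...)` at the end to see the result.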

Very neat, but hard to interpret if seen in code without your commentary? – jtlz2, Apr 26 '22 at 09:21
