13

I would like to make beautiful scatter plots with histograms above and to the right of the scatter plot, as is possible in seaborn with `jointplot`:

[Image: seaborn jointplot]

I am looking for suggestions on how to achieve this. I am having trouble installing pandas, and I do not need the entire seaborn module.

simona
  • To be clear, your question is how to implement `sns.jointplot` in vanilla matplotlib? – wflynny May 03 '16 at 15:32
  • More or less. My question is how to place another box above a scatter plot, so I can draw a histogram there – simona May 03 '16 at 15:34
  • Check out [`matplotlib.gridspec.GridSpec`](http://matplotlib.org/users/gridspec.html#gridspec-with-varying-cell-sizes), specifically the example at the bottom. Without gridspec, you can follow this [clear example](http://matplotlib.org/examples/pylab_examples/scatter_hist.html) – wflynny May 03 '16 at 15:36
  • Further, here's a similar example on stackoverflow: https://stackoverflow.com/questions/20525983/matplotlib-imshow-a-2d-array-with-plots-of-its-marginal-densities – wflynny May 03 '16 at 15:38
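
For the approach without gridspec mentioned in the comments, a minimal sketch might look like the following. It places three axes by hand with `fig.add_axes`, which takes a `[left, bottom, width, height]` rectangle in figure fractions. The function name `scatter_with_marginals` and the layout numbers are assumptions chosen for illustration, not taken from the linked example.

```python
import matplotlib.pyplot as plt
import numpy as np


def scatter_with_marginals(x, y, bins=20):
    """Scatter plot with a histogram above and to the right (hypothetical helper)."""
    # Layout numbers are illustrative; all values are fractions of the figure size.
    left, bottom, size, gap, depth = 0.1, 0.1, 0.62, 0.02, 0.2

    fig = plt.figure(figsize=(6, 6))
    ax_scatter = fig.add_axes([left, bottom, size, size])
    # Sharing axes keeps each histogram aligned with the scatter plot.
    ax_top = fig.add_axes([left, bottom + size + gap, size, depth],
                          sharex=ax_scatter)
    ax_right = fig.add_axes([left + size + gap, bottom, depth, size],
                            sharey=ax_scatter)

    ax_scatter.scatter(x, y, marker=".")
    ax_top.hist(x, bins=bins)
    ax_right.hist(y, bins=bins, orientation="horizontal")

    # Hide the tick labels that would duplicate those of the scatter plot.
    ax_top.tick_params(axis="x", labelbottom=False)
    ax_right.tick_params(axis="y", labelleft=False)
    return fig, ax_scatter, ax_top, ax_right
```

Calling `scatter_with_marginals(x, y)` and then `plt.show()` displays the figure; only matplotlib and numpy are needed.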

3 Answers

23

I encountered the same problem today. Additionally I wanted a CDF for the marginals.


Code:

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np

x = np.random.beta(2,5,size=int(1e4))
y = np.random.randn(int(1e4))

# 3x3 grid: scatter in the lower-left 2x2 block, marginals above and to the right
fig = plt.figure(figsize=(8,8))
gs = gridspec.GridSpec(3, 3)
ax_main = plt.subplot(gs[1:3, :2])
ax_xDist = plt.subplot(gs[0, :2],sharex=ax_main)
ax_yDist = plt.subplot(gs[1:3, 2],sharey=ax_main)

ax_main.scatter(x,y,marker='.')
ax_main.set(xlabel="x data", ylabel="y data")

ax_xDist.hist(x,bins=100,align='mid')
ax_xDist.set(ylabel='count')
ax_xCumDist = ax_xDist.twinx()
ax_xCumDist.hist(x,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid')
ax_xCumDist.tick_params('y', colors='r')
ax_xCumDist.set_ylabel('cumulative',color='r')

ax_yDist.hist(y,bins=100,orientation='horizontal',align='mid')
ax_yDist.set(xlabel='count')
ax_yCumDist = ax_yDist.twiny()
ax_yCumDist.hist(y,bins=100,cumulative=True,histtype='step',density=True,color='r',align='mid',orientation='horizontal')
ax_yCumDist.tick_params('x', colors='r')
ax_yCumDist.set_xlabel('cumulative',color='r')

plt.show()

Hope it helps the next person searching for a scatter plot with marginal distributions.

BiGYaN
  • Your pic is beautiful, +1, but the code returns an error: `AttributeError: 'Polygon' object has no property 'normed'`. Please correct your solution or tell me what I'm doing wrong. – Leo Apr 21 '20 at 21:15
  • Figured it out: replace `normed=True` with `density=True`. – Leo Apr 27 '20 at 17:56
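
To illustrate what `density=True` and `cumulative=True` do together, here is a small check, assuming a recent Matplotlib in which `normed` is no longer accepted. The cumulative, normalized histogram should end at 1 (up to floating-point rounding), which is what lets it be read as an empirical CDF. The data and variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.beta(2, 5, size=1000)

fig, ax = plt.subplots()
# hist returns the bin values, the bin edges, and the drawn artists.
heights, edges, _ = ax.hist(x, bins=100, cumulative=True, density=True,
                            histtype="step")

# With density=True and cumulative=True the last bin is normalized to 1.
print(heights[-1])
```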
13

Here's an example of how to do it, using gridspec.GridSpec:

import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np

x = np.random.rand(50)
y = np.random.rand(50)

fig = plt.figure()

gs = GridSpec(4,4)

ax_joint = fig.add_subplot(gs[1:4,0:3])
ax_marg_x = fig.add_subplot(gs[0,0:3])
ax_marg_y = fig.add_subplot(gs[1:4,3])

ax_joint.scatter(x,y)
ax_marg_x.hist(x)
ax_marg_y.hist(y,orientation="horizontal")

# Turn off tick labels on marginals
plt.setp(ax_marg_x.get_xticklabels(), visible=False)
plt.setp(ax_marg_y.get_yticklabels(), visible=False)

# Set labels on joint
ax_joint.set_xlabel('Joint x label')
ax_joint.set_ylabel('Joint y label')

# Set labels on marginals
ax_marg_y.set_xlabel('Marginal x label')
ax_marg_x.set_ylabel('Marginal y label')
plt.show()


tmdavison
  • nice, but how do I remove ticks only from the histograms (without suppressing axes), and how do I add labels selectively? – simona May 03 '16 at 16:33
  • See here http://stackoverflow.com/questions/4209467/matplotlib-share-x-axis-but-dont-show-x-axis-tick-labels-for-both-just-one – tmdavison May 03 '16 at 16:37
  • now my labels appear on the plot [0,0] instead of [1,0]. I want ylabel on plot [0,0], xlabel on plot [1,1], and both labels on plot [1,0] – simona May 03 '16 at 16:51
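
One way to approach the follow-up in these comments (tick labels hidden only on the histograms, with axis labels placed selectively) is sketched below. It assumes shared axes and uses `tick_params` to hide labels per axes. The 2x2 grid with `width_ratios`/`height_ratios` and the label strings are illustrative choices, not the layout used in the answer.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(50)
y = rng.random(50)

fig = plt.figure()
# 2x2 grid: row 0 is the short top row, column 1 is the narrow right column.
gs = GridSpec(2, 2, width_ratios=[3, 1], height_ratios=[1, 3])

ax_joint = fig.add_subplot(gs[1, 0])
ax_marg_x = fig.add_subplot(gs[0, 0], sharex=ax_joint)
ax_marg_y = fig.add_subplot(gs[1, 1], sharey=ax_joint)

ax_joint.scatter(x, y)
ax_marg_x.hist(x)
ax_marg_y.hist(y, orientation="horizontal")

# Hide only the duplicated tick labels; the tick marks and axes stay visible.
ax_marg_x.tick_params(axis="x", labelbottom=False)
ax_marg_y.tick_params(axis="y", labelleft=False)

# y label on plot [0,0], x label on plot [1,1], both labels on plot [1,0].
ax_marg_x.set_ylabel("count")
ax_marg_y.set_xlabel("count")
ax_joint.set_xlabel("x data")
ax_joint.set_ylabel("y data")
```

Because the marginal axes share their data axis with the joint axes, zooming or changing limits on one keeps the histograms aligned with the scatter plot.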
2

I strongly recommend flipping the right histogram by adding these three lines of code to the current best answer, before `plt.show()`:

ax_yDist.invert_xaxis()
ax_yDist.yaxis.tick_right()
ax_yCumDist.invert_xaxis()

[Image: after flipping the right histogram]

The advantage is that a viewer can easily compare the two histograms by mentally moving the right histogram and rotating it clockwise.

In contrast, in the plot from the question and in all the other answers, the natural first reaction when comparing the two histograms is to rotate the right one counterclockwise, which leads to wrong conclusions because the y axis ends up inverted. Indeed, the right CDF of the current best answer looks decreasing at first sight:

[Image: before flipping the right histogram]
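
A self-contained sketch of the flip on a single right-hand histogram is given below, assuming a simplified two-panel layout rather than the full figure from the best answer; the data and grid are illustrative. `invert_xaxis` reverses the direction of the count axis, and `yaxis.tick_right` moves the y ticks to the right edge.

```python
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
import numpy as np

rng = np.random.default_rng(0)
x = rng.beta(2, 5, size=1000)
y = rng.standard_normal(1000)

fig = plt.figure()
# Scatter on the left, marginal histogram of y on the right.
gs = GridSpec(1, 2, width_ratios=[3, 1])
ax_main = fig.add_subplot(gs[0, 0])
ax_yDist = fig.add_subplot(gs[0, 1], sharey=ax_main)

ax_main.scatter(x, y, marker=".")
ax_yDist.hist(y, bins=50, orientation="horizontal")

# Flip: counts now grow from right to left, and the y ticks sit on the right.
ax_yDist.invert_xaxis()
ax_yDist.yaxis.tick_right()
```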

Carlos Pinzón