
I have two pandas dataframes I would like to plot in the same seaborn jointplot. It looks something like this (commands are done in an IPython shell; ipython --pylab):

import pandas as pd
import seaborn as sns
iris = sns.load_dataset('iris')
df = pd.read_csv('my_dataset.csv')
g = sns.jointplot(x='sepal_length', y='sepal_width', data=iris)

The keys in the two dataframes are identical.
How do I plot my values in the same plot (in a different color, of course)? And in more detail: how do I plot both datasets while showing only the distribution of the first one at the top and side? I.e., for the second dataset, only plot the dots.

denfromufa
  • I doubt this is the best way, but you could use the `hue` option of `pairplot` to get different colors (after first merging the datasets). http://stanford.edu/~mwaskom/software/seaborn/examples/scatterplot_matrix.html – JohnE Jul 21 '15 at 15:21
  • Btw, it is now recommended to use `%matplotlib inline` after starting ipython rather than invoking with `--pylab` – JohnE Jul 21 '15 at 15:25
  • What is the difference between `--pylab` and `%matplotlib inline`? – Daniel Thaagaard Andreasen Jul 21 '15 at 15:58
  • And for your first comment. I prefer the solution given by @jianxun-li. – Daniel Thaagaard Andreasen Jul 21 '15 at 16:05
  • The new Seaborn v0.11 release solves your problem elegantly via the hue parameter. Check my answer here: https://stackoverflow.com/a/63843331/7952162 – Manu CJ Sep 11 '20 at 09:35

4 Answers


Here is one way to do it by modifying the underlying data of sns.JointGrid.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# simulate some artificial data
# ========================================
np.random.seed(0)
data1 = np.random.multivariate_normal([0,0], [[1,0.5],[0.5,1]], size=200)
data2 = np.random.multivariate_normal([0,0], [[1,-0.8],[-0.8,1]], size=100)

# both df1 and df2 are bivariate normal samples; df1 has 200 rows, df2 has 100
df1 = pd.DataFrame(data1, columns=['x1', 'y1'])
df2 = pd.DataFrame(data2, columns=['x2', 'y2'])


# plot
# ========================================   
graph = sns.jointplot(x=df1.x1, y=df1.y1, color='r')

graph.x = df2.x2
graph.y = df2.y2
graph.plot_joint(plt.scatter, marker='x', c='b', s=50)


Jianxun Li
  • @DanielThaagaardAndreasen You are most welcome. Glad that it helped. :-) – Jianxun Li Jul 21 '15 at 16:07
  • How can you show the second distribution also in the histograms on the side? – user1834164 Feb 09 '16 at 21:12
  • @JianxunLi Any tips on a slick way to add a legend? – user3659451 Jul 22 '17 at 19:30
  • @user1834164 `graph.plot_marginals(plt.hist, c='b')` – Nimrod Morag Jul 24 '19 at 08:53
  • @user1834164: Since the release of Seaborn v0.11, this is possible very easily. Check my answer here: https://stackoverflow.com/a/63843331/7952162 – Manu CJ Sep 11 '20 at 09:33
  • @NimrodMorag - The code you proposed returns the error `AttributeError: 'Rectangle' object has no property 'vertical'`. @ManuCJ - Thank you for the example pointing to the new `hue` parameter; however, it is not versatile enough to plot any two unrelated datasets together. I would still like another answer to @user1834164's question. – Stefano Oct 05 '20 at 13:28
  • @Stefano sorry, here you go: ```graph.ax_marg_y.hist(df2.y2, color='b', orientation="horizontal"); graph.ax_marg_x.hist(df2.x2, color='b')``` – Nimrod Morag Oct 11 '20 at 07:15

A better solution, in my opinion, is to use the axes handles for the joint and marginal distributions that sns.jointplot returns. Using those (the names are ax_joint, ax_marg_x and ax_marg_y), it is also possible to draw on the marginal distribution plots.

import seaborn as sns
import numpy as np
import pandas as pd

data1 = np.random.randn(100)
data2 = np.random.randn(100)
data3 = np.random.randn(100)
data4 = np.random.randn(100)

df1 = pd.DataFrame({'col1': data1, 'col2':data2})
df2 = pd.DataFrame({'col1': data3, 'col2':data4})

axs = sns.jointplot(x='col1', y='col2', data=df1)
axs.ax_joint.scatter('col1', 'col2', data=df2, c='r', marker='x')

# drawing pdf instead of histograms on the marginal axes
# (note: sns.distplot is deprecated since seaborn v0.11)
axs.ax_marg_x.cla()
axs.ax_marg_y.cla()
sns.distplot(df1.col1, ax=axs.ax_marg_x)
sns.distplot(df1.col2, ax=axs.ax_marg_y, vertical=True)

distplots on marginal axes

JacoSolari
  • I am able to reproduce your code; however, I would like to plot a new histogram on the vertical marginal axis (not a `sns.distplot` plot as in your example). I tried to use the function `sns.histplot` instead, but the result ends up rotated 90 degrees. How can you add a new histogram on the vertical marginal axis? – Stefano Oct 05 '20 at 13:53
  • @Stefano try adding the following 2 lines at the end of the snippet I provided in the answer: `df_new = pd.DataFrame({'col1': np.random.rand(100)}); sns.histplot(data=df_new, ax=axs.ax_marg_y, y="col1", stat="density")` – JacoSolari Oct 05 '20 at 17:30

It might be easier, after drawing the jointplot, to switch to the axis you want to draw on and then use normal pyplot or axis-based seaborn plots:

g = sns.jointplot(...)
plt.sca(g.ax_joint)  # or g.ax_marg_x / g.ax_marg_y
plt.scatter(...)  # or plt.plot(...), or sns.kdeplot(..., ax=g.ax_joint)

The axis name is either ax_joint for the 2D plot, or ax_marg_x or ax_marg_y for the 1D plots on the side.

Furthermore, if you want to use the jointplot structure but draw all plots with pyplot, use the axes' cla() method, e.g. for clearing the 2D plot:

g.ax_joint.cla()
Chris Tang
Guiste
penguins = sns.load_dataset("penguins")
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")
Suraj Rao