
I am plotting the relationship between two numerical variables with sns.scatterplot, and I would like to annotate each plot with the correlation coefficient between the two variables.

How would I do that in Python/seaborn?

I looked at the seaborn documentation at https://seaborn.pydata.org/generated/seaborn.scatterplot.html for this example: sns.scatterplot(data=tips, x="total_bill", y="tip"), but could not find anything about it. Any suggestions? Thanks!

Franck Dernoncourt
  • 1
    Seaborn doesn't do this. You need to calculate the correlation coefficient yourself (via numpy, scipy, pandas,...) and then add it as text somewhere in the plot (or e.g. in the title). – JohanC Jan 18 '22 at 17:13
  • 1
    @JohanC thanks, something like using `np.corrcoef` and `plt.annotate`? – russianblyatsuka Jan 18 '22 at 17:17
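
The approach suggested in these comments can be sketched roughly as follows. This is a minimal sketch, not a seaborn feature: `annotate_pearson_r` is a hypothetical helper name, the sample data is made up for illustration, and the text position (0.05, 0.9) is an arbitrary choice in axes-fraction coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def annotate_pearson_r(ax, x, y):
    """Hypothetical helper: write Pearson's r near the top-left corner of `ax`."""
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
    r = np.corrcoef(x, y)[0, 1]
    # xycoords="axes fraction" positions the text relative to the axes, not the data
    ax.annotate(f"r = {r:.2f}", xy=(0.05, 0.9), xycoords="axes fraction")
    return r


# Made-up sample data, for illustration only
x_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y_data = np.array([1, 3, 2, 4, 5, 6, 7, 9, 8])

ax = sns.scatterplot(x=x_data, y=y_data)  # scatterplot returns the matplotlib Axes
r = annotate_pearson_r(ax, x_data, y_data)
plt.close()
```

In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` (or `plt.savefig(...)`) before `plt.close()` to see the figure.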

2 Answers


This could help:

# import the libraries used below (df is assumed to be an existing DataFrame)
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# call the seaborn scatterplot function as usual; it returns the matplotlib Axes
ax = sns.scatterplot(data=df, x='col1', y='col2', hue='col3')

# define titles and axes labels
plt.title('Title')
plt.xlabel('x-axis label')
plt.ylabel('y-axis label')

# call the scipy function for pearson correlation
r, p = stats.pearsonr(df['col1'], df['col2'])
# annotate the pearson correlation coefficient text to 2 decimal places,
# positioned in axes coordinates via transform=ax.transAxes
plt.text(.05, .8, 'r={:.2f}'.format(r), transform=ax.transAxes)

plt.show()
mdoc-2011

Runnable example based on Leon Shpaner's answer:

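import scipy as sp
import scipy.stats  # load the stats submodule explicitly; older SciPy versions do not load it on `import scipy`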
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
sns.set_theme(style="ticks")

x_data = [1,2,3,4,5,6,7,8,9]
y_data = [1,3,2,4,5,6,7,9,8]
sns.scatterplot(x=x_data, y=y_data)

plt.title('Title')
plt.xlabel('x-axis label')
plt.ylabel('y-axis label')

r, p = sp.stats.pearsonr(x=x_data, y=y_data)
ax = plt.gca() # Get the current matplotlib Axes instance
plt.text(.05, .8, "Pearson's r ={:.2f}".format(r), transform=ax.transAxes)
plt.savefig('Scatterplot with Pearson r.png', bbox_inches='tight', dpi=300)
plt.close()

outputs:

[image: scatterplot annotated with Pearson's r]


To also add the correlation line (a least-squares linear fit through the points):

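import scipy as sp
import scipy.stats  # load the stats submodule explicitly; older SciPy versions do not load it on `import scipy`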
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
matplotlib.use('Agg')
sns.set_theme(style="ticks")

x_data = [1,2,3,4,5,6,7,8,9]
y_data = [1,3,2,4,5,6,7,9,8]
sns.scatterplot(x=x_data, y=y_data)

plt.title('Title')
plt.xlabel('x-axis label')
plt.ylabel('y-axis label')

r, p = sp.stats.pearsonr(x=x_data, y=y_data)
ax = plt.gca() # Get the current matplotlib Axes instance
plt.text(.05, .8, "Pearson's r ={:.2f}".format(r), transform=ax.transAxes)

# The following code block adds the correlation line:
import numpy as np
m, b = np.polyfit(x_data, y_data, 1)
X_plot = np.linspace(ax.get_xlim()[0],ax.get_xlim()[1],100)
plt.plot(X_plot, m*X_plot + b, '-')

plt.savefig('Scatterplot with Pearson r.png', bbox_inches='tight', dpi=300)
plt.close()

outputs:

[image: scatterplot annotated with Pearson's r, with the fitted line overplotted]
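
As a possible alternative to fitting the line by hand with np.polyfit, seaborn's `sns.regplot` draws the scatter points and a linear regression fit in one call. The sketch below reuses the same sample data and assumes that a plain least-squares line without a confidence band is wanted (`ci=None`); showing the p-value in the label is an optional addition, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats

x_data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y_data = np.array([1, 3, 2, 4, 5, 6, 7, 9, 8])

# regplot draws the scatter points plus a linear regression fit;
# ci=None turns off the confidence band around the line
ax = sns.regplot(x=x_data, y=y_data, ci=None)

# Pearson's r and its p-value, written in axes coordinates
r, p = stats.pearsonr(x_data, y_data)
label = f"Pearson's r = {r:.2f}, p = {p:.2g}"
ax.text(.05, .8, label, transform=ax.transAxes)

plt.close()
```

Unlike `sns.scatterplot`, `regplot` has no `hue` parameter, so this route fits best when the points form a single group.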


Related, still in Python but without seaborn: How to overplot a line on a scatter plot in python?

Franck Dernoncourt