
I tried the following two options for plotting my regression model's residuals:

ax = sns.jointplot(data=df,x="PS",y='residuals',ax=True,kind='resid',marginal_ticks=True,lowess=True)
ax = sns.jointplot(data=df,x="PS",y='residuals',ax=True,kind='reg',marginal_ticks=True,lowess=True)

sns.residplot gives the same output as jointplot with the kind='resid' option set.

However, the two give different lowess curves, and with kind='resid' the residuals look as if they have been rotated about the horizontal line y=0. The same data was used in both cases.

The residuals are plotted in both graphs, but they look wrong in the one with kind='resid'. Could this be a bug in the resid code? Even if some standardisation were being applied, the residuals shouldn't flip from negative to positive.

Figure: jointplot output with kind="resid"

Figure: jointplot output with kind="reg"

Hoppity81
  • Could you please try to make the example reproducible? Either by generating some dummy test data or using one of seaborn's example datasets? – JohanC Nov 11 '21 at 13:09
  • This question is not reproducible without **data**. This question needs a [SSCCE](http://sscce.org/). Please see [How to provide a reproducible dataframe](https://stackoverflow.com/q/52413246/7758804), then **[edit] your question**, and paste the clipboard into a code block. Always provide a [mre] **with code, data, errors, current output, and expected output, as [formatted text](https://stackoverflow.com/help/formatting)**. If relevant, plot images are okay. If you don't include an mre, it is likely the question will be downvoted, closed, and deleted. – Trenton McKinney Nov 11 '21 at 13:45
  • Why do you think the `kind='resid'` plot is wrong? How should it look like? How did you calculate the residuals? Could you please add some reproducible data? – JohanC Nov 11 '21 at 22:06

1 Answer

The underlying axes-level functions are sns.regplot and sns.residplot. The example data in the question seems to have y-values that are approximately normally distributed around zero, which makes it rather hard to see what is happening.

The code below uses the Gentoo species of the penguins dataset.

The regplot at the left shows the original data with a regression line. The y-values are all positive, and the regression line has an upward slope.

The residplot at the right shows the data with the regression line subtracted. The y-values are positive where the original data are above the regression line, and negative below. The residplot helps to assess whether the regression (linear in this case) is a suitable fit.

The linked tutorial post and video might be helpful to better understand the residual plot.

import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset('penguins')
penguins = penguins[penguins['species'] == 'Gentoo']
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14, 4), sharex=True)
sns.regplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", ax=ax1)
ax1.set_title("regplot")
sns.residplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", ax=ax2)
ax2.set_title("residplot")

plt.tight_layout()
plt.show()

Figure: sns.regplot (left) vs sns.residplot (right)
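To make the "regression line subtracted" step concrete, here is a minimal NumPy sketch of the calculation, assuming residplot removes an ordinary least-squares straight line (its default behaviour as I understand it). The synthetic x and y values are made up for illustration and merely stand in for the penguin columns.

```python
import numpy as np

# Hypothetical synthetic data standing in for bill length (x) and bill depth (y).
rng = np.random.default_rng(0)
x = rng.uniform(40, 60, size=100)
y = 0.2 * x + 5 + rng.normal(scale=0.7, size=100)

# Least-squares straight line through the data (the line a regplot would show).
slope, intercept = np.polyfit(x, y, deg=1)

# Subtract the fitted line: these are the values a residual plot puts on the y-axis.
residuals = y - (slope * x + intercept)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"mean residual={residuals.mean():.2e}")
```

Points above the fitted line end up with positive residuals and points below it with negative ones, which is why the right-hand panel is centred on zero even though all original y-values are positive.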

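If your input data are already residuals from a linear fit on the same x, both plots look very similar, because the regression line that residplot subtracts from them is essentially flat at zero: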

import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats

penguins = sns.load_dataset('penguins').dropna()
penguins = penguins[penguins['species'] == 'Gentoo']
x = penguins['bill_length_mm']
y = penguins['bill_depth_mm']
regr = stats.linregress(x, y)
y = y - (regr.slope * x + regr.intercept)  # calculate the residuals (new Series, no in-place change to penguins)

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 4))
sns.regplot(x=x, y=y, lowess=True, ax=ax1)
ax1.set_title('regplot')
sns.residplot(x=x, y=y, lowess=True, ax=ax2)
ax2.set_title('residplot')

plt.tight_layout()
plt.show()

Figure: sns.regplot (left) vs sns.residplot (right) with residuals as input
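The similarity can also be checked numerically. The sketch below uses made-up data and a hypothetical helper, linear_residuals, to stand in for the subtraction that a residual plot performs. It compares two cases: residuals that come from a straight-line fit on the same x, and residuals that still contain a linear trend in x (for example because they came from a different model). The second case is only a guess at what might produce the "rotated" picture in the question, not a diagnosis of that data.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 1.5 * x + 2 + rng.normal(size=200)


def linear_residuals(x, y):
    """Return y minus its least-squares straight-line fit on x (hypothetical helper)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)


# Case 1: residuals from a straight-line fit on the same x.
# Fitting a second line to them gives slope and intercept of about zero,
# so subtracting it changes nothing.
res = linear_residuals(x, y)
res_again = linear_residuals(x, res)
print("case 1 unchanged:", np.allclose(res, res_again))

# Case 2: made-up residuals that still carry a linear trend in x.
# The second subtraction removes that trend, so the points move.
trended = res - 0.5 * (x - x.mean())
trended_again = linear_residuals(x, trended)
print("case 2 unchanged:", np.allclose(trended, trended_again))
```

If precomputed residuals look different under kind='resid' than under kind='reg', one thing worth checking is whether they still have a linear relationship with the variable on the x-axis. Passing the original y-values with kind='resid', or the precomputed residuals with kind='reg', avoids subtracting a regression twice.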

JohanC
  • Could you clarify your question, for example using the residual data from the updated answer? – JohanC Nov 11 '21 at 16:36
  • This is a perfect answer to the question as I understand it! – mwaskom Nov 12 '21 at 00:22
  • Thanks @JohanC for your great answer. I realised I must have been using the equation wrongly. I had calculated the residuals before plotting with each of the reg and resid options of jointplot, I think that using the resid option meant that a further calculation was being done to calculate the residuals between the x-axis values and the already calculated residuals. I'll test this on my data on Monday. Sorry for the confusion, and thanks for your help. I've accepted your answer – Hoppity81 Nov 12 '21 at 17:15