
Does anyone know how to display the regression equation in seaborn using sns.regplot or sns.jointplot? regplot doesn't seem to have any parameter that you can pass to display regression diagnostics, and jointplot only displays the Pearson r and p-value. I'm looking for a way to see the slope coefficient, standard error, and intercept as well.

Thanks

Vikram Josyula

2 Answers


In 2015, the lead developer for seaborn replied to a feature request asking for access to the statistical values used to generate plots by saying, "It is not available, and it will not be made available."

So, unfortunately, this feature does not exist in seaborn, and seems unlikely to exist in the future.

Update: in March 2018, seaborn's lead developer reiterated his opposition to this feature. He seems... uninterested in further discussion.

millikan
  • Thanks for finding this. I added the slope and intercept to the title of the plot to get around this. – Logan Sep 22 '18 at 15:42
  • Wow, the devs are so utterly wrong on this point in my professional opinion (and quite rude about it too). Not having access to the underlying statistical models makes seaborn unsuitable for serious scientific visualization. The answer to any question about using seaborn for publication quality figures is now "just don't use it". Sad, since it is pretty nice otherwise. – travc Apr 24 '19 at 17:46
  • "it is out of scope because seaborn is a library for visualization, not for statistics (statsmodels) or data munging (pandas)" ... It is _data_ we are visualising, correct? And lines created by statistics... something especially the case when **Seaborn** calls _statsmodels_ for its own estimates? – ifly6 May 07 '19 at 16:11
  • I see the maintainer's point. If the regression is only used for visualisation, he can get away with various shortcuts to e.g. make the plots draw faster. But if users start relying on his code for generating numbers for published papers, suddenly the responsibility on his shoulders increases. It's OK not to want such responsibility (though he could have been nicer in the latest reply). Also, I suspect he doesn't want to encourage the slap-dash kind of statistical testing "draw a graph, read off the R^2, claim a result if it's higher than X". – quant_dev Jun 16 '19 at 18:29
  • `m, b = np.polyfit(x, y, 1)` may help – msch Apr 22 '21 at 14:05
  • seaborn is an API for matplotlib, not a stats package. However, you should see [seaborn: Visualizing regression models](https://seaborn.pydata.org/tutorial/regression.html#regression-tutorial), which states: **To obtain quantitative measures related to the fit of regression models, you should use [statsmodels](https://www.statsmodels.org/).** – Trenton McKinney Jun 21 '22 at 21:24
  • It makes total sense not to supply this kind of info since it can be calculated by the more suitable statsmodels package, whose sole purpose in life is to do such things and do them right. seaborn is - as rightly put by others - a visualization library and therefore should not be relied upon when trying to obtain a statistical model. One can use both, seaborn and statsmodels, and make them work together to one's advantage. That makes a lot of sense to me. – darlove Aug 28 '22 at 14:02
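Along the lines of Logan's workaround and the suggestions in these comments to compute the fit outside seaborn, here is a minimal sketch: fit with scipy.stats.linregress, draw the same data with sns.regplot, and put the equation in the title. The helper name regplot_with_equation and the synthetic data are illustrative assumptions. The numbers come from scipy, not from seaborn; for a plain regplot (order 1, no robust or lowess fit) they describe the same least-squares line that seaborn draws. For fuller diagnostics, statsmodels' OLS (as recommended above) reports these and much more.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from scipy import stats


def regplot_with_equation(x, y, ax=None):
    """Draw sns.regplot and write the fitted equation into the title.

    Hypothetical helper: the fit is computed by scipy, not read from seaborn.
    """
    res = stats.linregress(x, y)  # slope, intercept, rvalue, pvalue, stderr
    ax = sns.regplot(x=x, y=y, ax=ax)
    ax.set_title(
        f"y = {res.slope:.3f}x + {res.intercept:.3f} "
        f"(slope SE = {res.stderr:.3f}, r = {res.rvalue:.3f})"
    )
    return ax, res


# Synthetic data: y is roughly 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 1, 50)

ax, res = regplot_with_equation(x, y)
print(ax.get_title())
```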

A late and partial answer. I had the problem of just wanting to get the data of the regression line and I found this:

When you have this plot:

import matplotlib.pyplot as mp
import seaborn as sns

f = mp.figure()
ax = f.add_subplot(1, 1, 1)
p = sns.regplot(x="x", y="y", data=dat, ax=ax)  # dat: a DataFrame with columns x and y

regplot returns the matplotlib Axes it drew on, so p has a get_lines() method that returns a list of Line2D objects, and a Line2D object has methods to get the desired data:

So to get the linear regression data in this example, you just need to do this:

p.get_lines()[0].get_xdata()
p.get_lines()[0].get_ydata()

Each call returns a NumPy array of the regression line's data points, which you can use freely.
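A self-contained version of these steps, as a sketch: the DataFrame dat and its columns x and y are synthetic stand-ins, pyplot is imported under the conventional name plt, and the Agg backend is only there so it runs as a script. Because regplot returns the Axes it drew on, get_lines()[0] is the regression line as long as no other line was plotted on those axes first.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic example data (assumption: columns named "x" and "y")
rng = np.random.default_rng(1)
dat = pd.DataFrame({"x": np.arange(20, dtype=float)})
dat["y"] = 0.5 * dat["x"] - 3 + rng.normal(0, 0.5, len(dat))

f = plt.figure()
ax = f.add_subplot(1, 1, 1)
p = sns.regplot(x="x", y="y", data=dat, ax=ax)  # p is the same Axes as ax

line = p.get_lines()[0]    # the fitted regression line (a Line2D)
line_x = line.get_xdata()  # x positions where seaborn evaluated the fit
line_y = line.get_ydata()  # fitted y values at those positions
print(len(line_x), line_x[0], line_x[-1])
```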

p.get_children() returns a list of the individual elements (artists) of the plot.

The path of the confidence interval band can be found with:

p.get_children()[1].get_paths()

This returns a list of matplotlib Path objects; each Path's vertices attribute is an array of the (x, y) points outlining the band.
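Picking the band by its position in get_children() depends on the order in which matplotlib lists an Axes' children, which may differ between matplotlib versions. A sketch that selects it by type instead, assuming the confidence band is the only PolyCollection on the Axes (regplot draws the band with fill_between, while the scatter points are a PathCollection); the data is synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from matplotlib.collections import PolyCollection

# Synthetic data
rng = np.random.default_rng(2)
x = np.linspace(0, 5, 30)
y = 1.5 * x + rng.normal(0, 1, x.size)

ax = sns.regplot(x=x, y=y)  # default ci=95 draws a bootstrap confidence band

# Assumption: the band is the only PolyCollection; scatter is a PathCollection
bands = [c for c in ax.collections if isinstance(c, PolyCollection)]
band = bands[0]
verts = band.get_paths()[0].vertices  # (N, 2) array of (x, y) outline points
print(verts.shape)
```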

In general, a lot can be found by calling dir() on any Python object; it lists all of the object's attributes and methods.

Khris
  • This doesn't directly yield the desired equation, i.e. the slope and intercept of the regression line (a and b in y = ax + b). However, to get these one could use `scipy`'s `stats.linregress`: `slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x=p.get_lines()[0].get_xdata(), y=p.get_lines()[0].get_ydata())` – ijoseph Oct 27 '16 at 00:16
  • The equation can easily be calculated from the (x, y) coordinates of any two of the points: two points give a, and with that b. – Khris Oct 27 '16 at 06:32
  • @Khris OK, but isn't it weird that there's a piece of software that computes a regression, gives you the correlation coefficient and p-value of the resulting model, and does not provide the model itself? It would be great if the seaborn authors added this feature. – famargar Jul 31 '17 at 15:02
  • How to get the coefficients of the equation "a*x + b"? – Emanuel Fontelles May 10 '18 at 19:18
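To answer the last comment: since the plotted line is exactly straight, its slope a and intercept b can be recovered from any two of its points, as Khris suggests, or from a degree-1 np.polyfit over all of them. A sketch; the function names line_coefficients and line_coefficients_polyfit are hypothetical, and line_x/line_y stand for the arrays returned by get_xdata() and get_ydata(). Running linregress on these points, as in the first comment, also returns the right slope and intercept, but its r-value, p-value and standard error then describe the plotted points, which lie exactly on a line, rather than your data; for those diagnostics, fit the original x and y.

```python
import numpy as np


def line_coefficients(line_x, line_y):
    """Return (a, b) for y = a*x + b from points on a straight line.

    Hypothetical helper; line_x and line_y are e.g. the arrays from
    Line2D.get_xdata() / get_ydata().
    """
    line_x = np.asarray(line_x, dtype=float)
    line_y = np.asarray(line_y, dtype=float)
    # Two-point method: slope from the endpoints, intercept from one point
    a = (line_y[-1] - line_y[0]) / (line_x[-1] - line_x[0])
    b = line_y[0] - a * line_x[0]
    return a, b


def line_coefficients_polyfit(line_x, line_y):
    """Hypothetical alternative: least-squares fit over all points."""
    a, b = np.polyfit(line_x, line_y, 1)  # highest power first: [a, b]
    return a, b


# Usage with a synthetic straight line y = 2.5x - 4
xs = np.linspace(0, 10, 100)
ys = 2.5 * xs - 4
print(line_coefficients(xs, ys))
print(line_coefficients_polyfit(xs, ys))
```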