
I want to create a regplot with a linear regression in Seaborn and scale both axes logarithmically, such that the regression stays a straight line.

An example:

import matplotlib.pyplot as plt
import seaborn as sns

some_x=[0,1,2,3,4,5,6,7]
some_y=[3,5,4,7,7,9,9,10]

ax = sns.regplot(x=some_x, y=some_y, order=1)
plt.ylim(0, 12)
plt.xlim(0, 12)
plt.show()

What I get:

[Image: Linear regression]

If I scale both the x and y axes logarithmically, I would expect the regression to stay a straight line. What I tried:

import matplotlib.pyplot as plt
import seaborn as sns

some_x=[0,1,2,3,4,5,6,7]
some_y=[3,5,4,7,7,9,9,10]

ax = sns.regplot(x=some_x, y=some_y, order=1)
ax.set_yscale('log')
ax.set_xscale('log')
plt.ylim(0, 12)
plt.xlim(0, 12)
plt.show()

How it looks:

[Image: Linear regression becomes a curve]

Marcel
  • It would seem your x and y limits aren't the same. That's why it doesn't look linear. Might have to do with `0` not being representable in log scale. – busybear Dec 29 '18 at 19:40
  • As @busybear said, you can't use `0` on a log scale. You could use something small, like `1e-3`, instead. But even in that case, the plotted regression isn't a straight line. I'm not familiar with Seaborn's regression tools, but might it be doing a nonlinear regression? Or plotting something other than the actual linear fit? – bnaecker Dec 29 '18 at 19:43
  • Thanks. If this is the reason, it can probably be solved by getting the linear equation of the regression via statsmodels and just plotting the line, avoiding Seaborn's regression. Will try that later. – Marcel Dec 29 '18 at 19:47
  • You have a strange line on a linear scale, so when you transfer it to a log scale, you should expect a strange line. I think it may be more reasonable to transform your data to log with `np.log()`, but you need to deal with 0 indeed. – steven Dec 29 '18 at 19:55
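Following up on the comments about `0` on a log scale, here is a minimal NumPy-only sketch of what `np.log10` does with the zero entry, and one way to filter such points out before fitting. The names `raw_log_x`, `mask`, `log_x` and `log_y` are hypothetical, chosen for illustration.

```python
import numpy as np

some_x = np.array([0, 1, 2, 3, 4, 5, 6, 7])
some_y = np.array([3, 5, 4, 7, 7, 9, 9, 10])

# log10(0) evaluates to -inf; errstate silences the divide-by-zero RuntimeWarning
with np.errstate(divide="ignore"):
    raw_log_x = np.log10(some_x)

# Keep only points that are strictly positive on both axes
mask = (some_x > 0) & (some_y > 0)
log_x = np.log10(some_x[mask])
log_y = np.log10(some_y[mask])
```

After filtering, `log_x` and `log_y` are finite and can be passed to any linear fitting routine.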

1 Answer


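The problem is that you fit the regression to your data on a linear scale and only afterwards transform the axes to a log scale. A straight line y = a + b*x is generally not straight on log-log axes (it only is when the intercept a is 0), so the linear fit turns into a curve.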
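A quick numerical check of why a linear fit stops looking linear on log axes: the hypothetical helper `loglog_slopes` below computes the slope of each segment between consecutive points after taking log10 of both coordinates. For a line with a nonzero intercept the segment slopes change from one segment to the next, while for a line through the origin they are all 1. The values of `x`, `a` and `b` are made up for illustration.

```python
import numpy as np

def loglog_slopes(x, y):
    """Slope of each segment between consecutive points after log10 on both axes."""
    return np.diff(np.log10(y)) / np.diff(np.log10(x))

x = np.array([1.0, 2.0, 4.0, 8.0])
a, b = 3.0, 1.0  # hypothetical line y = a + b*x with a nonzero intercept

with_intercept = loglog_slopes(x, a + b * x)  # slopes differ per segment: a curve
through_origin = loglog_slopes(x, b * x)      # every slope equals 1: a straight line
```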

What you need instead is to transform your data to a log scale (base 10) first and then perform the linear regression on the transformed values. Your data is currently a list; converting it to a NumPy array makes the transformation easy, because you can then use vectorised operations.

Caution: One of your x entries is 0, for which the logarithm is not defined. You will get a warning there (NumPy returns -inf for log10(0)).

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

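some_x = np.array([0, 1, 2, 3, 4, 5, 6, 7])
some_y = np.array([3, 5, 4, 7, 7, 9, 9, 10])

# np.log10(0) evaluates to -inf and raises a RuntimeWarning (see the caution above)
ax = sns.regplot(x=np.log10(some_x), y=np.log10(some_y), order=1)
plt.show()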

Alternative solution using NumPy's polyfit, excluding the x=0 data point from the fit:

import matplotlib.pyplot as plt
import numpy as np

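# log10(0) is -inf (with a RuntimeWarning); the [1:] slices leave that point out of the fit
some_x = np.log10(np.array([0, 1, 2, 3, 4, 5, 6, 7]))
some_y = np.log10(np.array([3, 5, 4, 7, 7, 9, 9, 10]))

# Degree-1 least-squares fit in log-log coordinates
fit = np.poly1d(np.polyfit(some_x[1:], some_y[1:], 1))

plt.plot(some_x, some_y, 'ko')       # log-transformed data points
plt.plot(some_x, fit(some_x), '-k')  # fitted straight line
plt.show()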
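If you would rather keep the data in its original units and let Matplotlib's log axes do the transformation, note that a straight line in log-log space, log10(y) = slope*log10(x) + intercept, corresponds to the power law y = 10**intercept * x**slope. Below is a NumPy-only sketch of that conversion, assuming the x=0 point is simply dropped; `keep`, `x_line` and `y_line` are hypothetical names.

```python
import numpy as np

some_x = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
some_y = np.array([3, 5, 4, 7, 7, 9, 9, 10], dtype=float)

# Drop x=0, which has no logarithm
keep = some_x > 0
slope, intercept = np.polyfit(np.log10(some_x[keep]), np.log10(some_y[keep]), 1)

# Evaluate the equivalent power law in the original (untransformed) units
x_line = np.linspace(1, 7, 50)
y_line = 10 ** intercept * x_line ** slope
```

You could then draw it with `plt.loglog(some_x[keep], some_y[keep], 'ko')` and `plt.loglog(x_line, y_line, '-k')`; on log-log axes the fitted curve appears as a straight line, while the tick labels stay in the original units.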


Sheldore
  • Thanks for solving my problem (once again), Bazingaa. I didn't think about the logarithm of 0 being undefined. What I don't get is: what is the advantage of the second solution over using the first one and also excluding the zero value? Also, since I want to use this in my thesis, is it "reasonable" to simply add a very small value (like 10^-10) to each value to avoid the problem with the zero, or would this be considered imprecise? – Marcel Dec 29 '18 at 20:22
  • @Marcel: The sns regplot didn't generate the linear line when 0 was included. There is no advantage to the second solution; it is just an alternative. If you are using it for your thesis, you can clarify that in order to obtain a numerical fit, you had to exclude 0 and instead use a number such as 10**-4 that is small compared to the other data points. – Sheldore Dec 29 '18 at 20:25
  • So if I understand you correctly, you would rather replace each 0 by e.g. 10**-4 than shift the whole plot by adding +10**-4 to each value in the plot? – Marcel Dec 29 '18 at 20:40
  • Either you can just replace 0 by a small number or add a small number as an offset to all data points. After all, you are interested in the trend. – Sheldore Dec 29 '18 at 20:44
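To see how much the choice discussed in these comments matters for the trend, here is a hypothetical NumPy sketch that fits the log-log slope three ways: dropping x=0, replacing 0 by 10**-4, and adding 10**-4 to every value. Because log10(10**-4) = -4 lies far from the other log x values (all between 0 and about 0.85), that single point can pull the fitted slope considerably, so it may be worth stating in a thesis which option was used. The helper `loglog_slope` and the variable names are illustrative, not from the answer.

```python
import numpy as np

some_x = np.array([0, 1, 2, 3, 4, 5, 6, 7], dtype=float)
some_y = np.array([3, 5, 4, 7, 7, 9, 9, 10], dtype=float)
eps = 1e-4  # the small value suggested in the comments

def loglog_slope(x, y):
    """Slope of a degree-1 least-squares fit of log10(y) against log10(x)."""
    slope, _intercept = np.polyfit(np.log10(x), np.log10(y), 1)
    return slope

slope_drop = loglog_slope(some_x[1:], some_y[1:])                         # exclude x=0
slope_replace = loglog_slope(np.where(some_x == 0, eps, some_x), some_y)  # 0 -> eps
slope_offset = loglog_slope(some_x + eps, some_y + eps)                   # shift all values

for name, s in [("drop", slope_drop), ("replace", slope_replace), ("offset", slope_offset)]:
    print(f"{name:>7}: slope = {s:.3f}")
```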