
I am trying to do weighted least-squares fitting and came across numpy.linalg.lstsq. For a weighted fit of a single dataset, the following works:

import numpy as np

# Generate some synthetic data from the model.
N = 50
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)
#do the fitting
err = 1/yerr**2
W = np.sqrt(np.diag(err))
x = x.flatten()
y = y.flatten()
A = np.vstack([x, np.ones(len(x))]).T
xw = np.dot(W,A)
yw = np.dot(W,y)
m, b = np.linalg.lstsq(xw, yw)[0]

which gives me the best-fit slope and intercept. Now, suppose I have two datasets with the same slope but different intercepts. How would I do a joint fit so that I get the best-fit slope plus two intercepts? I still need the weighted least-squares version. For the unweighted case, I found that the following works:

(m, b1, b2), _, _, _ = np.linalg.lstsq(
    np.stack([np.concatenate((x1, x2)),
              np.concatenate([np.ones(len(x1)), np.zeros(len(x2))]),
              np.concatenate([np.zeros(len(x1)), np.ones(len(x2))])]).T,
    np.concatenate((y1, y2)))
titanium

1 Answer


First of all, your first approach can be written more compactly, in my opinion: multiplying both the design matrix and the target by 1 / yerr is equivalent to your sqrt(np.diag(err)) weighting.

weights = 1 / yerr
m, b = np.linalg.lstsq(np.c_[weights * x, weights], weights * y, rcond=None)[0]

To fit the two datasets jointly you can stack the two arrays, but set some elements of the design matrix to zero so that each intercept column only acts on its own dataset.

np.random.seed(12)
N = 3
x = np.sort(10 * np.random.rand(N))
yerr = 0.1 + 0.5 * np.random.rand(N)
y = 10.0 * x + 15
y += yerr * np.random.randn(N)

M = 2
x1 = np.sort(10 * np.random.rand(M))
yerr1 = 0.1 * 0.5 * np.random.rand(M)
y1 = 10.0 * x1 + 25
y1 += yerr1 * np.random.randn(M)
#do the fitting
weights = 1 / yerr
weights1 = 1 / yerr1
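# Build the weighted design matrix: one shared-slope column plus one
# intercept (indicator) column per dataset, zeroed outside that dataset's rows.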
first_column = np.r_[weights * x, weights1 * x1]
second_column = np.r_[weights, [0] * x1.size]
third_column = np.r_[[0] * x.size, weights1]
a = np.c_[first_column, second_column, third_column]
print(a)
# [[  4.20211437   2.72576342   0.        ]
#  [ 24.54293941   9.32075195   0.        ]
#  [ 13.22997409   1.78771428   0.        ]
#  [126.37829241   0.          26.03711851]
#  [686.96961895   0.         124.44253391]]
c = np.r_[weights * y, weights1 * y1]
print(c)
# [  83.66073785  383.70595203  159.12058215 1914.59065915 9981.85549321]
m, b1, b2 = np.linalg.lstsq(a, c, rcond=None)[0]
print(m, b1, b2)
# 10.012202998026055 14.841412336510793 24.941219918240172

EDIT

If you want different slopes and one intercept, you can do it this way (it is probably easier to grasp the general idea from the one-slope, two-intercepts case above). Take a look at the array a: it is constructed from the weights, as is c, so the problem is now an ordinary unweighted one. You are looking for a vector = [slope, intercept1, intercept2] such that a @ vector = c (as closely as possible, by minimizing the sum of squared differences). Putting zeros in a makes the problem separable: the upper rows of a involve only the slope and intercept1, and the lower rows involve only the slope and intercept2. The two-slopes, one-intercept case is similar, with vector = [slope1, slope2, intercept]:

first_column = np.r_[weights * x, [0] * x1.size]
second_column = np.r_[[0] * x.size, weights1 * x1]
third_column = np.r_[weights, weights1]
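
You would then assemble and solve it exactly as above. Just as a sketch of the construction (note that the synthetic y and y1 above were generated with a single shared slope, so this fit will not reproduce 10, 15 and 25):

a = np.c_[first_column, second_column, third_column]
c = np.r_[weights * y, weights1 * y1]
slope1, slope2, intercept = np.linalg.lstsq(a, c, rcond=None)[0]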
V. Ayrat
  • But that would yield a different intercept and slope for the rows with and without zeros, right? As I understood it, he would like to keep one fixed. – Joe Jun 14 '20 at 10:10
  • @Joe I am not sure I fully understand your comment, but it gives one slope and 2 intercepts. When we generate `y` and `y1` we use 10, 15 and 25 as the slope and the 2 intercepts, and the last lines of the code show that we recover these values. – V. Ayrat Jun 14 '20 at 11:32
  • Ok, I get it now. Will delete my answer. – Joe Jun 14 '20 at 14:45
  • One follow-up question: how would the code change if I have two datasets with different slopes but the same intercept? – titanium Jun 14 '20 at 18:14
  • Thanks a lot. That seems to work perfectly. One last question: is there any way to get the standard uncertainties associated with the slope and intercept? The output seems to give the sum of residuals only. – titanium Jun 14 '20 at 18:59
  • Wikipedia has equations for this https://en.wikipedia.org/wiki/Simple_linear_regression#Normality_assumption – V. Ayrat Jun 14 '20 at 19:28
  • What if we want to do it for 3 datasets, i.e. the same slope but different intercepts? Sorry if I have missed the general idea behind this. – dgarg Oct 17 '22 at 12:33
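
Following up on the two questions in the comments: for the standard uncertainties, note that because every row of the design matrix a and every entry of c are already scaled by 1 / yerr, the standard weighted-least-squares result applies and the parameter covariance matrix is the inverse of a.T @ a (this assumes the yerr values are the true 1-sigma measurement errors). A minimal sketch for the one-slope, two-intercepts fit above:

# Parameter covariance and standard errors for the weighted fit above
cov = np.linalg.inv(a.T @ a)
m_err, b1_err, b2_err = np.sqrt(np.diag(cov))
print(m_err, b1_err, b2_err)

For three (or more) datasets with the same slope but different intercepts, the same pattern extends directly: one shared slope column plus one weighted indicator column per dataset. A sketch assuming a hypothetical third dataset x2, y2, yerr2:

weights2 = 1 / yerr2  # x2, y2, yerr2: hypothetical third dataset
a = np.c_[np.r_[weights * x, weights1 * x1, weights2 * x2],      # shared slope
          np.r_[weights, np.zeros(x1.size), np.zeros(x2.size)],  # intercept of dataset 1
          np.r_[np.zeros(x.size), weights1, np.zeros(x2.size)],  # intercept of dataset 2
          np.r_[np.zeros(x.size), np.zeros(x1.size), weights2]]  # intercept of dataset 3
c = np.r_[weights * y, weights1 * y1, weights2 * y2]
m, b1, b2, b3 = np.linalg.lstsq(a, c, rcond=None)[0]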