
I was trying to solve a square linear system Xc = y. The methods I know for solving this are:

  1. using the inverse: c = X^-1 y
  2. using Gaussian elimination
  3. using the pseudo-inverse
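
As a sanity check, on a small well-conditioned square system all three of these give essentially the same answer; a minimal sketch (a toy matrix of my own, not the polynomial setup below):

import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(5,5) + 5*np.eye(5) ## boosting the diagonal keeps it well conditioned
c_true = rng.randn(5)
y = np.dot(X,c_true)

c_inv = np.dot(np.linalg.inv(X),y) ## 1. explicit inverse
c_GE = np.linalg.solve(X,y) ## 2. Gaussian elimination (LU)
c_pinv = np.dot(np.linalg.pinv(X),y) ## 3. pseudo-inverse (SVD)

## all three differences should be near machine precision here
print(np.linalg.norm(c_inv-c_true))
print(np.linalg.norm(c_GE-c_true))
print(np.linalg.norm(c_pinv-c_true))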

As far as I can tell, these don't match what I expected to be the ground truth. Here is what I did to check:

  1. First, generate the ground-truth parameters by fitting a polynomial of degree 30 to a cosine of frequency 5, so that y_truth = X*c_truth.
  2. Then, check whether the three methods above recover the truth.

I tried it, but the methods don't agree, and I don't see why that should be the case.

I produced fully runnable, reproducible code:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

## some parameters
degree_target = 25
N_train = degree_target+1
lb,ub = -2000,2000
x = np.linspace(lb,ub,N_train)
## generate target polynomial model
freq_cos = 5
y_cos = np.cos(2*np.pi*freq_cos*x)
c_polyfit = np.polyfit(x,y_cos,degree_target)[::-1] ## needs to be reversed so the highest power comes last
## generate kernel matrix
poly_feat = PolynomialFeatures(degree=degree_target)
K = poly_feat.fit_transform(x.reshape(N_train,1)) # generates degree 0 first
## get target samples of the function
y = np.dot(K,c_polyfit)
## get pinv approximation of c_polyfit
c_pinv = np.dot( np.linalg.pinv(K), y)
## get Gaussian-elimination approximation of c_polyfit
c_GE = np.linalg.solve(K,y)
## get inverse-matrix approximation of c_polyfit
K_inv = np.linalg.inv(K)
c_mdl_i = np.dot(K_inv,y)
## check rank to see if it's truly invertible
print('rank(K) = {}'.format( np.linalg.matrix_rank(K) ))
## compare parameters
print('--c_polyfit')
print('||c_polyfit-c_GE||^2 = {}'.format( np.linalg.norm(c_polyfit-c_GE) ))
print('||c_polyfit-c_pinv||^2 = {}'.format( np.linalg.norm(c_polyfit-c_pinv) ))
print('||c_polyfit-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_polyfit-c_mdl_i) ))
print('||c_polyfit-c_polyfit||^2 = {}'.format( np.linalg.norm(c_polyfit-c_polyfit) ))
##
print('--c_GE')
print('||c_GE-c_GE||^2 = {}'.format( np.linalg.norm(c_GE-c_GE) ))
print('||c_GE-c_pinv||^2 = {}'.format( np.linalg.norm(c_GE-c_pinv) ))
print('||c_GE-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_GE-c_mdl_i) ))
print('||c_GE-c_polyfit||^2 = {}'.format( np.linalg.norm(c_GE-c_polyfit) ))
##
print('--c_pinv')
print('||c_pinv-c_GE||^2 = {}'.format( np.linalg.norm(c_pinv-c_GE) ))
print('||c_pinv-c_pinv||^2 = {}'.format( np.linalg.norm(c_pinv-c_pinv) ))
print('||c_pinv-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_pinv-c_mdl_i) ))
print('||c_pinv-c_polyfit||^2 = {}'.format( np.linalg.norm(c_pinv-c_polyfit) ))
##
print('--c_mdl_i')
print('||c_mdl_i-c_GE||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_GE) ))
print('||c_mdl_i-c_pinv||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_pinv) ))
print('||c_mdl_i-c_mdl_i||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_mdl_i) ))
print('||c_mdl_i-c_polyfit||^2 = {}'.format( np.linalg.norm(c_mdl_i-c_polyfit) ))

and I get the result:

rank(K) = 4
--c_polyfit
||c_polyfit-c_GE||^2 = 4.44089220304006e-16
||c_polyfit-c_pinv||^2 = 1.000000000000001
||c_polyfit-c_mdl_i||^2 = 1.1316233165135605e-06
||c_polyfit-c_polyfit||^2 = 0.0
--c_GE
||c_GE-c_GE||^2 = 0.0
||c_GE-c_pinv||^2 = 1.0000000000000007
||c_GE-c_mdl_i||^2 = 1.1316233160694804e-06
||c_GE-c_polyfit||^2 = 4.44089220304006e-16
--c_pinv
||c_pinv-c_GE||^2 = 1.0000000000000007
||c_pinv-c_pinv||^2 = 0.0
||c_pinv-c_mdl_i||^2 = 0.9999988683985006
||c_pinv-c_polyfit||^2 = 1.000000000000001
--c_mdl_i
||c_mdl_i-c_GE||^2 = 1.1316233160694804e-06
||c_mdl_i-c_pinv||^2 = 0.9999988683985006
||c_mdl_i-c_mdl_i||^2 = 0.0
||c_mdl_i-c_polyfit||^2 = 1.1316233165135605e-06

Why is this? Is it a machine-precision issue? Or does the error accumulate (a lot) when the degree is large (greater than 1)? Honestly I don't know, but all of those hypotheses seem silly to me. If anyone can spot my mistake, feel free to point it out. Otherwise, maybe I don't understand linear algebra or something... which is a lot more worrying.

Also, any suggestions for getting this to work would be highly appreciated. Should I:

  1. increase the size of the interval so its values are not less than 1 in magnitude?
  2. what is the largest polynomial degree I can use?
  3. use a different language...? Or increase the precision?

Any suggestions are appreciated!

Charlie Parker
  • as a fun side question, if I were to train my polynomial model with GD or SGD, would I have the same numerical errors if I restrict my model to be in `[-1,1]` or is this just special to (pseudo) inverting a matrix? – Charlie Parker Oct 21 '17 at 21:18
  • Another follow up question. I've been suggested multiple times to just change the interval to say, `[-5,5]`. However, now my question is, why doesn't the opposite issue arise and then instead of `1.0+eps = 1.0` that we get `1.0+big_number = NaN`? if `big_number = number^30`? Like why do numerical issue seem to be more sensitive when raising to the power of 30 a number with magnitude less than 1? – Charlie Parker Nov 01 '17 at 17:54

1 Answer


The issue is floating-point accuracy. You're raising numbers between zero and one to the 30th power and then adding them together. If you were doing this with infinite-precision arithmetic, the methods would recover the inputs. With floating-point arithmetic, precision loss means they can't.
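
A minimal sketch of the two effects at play (my own illustration, not part of the original answer): terms that are tiny relative to their neighbors simply vanish in float64, and the monomial design matrix becomes extremely ill-conditioned as the degree grows, at which point solve, inv, and pinv can no longer be expected to agree:

import numpy as np

## float64 carries about 16 significant digits, so tiny additions are absorbed
print(1.0 + 1e-17 == 1.0) ## True: the 1e-17 term is lost entirely
print(0.5**60) ## below eps(1) ~ 2.2e-16, so adding it to 1.0 changes nothing

## the conditioning of the monomial (Vandermonde) matrix blows up with degree
for deg in (5, 15, 25):
    V = np.vander(np.linspace(-1,1,deg+1), increasing=True)
    print(deg, np.linalg.cond(V))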

jakevdp
  • Is there a way to make these methods yield the same answer? Maybe changing the range of the inputs? Or maybe having lower degree of polynomials? – Charlie Parker Oct 21 '17 at 05:05
  • I've been playing around with it and it seems particularly bad for the pseudo-inverse which is the one I cared about most... – Charlie Parker Oct 21 '17 at 05:09
  • also, is this a language dependent issue? would it still occur in Matlab for example? – Charlie Parker Oct 21 '17 at 05:24
  • I don't know of a way to easily modify high-order polynomial regression to be numerically stable, either in MATLAB, Python, or another language. What you could do is use better-behaved basis functions, e.g. a truncated Fourier series. – jakevdp Oct 21 '17 at 14:47
  • if I were to train my polynomial model with GD or SGD, would I have the same numerical errors if I restrict my model to be in `[-1,1]` or is this just special to (pseudo) inverting a matrix? – Charlie Parker Oct 21 '17 at 21:18
  • is there a way to know what is the largest degree of polynomials I can try without getting into these issues for a given interval size? – Charlie Parker Oct 22 '17 at 20:48
  • If you're using float-64, a good rule of thumb is that you shouldn't add or subtract numbers that differ by more than a factor of 10^16. For float-32, it's no more than 10^8. – jakevdp Oct 22 '17 at 22:32
  • Another follow up question. I've been suggested multiple times to just change the interval to say, `[-5,5]`. However, now my question is, why doesn't the opposite issue arise and then instead of `1.0+eps = 1.0` that we get `1.0+big_number = NaN`? if `big_number = number^30`? Like why do numerical issue seem to be more sensitive when raising to the power of 30 a number with magnitude less than 1? – Charlie Parker Nov 01 '17 at 17:54
  • why is a truncated Fourier series better behaved than a polynomial one? – Charlie Parker Jan 10 '18 at 17:50
  • would changing the basis to orthogonal polyonomials like Hermite polynomials, have the same issue as the ones I am using now? If not, do you know why not? – Charlie Parker Jan 10 '18 at 17:51
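
To illustrate the basis-function point from the last two comments, here is a rough sketch (my own, not from the thread) comparing the conditioning of the raw monomial basis against a Chebyshev basis on [-1,1] at the same degree; an orthogonal basis typically keeps the design matrix far better conditioned:

import numpy as np
from numpy.polynomial import chebyshev as cheb

degree = 25
x = np.linspace(-1,1,degree+1)

K_mono = np.vander(x, degree+1, increasing=True) ## features 1, x, ..., x^25
K_cheb = cheb.chebvander(x, degree) ## features T_0(x), ..., T_25(x)

print('cond(monomial basis) = {}'.format( np.linalg.cond(K_mono) ))
print('cond(Chebyshev basis) = {}'.format( np.linalg.cond(K_cheb) ))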