
I am trying to implement least squares:

I have: $y=\theta\omega$

The least-squares solution is $\omega=(\theta^{T}\theta)^{-1}\theta^{T}y$

I tried:

    import numpy as np

    def least_squares1(y, tx):
        """calculate the least squares solution."""
        w = np.dot(np.linalg.inv(np.dot(tx.T, tx)), np.dot(tx.T, y))
        return w

The problem is that this method quickly becomes unstable (for small problems it's okay).

I realized this when I compared the result to this least-squares calculation:

    import numpy as np

    def least_squares2(y, tx):
        """calculate the least squares solution."""
        a = tx.T.dot(tx)
        b = tx.T.dot(y)
        return np.linalg.solve(a, b)

To compare both methods, I tried to fit data with a polynomial of degree 12, i.e. the monomial basis [1, x, x^2, x^3, ..., x^12].
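For reference, a setup along these lines reproduces the comparison (the exact data and the construction of `tx` are assumptions, since they are not shown above):

    import numpy as np

    # Hypothetical setup: sample points, a noisy target, and a
    # degree-12 monomial design matrix with columns [1, x, ..., x^12].
    x = np.linspace(-1, 1, 50)                          # sample points (assumed)
    y = np.sin(3 * x) + 0.1 * np.random.randn(x.size)   # noisy target (assumed)
    tx = np.vander(x, N=13, increasing=True)            # design matrix

    w1 = least_squares1(y, tx)   # explicit inverse
    w2 = least_squares2(y, tx)   # np.linalg.solve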

First method:

[Plot: fit produced by the first method]

Second method:

[Plot: fit produced by the second method]

Do you know why the first method diverges for high-degree polynomials?

P.S. I only added "import numpy as np" for your convenience, in case you want to test the functions.

  • Have you seen [numpy.linalg.lstsq](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.linalg.lstsq.html)? – Jon Clements Oct 11 '17 at 13:35
  • @JonClements Thanks... I rephrased my question –  Oct 11 '17 at 13:53
  • My guess is that the `inv` function doesn't take into account the fact that the matrix you are inverting is always Hermitian, and just finds the inverse using an algorithm that works for general matrices. Aside from the built-in `lstsq` function, your best bet would be using the Cholesky decomposition instead (`cholesky`). – Kevin Oct 11 '17 at 13:56
  • Possible duplicate of [Why does numpy.linalg.solve() offer more precise matrix inversions than numpy.linalg.inv()?](https://stackoverflow.com/questions/31256252/why-does-numpy-linalg-solve-offer-more-precise-matrix-inversions-than-numpy-li) – MB-F Oct 11 '17 at 14:41
  • @Kevin Thanks for your guess. This might in fact be the case. I will try to implement cholesky –  Oct 11 '17 at 15:20

2 Answers


There are three points here:

The first is that it is generally better (faster, more accurate) to solve linear equations than to compute inverses.

The second is that it's always a good idea to use what you know about a system of equations (e.g. that the coefficient matrix is positive definite) when computing a solution; in this case, you should use numpy.linalg.lstsq.

The third is more specifically about polynomials. When using monomials as a basis, you can end up with a very poorly conditioned coefficient matrix, and this will mean that numerical errors tend to be large. This is because, for example, the vectors x->pow(x,11) and x->pow(x,12) are very nearly parallel. You would get a more accurate fit, and be able to use higher degrees, if you were to use a basis of orthogonal polynomials, for example [Chebyshev polynomials](https://en.wikipedia.org/wiki/Chebyshev_polynomials) or [Legendre polynomials](https://en.wikipedia.org/wiki/Legendre_polynomials).
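A minimal sketch of the second and third points, assuming `x`, `y` and `tx` are defined as in the question (this code is not part of the original answer):

    import numpy as np

    # Let the lstsq routine handle the (possibly ill-conditioned) system directly.
    w, residuals, rank, sv = np.linalg.lstsq(tx, y, rcond=None)

    # Fit in a Chebyshev basis instead of the raw monomials [1, x, ..., x^12].
    cheb = np.polynomial.Chebyshev.fit(x, y, deg=12)
    y_hat = cheb(x)   # evaluate the fitted polynomial at the sample points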

dmuir
  • Was about to write a similar answer. Yours is better. However, it is not clear to me what exactly you are trying to express in your first point. I am not too familiar with this. I thought that solving a set of linear equations (matrices) implies computing inverses. Hence, I don't see what kind of operation you would use to solve linear equations without computing any inverse? – henry Oct 11 '17 at 15:28
  • @DoHe For example, if P is psd, it has a Cholesky factorisation P = L*L' with L lower triangular. Then to solve P*x = y for x, you first solve L*z = y for z and then solve L'*x = z for x (these need no extra memory, as the solutions can be done in place). Solving triangular systems is easy. When you compute the inverse, you are effectively solving n equations, P*f1 = e1, P*f2 = e2 etc., and then when you apply the inverse to the rhs you are combining these solutions. So there are two steps and the numerical errors will accumulate across these two steps. (A sketch of this approach follows these comments.) – dmuir Oct 11 '17 at 15:43
  • Thank you very much for your answer !! :) –  Oct 11 '17 at 19:30
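A minimal sketch of the Cholesky route described in the comment above, using SciPy's `cho_factor`/`cho_solve` (assuming `tx` and `y` from the question; this is not part of the original answer):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    # a = tx.T @ tx is symmetric positive definite, so factor it once
    # and solve two triangular systems instead of forming an explicit inverse.
    a = tx.T @ tx
    b = tx.T @ y
    c, low = cho_factor(a)        # a = L @ L.T (lower-triangular factor)
    w = cho_solve((c, low), b)    # solves L @ z = b, then L.T @ w = z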

I am going to improve on what was said before. I answered this yesterday. The problem with higher-order polynomials is something called Runge's phenomenon. The reason the previous answer resorted to orthogonal polynomials, such as Hermite polynomials, is that they attempt to get rid of the Gibbs phenomenon, an adverse oscillatory effect that appears when Fourier-series methods are applied to non-periodic signals.

You can sometimes improve the conditioning by resorting to regularization methods if the matrix is low rank, as I did in the other post. Other issues may be due to smoothness properties of the vector.
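One possible reading of "regularization methods" is ridge-style (Tikhonov) regularization of the normal equations; a sketch, assuming `tx` and `y` from the question (the regularization strength below is an arbitrary choice):

    import numpy as np

    # Ridge / Tikhonov regularization: adding lam * I to tx.T @ tx
    # shifts its eigenvalues away from zero and improves conditioning.
    lam = 1e-6                                   # regularization strength (assumed)
    a = tx.T @ tx + lam * np.eye(tx.shape[1])
    b = tx.T @ y
    w_ridge = np.linalg.solve(a, b)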