1

I'm writing a linear regression algorithm just out of pure curiosity. I wrote the first version which was simply an iterative algorithm, but this method is very slow. The regression is for a simple linear function in a form y - (ax + c) = 0.

Instead now I went on Wiki page for linear least squares and trying to solve the problem using partial differentials of a least squares function.

I'm using sympy to get partial differentials, which might probably not be the best way, but that's what I managed to dig up so far.

from sympy import symbols, diff

points = [(2, 2), (4, 1.75), (4.15, 3), (4, 4.2), (5, 4), (5, 6),
          (5, 7.3), (7.2, 5.9)]

a, c = symbols('a c', real=True)
S = sum([(item[1] - (a*item[0] + c)) ** 2 for item in points])

# getting partial diffs
S_a = diff(S, a)
S_c = diff(S, c)

After all this I get equations like

S_a
Out[86]: 360.125*a + 72.7*c - 338.46
S_c
Out[87]: 72.7*a + 16*c - 68.3

What I need now is to be able to extract coefficients from these equations so that I can make use of numpy.linalg.solve() to get the solution to this system of equations, like:

A = np.array([[360.125, 72.7], [72.7, 16]])
b = np.array([338.46, 68.3])
x = np.linalg.solve(A, b)

How can I easily get coefficients from sympy partial differentiation output to use in this final step? Thanks!

EDIT: Using the answer to this question I've been able to use regex and get all floats from a string. I convert the output of sympy calculation to a string and strip all spaces (so that signed numbers are properly matched):

import re

S_a = str(diff(S, a))   
S_c = str(diff(S, c))
    
# Strip spaces from strings to get signed floats
S_a = S_a.replace(" ", "")
S_c = S_c.replace(" ", "")
    
coeffs_a = re.findall("[-+]?\d*\.\d+|\d+", S_a)
coeffs_c = re.findall("[-+]?\d*\.\d+|\d+", S_c)
    
A = np.array([[float(coeffs_a[0]), float(coeffs_a[1])], [float(coeffs_c[0]),
             float(coeffs_c[1])]])
b = np.array([float(coeffs_a[2]), float(coeffs_c[2])])
sol = np.linalg.solve(A, b)

This works, but looks ugly as all...

NotAName
  • 3,821
  • 2
  • 29
  • 44

1 Answers1

1

To get the coefficients, use the poly function and then the coeffs function in sympy:

from sympy import symbols, diff, poly

points = [(1, 1), (2, 2)]

a, c = symbols('a c', real=True)
S = sum([(item[1] - (a * item[0] + c)) ** 2 for item in points])

S_a = diff(S, a)
S_c = diff(S, c)
print(S_a)
print(S_c)

p_a = poly(S_a, [a])
p_c = poly(S_c, [c])

print(p_a.coeffs())
print(p_c.coeffs())

Returning:

10*a + 6*c - 10
6*a + 4*c - 6
[10, 6*c - 10]
[4, 6*a - 6]

Domain expansion:

p_a = poly(S_a, [a, c])
p_c = poly(S_c, [a, c])
print(p_a.coeffs())
print(p_c.coeffs())

Gives:

[10, 6, -10]
[6, 4, -6]
Gustav Rasmussen
  • 3,720
  • 4
  • 23
  • 53
  • Thanks Gustav! Maybe I wasn't very clear in my post, but what I need from your example equation `10*a + 6*c - 10` to get the following list (or similar) `[10, 6, -10]`. – NotAName Jul 08 '20 at 06:53
  • Perfect! Thanks! So much better than what I've been able to come up with (see updated post). – NotAName Jul 08 '20 at 07:08
  • You can get polynomial coefficients with regex too, but the `sympy` functions are much more direct :) – Gustav Rasmussen Jul 08 '20 at 07:09