I am trying to implement a gradient descent algorithm from scratch in Python, which should be fairly easy. However, I have been scratching my head over my code for quite a while now, unable to make it work.
I generate the data as follows:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
#Defining the x array.
x=np.array(range(1,100))
#Defining the y array.
y=10+2*x.ravel()
y=y+np.random.normal(loc=0, scale=70, size=99)
Then I define the parameters:
alpha = 0.01 # Which will be the learning rate
NbrIter = 100 # Number of iterations
m = len(y)
theta = np.random.randn(2,1)
And my gradient descent is as follows:
for iter in range(NbrIter):
    theta = theta - (1/m) * alpha * ( X.T @ ((X @ theta) - y) )
What I get is a huge matrix instead of a 2x1 vector, which suggests that I have a problem with the linear algebra. However, I really fail to see where the issue is.
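A minimal sketch of how the shapes could produce this, under one assumption: `X` is never defined in the snippets above, so here it is taken to be a 99x2 design matrix with a leading column of ones. The seeded generator `rng` is also only there to make the sketch repeatable. Since `y` has shape `(99,)` and `X @ theta` has shape `(99, 1)`, the subtraction appears to broadcast to `(99, 99)` rather than staying a column:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

x = np.arange(1, 100)
y = 10 + 2 * x + rng.normal(loc=0, scale=70, size=99)

# Assumption: X is a design matrix with a bias column (not defined in the post).
X = np.column_stack([np.ones_like(x, dtype=float), x])
theta = rng.standard_normal((2, 1))

residual = (X @ theta) - y                 # (99, 1) - (99,) broadcasts to (99, 99)
update = X.T @ residual                    # (2, 99) @ (99, 99) gives (2, 99)
new_theta_shape = (theta - update).shape   # (2, 1) - (2, 99) broadcasts to (2, 99)

print(X.shape, y.shape, residual.shape, new_theta_shape)
# → (99, 2) (99,) (99, 99) (2, 99)
```

If that assumption about `X` holds, reshaping `y` to a column with `y.reshape(-1, 1)` would keep the residual at `(99, 1)` and `theta` at `(2, 1)`.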
(Playing around with the matrices to get the shapes to match, I reached a theta with the correct shape (2x1) using: theta = theta - (1/m) * alpha * ( X.T @ ((X @ theta).T - y).T ) However, it looks wrong, and the actual values are way off (array([[-8.92647663e+148], [-5.92079000e+150]])).)
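For comparison, a sketch of a version that does converge, under several assumptions that are not in the code above: `X` is a design matrix with a bias column, `y` is reshaped to a column vector, and `x` is standardized before fitting. The reasoning behind the last choice is that with raw `x` values up to 99, the largest eigenvalue of `X.T @ X / m` is roughly 3300, so a step size of 0.01 is likely far too large and each iteration amplifies the error, which would be consistent with values around 1e150 after 100 iterations. The names `x_std`, `y_col`, `n_iter`, `slope`, and `intercept`, and the values `alpha = 0.1` and `n_iter = 500`, are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

x = np.arange(1, 100, dtype=float)
y = 10 + 2 * x + rng.normal(loc=0, scale=70, size=x.size)

# Assumption: standardize x so that a moderate learning rate is stable.
x_std = (x - x.mean()) / x.std()
X = np.column_stack([np.ones_like(x_std), x_std])  # bias column + feature
y_col = y.reshape(-1, 1)                           # column vector, shape (99, 1)

alpha = 0.1    # illustrative value, not the 0.01 from the post
n_iter = 500   # illustrative value, not the 100 from the post
m = len(y)
theta = rng.standard_normal((2, 1))

for _ in range(n_iter):
    gradient = (X.T @ (X @ theta - y_col)) / m     # shape (2, 1)
    theta = theta - alpha * gradient

# Map the parameters back to the original (unscaled) x.
slope = theta[1, 0] / x.std()
intercept = theta[0, 0] - slope * x.mean()

# Closed-form least-squares fit, used only as a sanity check.
ref_slope, ref_intercept = np.polyfit(x, y, deg=1)
print(slope, intercept, ref_slope, ref_intercept)
```

Because the noise has a standard deviation of 70, the fitted slope and intercept should match the least-squares fit but will not necessarily land close to the true 2 and 10.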