I have written linear regression (in one variable) with gradient descent. It works fine for a smaller dataset, but for a larger dataset it raises this error:
OverflowError: (34, 'Numerical result out of range')
The traceback points to the following part of the code:
def gradient_des(theta0, theta1, x, y):
    # Cost: sum of squared residuals divided by 2 * number of samples.
    result = 0
    sumed = 0
    if len(x) == len(y):
        for i in range(len(x)):
            sumed = sumed + (line(theta0, theta1, x[i]) - y[i])**2  # error shown in this line
        result = sumed / (2 * len(x))
        return result
    else:
        print("x and y are of unequal length")
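The helper `line` is not shown in the excerpt above. A minimal sketch, assuming it evaluates the hypothesis theta0 + theta1 * x (this body and the name `squared_residual` are assumptions for illustration, not taken from the linked file), shows how the squared term on the marked line can raise this error once the parameters become very large:

```python
def line(theta0, theta1, x):
    # Assumed hypothesis: a straight line with intercept theta0 and slope theta1.
    return theta0 + theta1 * x


def squared_residual(theta0, theta1, x_i, y_i):
    # The term accumulated inside gradient_des for a single sample.
    return (line(theta0, theta1, x_i) - y_i) ** 2


# With moderate parameters the term is small.
print(squared_residual(1.0, 2.0, 3, 5))

# With huge parameters, squaring the float residual exceeds the float
# range (roughly 1.8e308) and Python raises OverflowError.
try:
    squared_residual(1e200, 1e200, 3, 5)
except OverflowError as exc:
    print("overflow:", exc)
```

So the overflow on that line would be a symptom of theta0 and theta1 growing without bound in earlier iterations, rather than a problem with the cost formula itself.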
# In general, x and y were generated for testing purposes as below.
import random

x = []
for i in range(10):
    x = x + [i]
print(x)
# x = [1,2,3,4,5,6]
y = [0 for _ in range(len(x))]
for i in range(len(y)):
    y[i] = random.randint(-100, 100)
print(y)
# y = [13,10,8.75,4,5.5,2]
Why is this overflow occurring?
Also, later in the code, changing the learning factor (i.e. alpha) matters: the code sometimes runs for alpha = 0.1 but not for alpha = 1 (on the smaller, known dataset):
def linear_reg(x, y):
    if len(x) == len(y):
        theta0 = random.randint(-10, 10)
        theta1 = random.randint(-10, 10)
        alpha = 0.1  # problem: how to decide whether this factor should be small or large
        while gradient_des(theta0, theta1, x, y) != 0:  # probably an error in this convergence condition
            temp0 = theta0 - alpha * summed_lin(theta0, theta1, x, y)
            temp1 = theta1 - alpha * summed_lin_weighted(theta0, theta1, x, y)
            # print(temp0)
            # print(temp1)
            if theta0 != temp0 and theta1 != temp1:
                theta0 = temp0
                theta1 = temp1
            else:
                break
        return [theta0, theta1]
    else:
        print("x and y are of unequal length")
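`summed_lin` and `summed_lin_weighted` are not shown above. A sketch under the assumption that they return the partial derivatives of the cost with respect to theta0 and theta1, averaged over the samples (the bodies below are hypothetical and may differ from the linked file):

```python
def line(theta0, theta1, x):
    # Assumed hypothesis: theta0 + theta1 * x.
    return theta0 + theta1 * x


def summed_lin(theta0, theta1, x, y):
    # Assumed: d(cost)/d(theta0), the mean of the residuals.
    return sum(line(theta0, theta1, xi) - yi for xi, yi in zip(x, y)) / len(x)


def summed_lin_weighted(theta0, theta1, x, y):
    # Assumed: d(cost)/d(theta1), the mean of residual * x.
    return sum((line(theta0, theta1, xi) - yi) * xi for xi, yi in zip(x, y)) / len(x)


# Gradient at the starting point (0, 0) for the small known dataset.
x = [1, 2, 3, 4, 5, 6]
y = [13, 10, 8.75, 4, 5.5, 2]
print(summed_lin(0, 0, x, y), summed_lin_weighted(0, 0, x, y))
```

Under this assumption the second gradient carries an extra factor of x, so its magnitude scales with the range of the x values.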
For alpha = 1 it gives the same error as above. Shouldn't the regression be independent of alpha (for smaller values)?
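To make the alpha observation reproducible, here is a minimal sketch that assumes averaged gradients of the squared-error cost and a fixed iteration cap instead of the `!= 0` loop condition (the name `run_descent`, the cap of 100 steps, and the starting point (0, 0) are illustrative choices, not from the linked file):

```python
def run_descent(x, y, alpha, steps=100):
    # Plain batch gradient descent on theta0 + theta1 * x, starting at (0, 0).
    theta0, theta1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residuals = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
        grad0 = sum(residuals) / n
        grad1 = sum(r * xi for r, xi in zip(residuals, x)) / n
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


x = [1, 2, 3, 4, 5, 6]
y = [13, 10, 8.75, 4, 5.5, 2]

small = run_descent(x, y, alpha=0.1)
large = run_descent(x, y, alpha=1.0)
print("alpha=0.1:", small)  # stays bounded on this data
print("alpha=1.0:", large)  # magnitudes grow with every step
```

In this sketch the parameters stay bounded for alpha = 0.1 and grow at every step for alpha = 1 on the small dataset. With x = 0..9, as in the generated test data, the same sketch with alpha = 0.1 should also grow, which would be consistent with the overflow showing up only on the larger dataset.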
The full code is here: https://github.com/Transwert/General_purposes/blob/master/linreg.py