
I wrote linear regression (in one variable) along with gradient descent. It works fine for a smaller dataset, but for a larger dataset it gives this error:

OverflowError: (34, 'Numerical result out of range') 

The error points to the following part of the code:

def gradient_des(theta0, theta1, x, y):
    # cost function J(theta0, theta1) = (1 / 2m) * sum of squared errors
    sumed = 0
    if len(x) == len(y):
        for i in range(len(x)):
            sumed = sumed + (line(theta0, theta1, x[i]) - y[i]) ** 2  # error shown on this line
        result = sumed / (2 * len(x))
        return result
    else:
        print("x and y are of unequal length")

import random

# x and y for the general case, generated below for testing
x = list(range(10))
print(x)
# x = [1, 2, 3, 4, 5, 6]
y = [random.randint(-100, 100) for _ in range(len(x))]
print(y)
# y = [13, 10, 8.75, 4, 5.5, 2]

Why is this overflow occurring?

Later in the code, changing the learning rate (i.e. alpha) makes a difference: it sometimes runs for alpha = 0.1 but not for alpha = 1 (on the smaller, known dataset).

def linear_reg(x, y):
    if len(x) == len(y):
        theta0 = random.randint(-10, 10)
        theta1 = random.randint(-10, 10)
        alpha = 0.1  # problem: how to decide whether this factor should be small or large

        while gradient_des(theta0, theta1, x, y) != 0:  # probably an error in this convergence condition
            temp0 = theta0 - alpha * summed_lin(theta0, theta1, x, y)
            temp1 = theta1 - alpha * summed_lin_weighted(theta0, theta1, x, y)
            # print(temp0)
            # print(temp1)
            if theta0 != temp0 and theta1 != temp1:
                theta0 = temp0
                theta1 = temp1
            else:
                break
        return [theta0, theta1]
    else:
        print("x and y are of unequal length")

For alpha = 1 it gives the same error as above. Shouldn't the regression be independent of alpha (at least for small enough values)?

The full code is here: https://github.com/Transwert/General_purposes/blob/master/linreg.py
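
As suggested in the comments below, a safer exit condition is to cap the loop at a maximum number of iterations and stop once the cost stops changing by more than a small tolerance, instead of waiting for it to hit exactly zero. A rough sketch (linear_reg_capped, max_iters and tol are placeholder names and defaults, not taken from the linked code):

def linear_reg_capped(x, y, alpha=0.01, max_iters=10000, tol=1e-9):
    # same update rule as linear_reg, but with an iteration cap and a
    # tolerance on the change in cost instead of requiring cost == 0
    theta0 = random.randint(-10, 10)
    theta1 = random.randint(-10, 10)
    prev_cost = gradient_des(theta0, theta1, x, y)
    for _ in range(max_iters):
        temp0 = theta0 - alpha * summed_lin(theta0, theta1, x, y)
        temp1 = theta1 - alpha * summed_lin_weighted(theta0, theta1, x, y)
        theta0, theta1 = temp0, temp1
        cost = gradient_des(theta0, theta1, x, y)
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
    return [theta0, theta1]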

Transwert
  • Check this: https://stackoverflow.com/questions/12666600/overflowerror-numerical-result-out-of-range-when-generating-fibonacci-numbers – Siva Shanmugam May 24 '19 at 09:17
  • You will very rarely **ever** get the error down to exactly zero. Combined with a probable large learning rate `alpha`, this is why your code doesn't converge and hence your variables are overflowing. You should cap this at a maximum number of iterations or stop when the error between successive iterations is less than some threshold. Also, setting `alpha` to be smaller helps mitigate the overflow, but you don't have a proper exit condition in your gradient descent algorithm for this to work properly. – rayryeng May 24 '19 at 09:20
  • @SivaShanmugam That post doesn't help. The problem is the gradient descent algorithm itself. The overflow in data type is simply a by-product of an incorrectly written algorithm. – rayryeng May 24 '19 at 09:24
  • @rayryeng So if I use a termination condition like the gradient descent value going below e^-16 or something like that, it should terminate and return the value? – Transwert May 24 '19 at 09:34
  • @rayryeng Can you suggest the changes that should be made to fix that runtime problem? – Transwert May 25 '19 at 08:43
  • Already told you. However, it would be nice to show us what dataset you're using for the code to overflow. I can perhaps write an answer once I verify that it works with my changes. – rayryeng May 25 '19 at 19:57
  • @rayryeng Apologies for the late reply. I used a .csv file, and with that data the linear regression code produced a correct linear model, but the model doesn't seem to work when I generate random datasets for x and y. The dataset and code are at https://github.com/Transwert/General_purposes (code: linearreg.py, data: testfile.csv). – Transwert Jun 04 '19 at 11:34

0 Answers