
I'm trying to implement the gradient descent algorithm for linear regression. I think I've figured out the math part, but it doesn't work in Python.

from sklearn.datasets import load_boston
import pandas as pd
import numpy as np
import random

data = load_boston()
df = pd.DataFrame(data['data'], columns=data['feature_names'])
y = data['target']
X = df.TAX

def RMSE(y, y_hat):
    return np.sqrt(sum((y - y_hat) ** 2) / len(y))

def partial_k(x, y, y_hat):
    # partial derivative of the MSE with respect to the slope k
    n = len(y)
    gradient = 0
    for x_i, y_i, y_hat_i in zip(list(x), list(y), list(y_hat)):
        gradient += (y_i - y_hat_i) * x_i
    return -2 / n * gradient

def partial_b(y, y_hat):
    # partial derivative of the MSE with respect to the intercept b
    n = len(y)
    gradient = 0
    for y_i, y_hat_i in zip(list(y), list(y_hat)):
        gradient += (y_i - y_hat_i)
    return -2 / n * gradient

def gradient(X, y, n, alpha=0.01, loss=RMSE):
    loss_min = float('inf')

    # random initialization of slope k and intercept b in [-100, 100]
    k = random.random() * 200 - 100
    b = random.random() * 200 - 100

    for i in range(n):
        y_hat = k * X + b
        loss_new = loss(y, y_hat)
        if loss_new < loss_min:
            loss_min = loss_new
            print(f"round: {i}, k: {k}, b: {b}, {loss.__name__}: {loss_min}")
        # step against the gradient, scaled by the learning rate
        k_gradient = partial_k(X, y, y_hat)
        b_gradient = partial_b(y, y_hat)
        k += -k_gradient * alpha
        b += -b_gradient * alpha
    return (k, b)
gradient(X, y, 200)

The script works only for the first iteration, then throws these warnings:

/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:5: RuntimeWarning: overflow encountered in double_scalars
  """
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:12: RuntimeWarning: overflow encountered in double_scalars
  if sys.path[0] == '':
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:29: RuntimeWarning: invalid value encountered in double_scalars
/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:30: RuntimeWarning: invalid value encountered in double_scalars

2 Answers


Looks like one of your operations is overflowing the floating-point type. See "What are the causes of overflow encountered in double_scalars besides division by zero?"

If you can run your code using a debugger, you'll be able to find which line causes the overflow, and change your type to something larger.
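If a debugger isn't handy, a minimal alternative (a sketch, assuming numpy is imported as np as in your script) is to tell numpy to raise an exception instead of printing a warning, so the traceback points at the exact operation:

np.seterr(all='raise')  # 'overflow' and 'invalid' now raise FloatingPointError

try:
    gradient(X, y, 200)  # the function from the question
except FloatingPointError as e:
    print("failed with:", e)  # the traceback shows which line overflowed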


Gradient descent is supposed to reduce the parameters step by step toward the optimum, but when alpha (the learning rate) is too large, gradient descent overshoots the minimum and the iterates grow in an oscillatory fashion instead of converging. This is the main reason the overflow occurs here.
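You can see the overshoot on a toy problem (a sketch, not your data): minimizing f(x) = x**2 with a step size above the stable limit makes the iterates alternate in sign and grow without bound:

x = 1.0
alpha = 1.1  # for f(x) = x**2, the update is stable only for alpha < 1
for i in range(5):
    x -= alpha * 2 * x  # the gradient of x**2 is 2x
    print(i, x)
# prints -1.2, 1.44, -1.728, ... each step multiplies x by -1.2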

Try reducing alpha (the learning rate), or apply feature normalization.
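For example, applied to the code in the question (a sketch reusing its X, y, and gradient names; the smaller alpha is only a starting guess to tune):

# option 1: shrink the learning rate to match the large scale of the TAX feature
k, b = gradient(X, y, 200, alpha=1e-7)

# option 2: standardize the feature so the default alpha=0.01 is stable
X_norm = (X - X.mean()) / X.std()  # zero mean, unit variance
k, b = gradient(X_norm, y, 200)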
