Python gradient descent - cost keeps increasing

Question

I'm trying to implement gradient descent in python and my loss/cost keeps increasing with every iteration.

I've seen a few people post about this, and saw an answer here: gradient descent using python and numpy

I believe my implementation is similar, but cant see what I'm doing wrong to get an exploding cost value:

Iteration: 1 | Cost: 697361.660000
Iteration: 2 | Cost: 42325117406694536.000000
Iteration: 3 | Cost: 2582619233752172973298548736.000000
Iteration: 4 | Cost: 157587870187822131053636619678439702528.000000
Iteration: 5 | Cost: 9615794890267613993157742129590663647488278265856.000000

I'm testing this on a dataset I found online (LA Heart Data): http://www.umass.edu/statdata/statdata/stat-corr.html

Import code:

dataset = np.genfromtxt('heart.csv', delimiter=",")

x = dataset[:]
x = np.insert(x,0,1,axis=1)  # Add 1's for bias
y = dataset[:,6]
y = np.reshape(y, (y.shape[0],1))

Gradient descent:

def gradientDescent(weights, X, Y, iterations = 1000, alpha = 0.01):
    theta = weights
    m = Y.shape[0]
    cost_history = []

    for i in xrange(iterations):
        residuals, cost = calculateCost(theta, X, Y)
        gradient = (float(1)/m) * np.dot(residuals.T, X).T
        theta = theta - (alpha * gradient)

        # Store the cost for this iteration
        cost_history.append(cost)
        print "Iteration: %d | Cost: %f" % (i+1, cost)

Calculate cost:

def calculateCost(weights, X, Y):
    m = Y.shape[0]
    residuals = h(weights, X) - Y
    squared_error = np.dot(residuals.T, residuals)

    return residuals, float(1)/(2*m) * squared_error

Calculate hypothesis:

def h(weights, X):   
    return np.dot(X, weights)

To actually run it:

gradientDescent(np.ones((x.shape[1],1)), x, y, 5)

My best bet is trivial signing issues because it seems like it goes in the wrong direction. — Armen Avetisyan, Sep 29 '16 at 13:10

scomes · Accepted Answer · 2016-09-29T20:23:24.060

5

Assuming that your derivation of the gradient is correct, you are using: =- and you should be using: -=. Instead of updating theta, you are reassigning it to - (alpha * gradient)

EDIT (after the above issue was fixed in the code):

I ran what the code on what I believe is the right dataset and was able to get the cost to behave by setting alpha=1e-7. If you run it for 1e6 iterations you should see it converging. This approach on this dataset appears very sensitive to learning rate.

edited Sep 29 '16 at 20:23

answered Sep 29 '16 at 13:14

scomes

1,756
14
19

I think that was a small typo by me when moving things over to stack overflow. I've made the code more explicit by replacing that line with `theta = theta - (alpha * gradient)` and I still get the same problem – Simon Sep 29 '16 at 19:59

score 2 · Answer 2 · answered Sep 29 '16 at 20:28

In general, if your cost is increasing, then the very first thing you should check is to see if your learning rate is too large. In such cases, the rate is causing the cost function to jump over the optimal value and increase upwards to infinity. Try different small values of your learning rate. When I face the problem that you describe, I usually repeatedly try 1/10 of the learning rate until I can find a rate where J(w) decreases.

Another problem might be a bug in your derivative implementation. A good way to debug is to do a gradient check to compare the analytic gradient versus the numeric gradient.

Python gradient descent - cost keeps increasing

2 Answers2