
I'm trying to implement the first exercise of Andrew Ng's Coursera Machine Learning course in Python. In the course the exercise is done in MATLAB/Octave, but I wanted to implement it in Python as well.

The problem is that the line that updates the theta values does not seem to be working right: it returns [[0.72088159] [0.72088159]], but it should return [[-3.630291] [1.166362]].

I'm using a learning rate of 0.01 and running the gradient loop for 1500 iterations (the same values as in the original Octave exercise).
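For reference, the update rule from the exercise that I'm trying to implement is the standard batch gradient descent step:

theta_j := theta_j - (alpha / m) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i)

where h_theta(x) = theta_0 + theta_1 * x and m is the number of training examples.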

And obviously, with these wrong theta values, the predictions are not correct, as shown in the last chart.

In the lines where I test the cost function with theta set to [0; 0] and [-1; 2], the results are correct (the same as in the Octave exercise), so the error can only be in the gradient function, but I don't know what went wrong.

Could someone help me figure out what I'm doing wrong? Thanks in advance.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

def load_data():
    X = np.genfromtxt('data.txt', usecols=(0), delimiter=',', dtype=None)    
    y = np.genfromtxt('data.txt', usecols=(1), delimiter=',', dtype=None)    
    X = X.reshape(1, X.shape[0])
    y = y.reshape(1, y.shape[0])
    ones = np.ones(X.shape)
    X = np.append(ones, X, axis=0)
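    # X is now (2, m): a row of ones (the intercept term) stacked on top of the feature row; y is (1, m)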
    theta = np.zeros((2, 1))

    return (X, y, theta)


alpha = 0.01         
iter_num = 1500      
debug_at_loop = 10

def plot(x, y, y_hat=None):

    x = x.reshape(x.shape[0], 1)

    plt.xlabel('x')
    plt.ylabel('hΘ(x)')
    plt.ylim(-5, 25)
    plt.xlim(5, 25)
    plt.scatter(x, y)

    if y_hat is not None:
        plt.plot(x, y_hat, '-')

    plt.show()

X, y, theta = load_data()

plot(X[1], y)

def hip(X, theta):
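    # hypothesis h_theta(X): theta.T (1, 2) dot X (2, m) -> (1, m) row of predictions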
    return np.dot(theta.T, X)

def cost(X, y, theta):
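    # squared-error cost J(theta) = (1 / (2m)) * sum((h_theta(x) - y)^2)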
    m = y.shape[1]

    return np.sum(np.square(hip(X, theta) - y)) / (2 * m)

print('With theta = [0 ; 0]')
print('Cost computed =', cost(X, y, np.array([0, 0])))
print()
print('With theta = [-1 ; 2]')
print('Cost computed =', cost(X, y, np.array([-1, 2])))

def grad(X, y, alpha, theta, iter_num=1500, debug_cost_at_each=10):

    J = []
    m = y.shape[1]

    for i in range(iter_num):

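        # suspect line: this update is what produces [[0.72088159] [0.72088159]]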
        theta -= ((alpha * 1) / m) * np.sum(np.dot(hip(X, theta) - y, X.T))

        if i % debug_cost_at_each == 0:

            J.append(round(cost(X, y, theta), 6))

    return J, theta

J, fit_theta = grad(X, y, alpha, theta)

print('Theta found by Gradient Descent:', fit_theta)

# Predict values for population sizes of 35,000 and 70,000
predict1 = np.dot(np.array([[1], [3.5]]).T, fit_theta)
print('For population = 35,000, we predict a profit of \n', predict1 * 10000)

predict2 = np.dot(np.array([[1], [7]]).T, fit_theta)
print('For population = 70,000, we predict a profit of \n', predict2 * 10000)

pred_y = hip(X, fit_theta)
plot(X[1], y, pred_y.T)

The data I'm using is the following data.txt:

6.1101,17.592
5.5277,9.1302
8.5186,13.662
7.0032,11.854
5.8598,6.8233
8.3829,11.886
7.4764,4.3483
8.5781,12
6.4862,6.5987
5.0546,3.8166
5.7107,3.2522
14.164,15.505
5.734,3.1551
8.4084,7.2258
5.6407,0.71618
5.3794,3.5129
6.3654,5.3048
5.1301,0.56077
6.4296,3.6518
7.0708,5.3893
6.1891,3.1386
20.27,21.767
5.4901,4.263
6.3261,5.1875
5.5649,3.0825
18.945,22.638
12.828,13.501
10.957,7.0467
13.176,14.692
22.203,24.147
5.2524,-1.22
6.5894,5.9966
9.2482,12.134
5.8918,1.8495
8.2111,6.5426
7.9334,4.5623
8.0959,4.1164
5.6063,3.3928
12.836,10.117
6.3534,5.4974
5.4069,0.55657
6.8825,3.9115
11.708,5.3854
5.7737,2.4406
7.8247,6.7318
7.0931,1.0463
5.0702,5.1337
5.8014,1.844
11.7,8.0043
5.5416,1.0179
7.5402,6.7504
5.3077,1.8396
7.4239,4.2885
7.6031,4.9981
6.3328,1.4233
6.3589,-1.4211
6.2742,2.4756
5.6397,4.6042
9.3102,3.9624
9.4536,5.4141
8.8254,5.1694
5.1793,-0.74279
21.279,17.929
14.908,12.054
18.959,17.054
7.2182,4.8852
8.2951,5.7442
10.236,7.7754
5.4994,1.0173
20.341,20.992
10.136,6.6799
7.3345,4.0259
6.0062,1.2784
7.2259,3.3411
5.0269,-2.6807
6.5479,0.29678
7.5386,3.8845
5.0365,5.7014
10.274,6.7526
5.1077,2.0576
5.7292,0.47953
5.1884,0.20421
6.3557,0.67861
9.7687,7.5435
6.5159,5.3436
8.5172,4.2415
9.1802,6.7981
6.002,0.92695
5.5204,0.152
5.0594,2.8214
5.7077,1.8451
7.6366,4.2959
5.8707,7.2029
5.3054,1.9869
8.2934,0.14454
13.394,9.0551
5.4369,0.61705
Comments:
  • Do me a favor: if you downvote, let me know what's wrong with the question, because I don't see any reason for it. – Patterson Mar 31 '18 at 20:29
  • I suspect people are downvoting you because you posted photos of code, not the code itself. People want to try to recreate your problem to help you, but nobody wants to retype your code. Take a look here for advice on asking better questions: https://stackoverflow.com/help/mcve – Mark Mar 31 '18 at 21:02
  • I cannot get Python to execute the posted screenshot images of the code. – James Phillips Mar 31 '18 at 21:11
  • Please read [how to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and edit your post correspondingly. – MaxU - stand with Ukraine Mar 31 '18 at 21:14
  • I thought about it before posting, but I figured it would be a lot of code and the images seemed better; I wasn't even thinking that someone would want to run the code. – Patterson Mar 31 '18 at 21:39
  • I've edited it, but it looked better with the screenshots. – Patterson Mar 31 '18 at 21:50

1 Answer


Well, I got it after losing several strands of hair (programming is going to leave me bald yet).

It was on the gradient line, and the solution was this:

theta -= ((alpha * 1) / m) * np.dot(X, (hip(X, theta) - y).T)

I changed the place of X and transposed the error vector. The original line wrapped the product in np.sum, which collapsed the gradient into a single scalar, so both components of theta received the same update on every iteration; that is why they came out identical ([[0.72088159] [0.72088159]]).
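For completeness, here is a sketch of the whole loop with the corrected update and the shapes annotated (same variable names as in the question: X is (2, m), y is (1, m), theta is (2, 1)):

def grad(X, y, alpha, theta, iter_num=1500, debug_cost_at_each=10):

    J = []
    m = y.shape[1]

    for i in range(iter_num):

        error = hip(X, theta) - y                   # (1, m) residuals
        theta -= (alpha / m) * np.dot(X, error.T)   # (2, m) dot (m, 1) -> (2, 1), matches theta

        if i % debug_cost_at_each == 0:
            J.append(round(cost(X, y, theta), 6))

    return J, theta

With this version the loop converges to roughly [[-3.63] [1.17]], matching the Octave result.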
