
I am following Andrew Ng's course on Coursera and I wanted to implement the same logic in Python. I am trying to compute the cost and theta with

scipy.optimize.fmin_ncg

Here's the code:

import numpy as np

from scipy.optimize import fmin_ncg


def sigmoid(z):
    return (1 / (1 + np.exp(-z))).reshape(-1, 1)


def compute_cost(theta, X, y):
    m = len(y)
    hypothesis = sigmoid(np.dot(X, theta))
    cost = (1 / m) * np.sum(np.dot(-y.T, (np.log(hypothesis))) - np.dot((1 - y.T), np.log(1 - hypothesis)))
    return cost


def compute_gradient(theta, X, y):
    m = len(y)
    hypothesis = sigmoid(np.dot(X, theta))
    gradient = (1 / m) * np.dot(X.T, (hypothesis - y))
    return gradient


def main():
    data = np.loadtxt("data/data1.txt", delimiter=",")  # 100, 3

    X = data[:, 0:2]
    y = data[:, 2:]
    m, n = X.shape

    initial_theta = np.zeros((n + 1, 1))
    X = np.column_stack((np.ones(m), X))
    mr = fmin_ncg(compute_cost, initial_theta, compute_gradient, args=(X, y), full_output=True)
    print(mr)

if __name__ == "__main__":
    main()

When I try to run this I get an error/exception like the one below:

Traceback (most recent call last):
  File "/file/path/without_regression.py", line 78, in <module>
    main()
  File "/file/path/without_regression.py", line 66, in main
    mr = fmin_ncg(compute_cost, initial_theta, compute_gradient, args=(X, y), full_output=True)
  File "/usr/local/anaconda3/envs/ml/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 1400, in fmin_ncg
    callback=callback, **opts)
  File "/usr/local/anaconda3/envs/ml/lib/python3.6/site-packages/scipy/optimize/optimize.py", line 1497, in _minimize_newtoncg
    dri0 = numpy.dot(ri, ri)
ValueError: shapes (3,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)

I don't understand this error. Maybe because I am a beginner, the message is not verbose enough for me to understand.

How do I use scipy.optimize.fmin_ncg, or any other minimization technique such as scipy.optimize.minimize(...), to compute the cost and theta?

A0__oN
  • Well... obviously dot is not defined for (3,1) * (3,1). For (3,1) * (1,3) it would be (but that does not mean it's what you want). I would start with an initial theta defined like ```np.zeros(n + 1)``` (drop the extra dimension) and debug from there (not 100% sure it changes things, but it's more API-like). (More help is hard to give as this code is not reproducible for us! -> external data) – sascha Nov 17 '17 at 03:45
  • [Link to data file](https://github.com/gopaczewski/coursera-ml/blob/master/mlclass-ex2-005/mlclass-ex2/ex2data1.txt). And yes, it is obvious that it cannot do the multiplication because of the dimension mismatch. I will try to implement it as you suggested with initial theta np.zeros(n + 1). Thanks – A0__oN Nov 17 '17 at 04:03
  • ```gradient = (1 / m) * np.dot(X.T, (hypothesis - y)).ravel()``` will make it run ((3,1) -> (3,) shape). (I also used my single-dim initial theta as mentioned.) **Edit:** I wrote earlier that the gradient was wrong, but my check against scipy's check_grad was done with the broken shapes. Looks good! – sascha Nov 17 '17 at 04:12

1 Answer


As mentioned in the comments:

I don't have a reference to the docs at hand, but with scipy.optimize you should always use single-dimension (1-D) arrays.

Relevant SO-question

import numpy as np
a = np.random.random(size=(3,1))   # NOT TO USE!
a.shape  # (3, 1)
a.ndim   # 2
b = np.random.random(size=3)       # TO USE!
b.shape  # (3,)                    
b.ndim   # 1

This applies to your x0 (if not using python-lists) and your gradient.
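
To see why that matters, here is a minimal reproduction (dummy vectors, not your real gradient) of the numpy.dot(ri, ri) line that fails in your traceback:

import numpy as np

g2d = np.ones((3, 1))    # 2-D column vector, same shape as your gradient
g1d = np.ones(3)         # 1-D array, what the solver expects

print(np.dot(g1d, g1d))  # 3.0 -> the inner product fmin_ncg wants internally

try:
    np.dot(g2d, g2d)     # what happens with a (3, 1) gradient
except ValueError as e:
    print(e)             # shapes (3,1) and (3,1) not aligned: 1 (dim 1) != 3 (dim 0)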

A quick hack (dropping the extra dim in the gradient) like:

gradient = (1 / m) * np.dot(X.T, (hypothesis - y)).ravel()  # .ravel()!
...      
initial_theta = np.zeros(n + 1)  # drop extra-dim

makes the code run:

Optimization terminated successfully.
         Current function value: 0.203498
         Iterations: 27
         Function evaluations: 71
         Gradient evaluations: 229
         Hessian evaluations: 0
(array([-25.13045417,   0.20598475,   0.2012217 ]), 0.2034978435366513, 71, 229, 0, 0)

Extra: during debugging I also checked the calculation of the gradient itself against numerical differentiation (recommended!), which looks good using x0:

from scipy.optimize import check_grad as cg
print(cg(compute_cost, compute_gradient, initial_theta, X, y))
# 1.24034933954e-05
sascha
  • @AmanTuladhar Consider using the more general scipy.optimize.minimize if you have no special reason to use one specific solver (you can play with different optimizers then); see the sketch after these comments. – sascha Nov 17 '17 at 04:24
  • I will try that as well @sascha, this is just the initial start.. will explore more stuff.. Thanks again – A0__oN Nov 17 '17 at 04:25
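
For completeness, a minimal sketch of the scipy.optimize.minimize call suggested in the comments above, assuming the 1-D fixes from the answer are applied (initial_theta = np.zeros(n + 1) and the .ravel()'d gradient) and reusing X, y, compute_cost and compute_gradient from the question:

from scipy.optimize import minimize

# Newton-CG with an explicit gradient, as a drop-in replacement
# for the fmin_ncg call in main()
res = minimize(compute_cost, initial_theta, args=(X, y),
               method='Newton-CG', jac=compute_gradient)
print(res.x)    # fitted theta (roughly [-25.13, 0.206, 0.201] on this data set)
print(res.fun)  # final cost (roughly 0.2035)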