58

I'm in the second week of Professor Andrew Ng's Machine Learning course through Coursera. We're working on linear regression and right now I'm dealing with coding the cost function.

The code I've written solves the problem correctly but does not pass the submission process and fails the unit test because I have hard coded the values of theta and not allowed for more than two values for theta.

Here's the code I've got so far

function J = computeCost(X, y, theta)

m = length(y);
J = 0;

for i = 1:m,
    h = theta(1) + theta(2) * X(i)
    a = h - y(i);
    b = a^2;
    J = J + b;
    end;
J = J * (1 / (2 * m));

end

the unit test is

computeCost( [1 2 3; 1 3 4; 1 4 5; 1 5 6], [7;6;5;4], [0.1;0.2;0.3])

and should produce ans = 7.0175

So I need to add another for loop to iterate over theta, therefore allowing for any number of values for theta, but I'll be damned if I can wrap my head around how/where.

Can anyone suggest a way I can allow for any number of values for theta within this function?

If you need more information to understand what I'm trying to ask, I will try my best to provide it.

OhNoNotScott

10 Answers

90

You can vectorize operations in Octave/Matlab. Iterating over an entire vector element by element is a really bad idea if your programming language lets you vectorize operations. R, Octave, Matlab, and Python (numpy) all allow this. For example, you can get the scalar product of theta = (t0, t1, t2, t3) and X = (x0, x1, x2, x3) as follows: theta * X' = (t0, t1, t2, t3) * (x0, x1, x2, x3)' = t0*x0 + t1*x1 + t2*x2 + t3*x3. The result is a scalar.

For example, you can vectorize h in your code as follows:

H = (theta'*X')';
S = sum((H - y) .^ 2);
J = S / (2*m);
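For anyone following along in Python, here is a minimal NumPy sketch of the same idea. It is not the course's Octave code, and the names `cost_loop` and `cost_vectorized` are made up for illustration. It checks that an explicit loop over examples and the vectorized form agree on the unit test from the question, assuming theta and y are passed as 1-D arrays.

```python
import numpy as np

def cost_loop(X, y, theta):
    """Cost computed with an explicit loop over training examples."""
    m = len(y)
    total = 0.0
    for i in range(m):
        h = X[i] @ theta              # hypothesis for example i (works for any number of thetas)
        total += (h - y[i]) ** 2
    return total / (2 * m)

def cost_vectorized(X, y, theta):
    """Same cost, computed for all examples at once."""
    m = len(y)
    errors = X @ theta - y            # h(x) - y for every example
    return np.sum(errors ** 2) / (2 * m)

# Unit test data from the question
X = np.array([[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]], dtype=float)
y = np.array([7, 6, 5, 4], dtype=float)
theta = np.array([0.1, 0.2, 0.3])

print(cost_loop(X, y, theta))         # approximately 7.0175
print(cost_vectorized(X, y, theta))   # approximately 7.0175
```

Both versions accept any number of theta values, because the dot product handles the feature count for you.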
Simplex
  • Have you done away with the for loop there? And if I read that right you've written (theta transpose * X transpose)transpose. – OhNoNotScott Mar 25 '14 at 08:17
  • Yes, these three lines of code replace entire loop! And so, it's transpose (I use Octave syntax) – Simplex Mar 25 '14 at 08:28
  • 1
    I think you have used capitals for the variables here as a matter of convention for naming matrix variables, so thank you for reminding me about that. What I don't understand is in the line "S = sum((H - y).^2);": what's the "."? I know I've seen it before but I can't recall its purpose. – OhNoNotScott Mar 26 '14 at 07:34
  • 3
    The dot in matrix arithmetic is used for element-by-element operations. For example, with A = [ 1 2 ; 3 4 ] and B = [ 3 4 ; 1 2 ]: A*B = [ 5 8 ; 13 20 ] (i.e. the usual matrix multiplication), while A.*B = [ 3 8 ; 3 8 ] (i.e. element-by-element multiplication, [ 1*3 2*4 ; 3*1 4*2 ]). Similarly, A.^2 = [ 1^2 2^2 ; 3^2 4^2 ] = [ 1 4 ; 9 16 ]. – Simplex Mar 26 '14 at 08:14
  • OK, it took me quite a while to understand why that code works but it does. Thanks. – OhNoNotScott Mar 28 '14 at 10:00
  • Why didn't you use "ones(1,97)' * ((X*theta)-y).^2"? – GniruT Jan 10 '17 at 08:28
  • the way you created H is a masterpiece absolutely – Arnav Das May 15 '19 at 14:44
  • Hi guys, I know it's been a while, but why did you transpose three times in H? The H formula is `H = theta' * X` – Julian Mendez Feb 12 '21 at 19:21
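
The difference between `*` and `.*` discussed in the comments above carries over to NumPy, where (roughly speaking) `@` is the matrix product and `*` is element-wise. A small sketch using the same A and B from the comment, with variable names chosen just for illustration:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[3, 4], [1, 2]])

matrix_product = A @ B    # Octave: A*B   (rows times columns)
elementwise = A * B       # Octave: A.*B  (entry by entry)
squared = A ** 2          # Octave: A.^2  (each entry squared)

print(matrix_product)
print(elementwise)
print(squared)
```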
41

The above answer is perfect, but you can also do:

H = (X*theta);
S = sum((H - y) .^ 2);
J = S / (2*m);

Rather than computing

(theta' * X')'

with its extra transposes, you can directly calculate

(X * theta)

It works perfectly.
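
A quick NumPy check of why the two forms agree, treating theta as a column vector as in the Octave code. The array names are for illustration only.

```python
import numpy as np

X = np.array([[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]], dtype=float)
theta = np.array([[0.1], [0.2], [0.3]])   # column vector, shape (3, 1)

H_transposed = (theta.T @ X.T).T   # Octave: (theta' * X')'
H_direct = X @ theta               # Octave: X * theta

# Both give one prediction per row of X, as a (4, 1) column
print(np.allclose(H_transposed, H_direct))
```

Each row of X is one training example, so `X @ theta` computes theta' * x for every example in a single product.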

StefanS
caped114
  • 1
    Why do you need parens around `X*theta`? – sebnukem Apr 09 '15 at 04:15
  • 2
    You don't need them. I have a habit of adding parentheses just to avoid confusion in large expressions. – caped114 Apr 10 '15 at 05:03
  • 8
    Just to be clear, the above equality X*theta = (theta'*X')' holds because of the two identities : (A')' = A and A' * B' = (BA)'. So just taking (theta' * X') = (X * theta)' this, transposed, gives ((X * theta)')' which is equal to X * theta. – StefanS Jun 29 '15 at 23:13
  • 11
    What I'm confused about is that in the equation for H(x), we have that H(x) = theta' * X, but it seems that we have to take the transpose of that when implementing it in code, but why – rasen58 May 15 '16 at 04:11
  • 1
    I'm also very curious about the answer to rasen58's question, even though it was asked a long time ago. – David McHealy Nov 15 '16 at 20:10
  • 15
    @rasen58 If anyone still cares about this, I had the same issue when trying to implement this.. Basically what I discovered, is in the cost function equation we have theta' * x. When we implement the function, we don't have x, we have the feature matrix X. x is a vector, X is a matrix where each row is one vector x transposed. So, that's where the extra transpose operations come from. – iCodeSometime Jul 11 '17 at 00:49
  • 1
    @kennycoc Thank you for the clarification. ( I reached this page after googling "theta transpose x") :-) – v3gard Oct 07 '18 at 20:53
15

The line below returns the required cost value of 32.07 when computeCost is run once with θ initialized to zeros:

J = (1/(2*m)) * (sum(((X * theta) - y).^2));

and mirrors the original cost function formula:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Community
user3352632
3

It can also be done in one line, where m is the number of training examples:

J=(1/(2*m)) * ((((X * theta) - y).^2)'* ones(m,1));
slfan
prajnan2k
0
J = sum(((X*theta)-y).^2)/(2*m);
ans =  32.073

The above answer is perfect. I thought about the problem deeply for a day and am still unfamiliar with Octave, so let's study together!

Jessica
  • 1
    Sure, with pleasure. It is based on the cost function and uses matrix multiplication, rather than explicit summation or looping. – Jessica Feb 28 '17 at 09:19
  • 1
    I am not sure who gave you "-" but this is also solution I came up with. It's cleaner, I believe more efficient. got 100%. – Katarzyna Apr 07 '17 at 16:07
0

If you want to use only matrix operations:

temp = (X * theta - y);        % h(x) - y
J = ((temp')*temp)/(2 * m);
clear temp;
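
In NumPy terms, the inner-product trick above amounts to taking the dot product of the error vector with itself, which equals the sum of squared errors. A sketch on the unit test data from the question, with hypothetical variable names and 1-D arrays for y and theta:

```python
import numpy as np

X = np.array([[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]], dtype=float)
y = np.array([7, 6, 5, 4], dtype=float)
theta = np.array([0.1, 0.2, 0.3])
m = y.size

errors = X @ theta - y                      # h(x) - y

J_inner = (errors @ errors) / (2 * m)       # Octave: (temp' * temp) / (2*m)
J_sum = np.sum(errors ** 2) / (2 * m)       # Octave: sum(temp .^ 2) / (2*m)

print(J_inner, J_sum)                       # both approximately 7.0175
```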
0

This would work just fine for you -

J =  sum((X*theta - y).^2)*(1/(2*m))

This directly follows from the Cost Function Equation

Rohit
0

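Python code for the same (assuming X, y, and theta are NumPy arrays, with y and theta either both 1-D or both column vectors):

import numpy as np

def computeCost(X, y, theta):
    m = y.size                  # number of training examples
    H = X.dot(theta)            # hypothesis for every example
    S = np.sum((H - y) ** 2)    # sum of squared errors
    J = S / (2 * m)
    return J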
-1
function J = computeCost(X, y, theta)

m = length(y);

J = 0;

% Hypothesis h(x)
h = X * theta;

% Error function (h(x) - y) ^ 2
squaredError = (h-y).^2;

% Cost function
J = sum(squaredError)/(2*m);

end
Shakir
  • Please don't post code only as an answer. This is not helpful. Please take your time to provide high quality answers. Note: "This answer was flagged as low-quality because of its length and content.". If you don't improve the quality of your answer, this post might get deleted. – BionicCode Jul 08 '19 at 21:33
  • @Zoe What is wrong? I just informed the author that his post was flagged as low-quality and probably will be deleted. Posting code without any explanation is not a good answer. I didn't flag it though. This was just meant to be a nice advice. – BionicCode Jul 08 '19 at 21:52
-3

I think we need to iterate to get a more general solution for the cost, rather than using a single iteration. Also, the result of 32.07 shown in the PDF may not be the answer the grader is looking for, because it is only one case out of many training data.

I think it should loop through like this:

  for iter = 1:iterations
    theta = theta - alpha * (1/m) * X' * (X*theta - y);   % gradient descent update
    J = (1/(2*m)) * sum((X*theta - y).^2);                % cost at the updated theta
  end
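
To make the distinction concrete: computing the cost is a single evaluation for a given theta, while gradient descent is the loop that repeatedly updates theta and can record the cost along the way. A minimal NumPy sketch follows. The function names, the learning rate `alpha=0.01`, and the iteration count are assumptions chosen for illustration, not values from the course.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Single evaluation of the squared-error cost for a given theta."""
    errors = X @ theta - y
    return np.sum(errors ** 2) / (2 * y.size)

def gradient_descent(X, y, theta, alpha, num_iters):
    """Repeatedly update theta, recording the cost after each step."""
    m = y.size
    costs = []
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m
        theta = theta - alpha * gradient
        costs.append(compute_cost(X, y, theta))
    return theta, costs

# Unit test data from the question, starting from theta = 0
X = np.array([[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]], dtype=float)
y = np.array([7, 6, 5, 4], dtype=float)
theta0 = np.zeros(3)

theta_final, costs = gradient_descent(X, y, theta0, alpha=0.01, num_iters=50)
print(compute_cost(X, y, theta0), costs[-1])   # cost before and after descent
```

The grader's computeCost check only needs the first function; the loop belongs to the separate gradient descent step.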
A J
  • 1
    Vectorizing your code is a better way of handling matrix operations than iterating over the matrix in a for loop. – Ani Jan 12 '17 at 19:58