Cost Function, Linear Regression, trying to avoid hard coding theta. Octave.

Question

I'm in the second week of Professor Andrew Ng's Machine Learning course through Coursera. We're working on linear regression and right now I'm dealing with coding the cost function.

The code I've written solves the problem correctly but does not pass the submission process and fails the unit test because I have hard coded the values of theta and not allowed for more than two values for theta.

Here's the code I've got so far:

function J = computeCost(X, y, theta)

m = length(y);
J = 0;

for i = 1:m,
    h = theta(1) + theta(2) * X(i)
    a = h - y(i);
    b = a^2;
    J = J + b;
    end;
J = J * (1 / (2 * m));

end

The unit test is

computeCost( [1 2 3; 1 3 4; 1 4 5; 1 5 6], [7;6;5;4], [0.1;0.2;0.3])

and should produce ans = 7.0175

So I need to add another for loop to iterate over theta, therefore allowing for any number of values for theta, but I'll be damned if I can wrap my head around how/where.

Can anyone suggest a way I can allow for any number of values for theta within this function?

If you need more information to understand what I'm trying to ask, I will try my best to provide it.

score 90 · Accepted Answer · answered Mar 25 '14 at 07:47

You can vectorize operations in Octave/MATLAB. Iterating over an entire vector is a bad idea if your programming language lets you vectorize operations; R, Octave, MATLAB, and Python (NumPy) all allow it. For example, if theta = (t0, t1, t2, t3) and X = (x0, x1, x2, x3), you can get the scalar (dot) product as follows: theta * X' = (t0, t1, t2, t3) * (x0, x1, x2, x3)' = t0*x0 + t1*x1 + t2*x2 + t3*x3. The result is a scalar.

For example, you can vectorize h in your code as follows:

H = (theta'*X')';
S = sum((H - y) .^ 2);
J = S / (2*m);
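As a rough cross-check of the three lines above, here is a minimal plain-Python sketch of the same computation, run against the unit test from the question. The function name `compute_cost` and the use of nested lists instead of matrices are assumptions made for illustration; this is not the course's reference code.

```python
def compute_cost(X, y, theta):
    """Squared-error cost sum((X*theta - y).^2) / (2*m) for any number of thetas."""
    m = len(y)
    # Hypothesis for each example: dot product of the row of X with theta.
    h = [sum(x_j * t_j for x_j, t_j in zip(row, theta)) for row in X]
    squared_errors = [(h_i - y_i) ** 2 for h_i, y_i in zip(h, y)]
    return sum(squared_errors) / (2 * m)


# Unit-test data from the question.
X = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]]
y = [7, 6, 5, 4]
theta = [0.1, 0.2, 0.3]
print(round(compute_cost(X, y, theta), 4))  # 7.0175
```

Because the dot product runs over however many columns X has, nothing is hard-coded to two values of theta.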

answered Mar 25 '14 at 07:47

Simplex

Have you done away with the for loop there? And if I read that right you've written (theta transpose * X transpose)transpose. – OhNoNotScott Mar 25 '14 at 08:17
Yes, these three lines of code replace the entire loop! And yes, that is the transpose (I'm using Octave syntax). – Simplex Mar 25 '14 at 08:28
1

I think you have used capitals for the variables here as a matter of convention for naming matrix variables, so thank you for reminding me about that. What I don't understand is in the line "S = sum((H - y).^2);" what's the "."? I know I've seen it before but I can't recall its purpose. – OhNoNotScott Mar 26 '14 at 07:34
3

The dot in matrix arithmetic denotes element-by-element operations. For example, with A = [ 1 2 ; 3 4 ] and B = [ 3 4 ; 1 2 ]: A*B = [ 5 8 ; 13 20 ] (the usual matrix multiplication), while A.*B = [ 3 8 ; 3 8 ] (element-by-element multiplication, i.e. [ 1*3 2*4 ; 3*1 4*2 ]). Similarly, A.^2 = [ 1^2 2^2 ; 3^2 4^2 ] = [ 1 4 ; 9 16 ]. – Simplex Mar 26 '14 at 08:14
OK, it took me quite a while to understand why that code works but it does. Thanks. – OhNoNotScott Mar 28 '14 at 10:00
Why didn't you use "ones(1,97)' * ((X*theta)-y).^2"? – GniruT Jan 10 '17 at 08:28
the way you created H is a masterpiece absolutely – Arnav Das May 15 '19 at 14:44
Hi guys, I know it's been a while, but why did you transpose three times in H? The H formula is like `H = theta' * X` – Julian Mendez Feb 12 '21 at 19:21
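The difference between the matrix product and the dotted, element-by-element operators discussed in the comments above can be sketched in plain Python. The helper names `matmul`, `elementwise_mul`, and `elementwise_pow` are hypothetical stand-ins for Octave's `*`, `.*`, and `.^`; the matrices are the ones from Simplex's comment.

```python
def matmul(A, B):
    """Matrix product, the analogue of Octave's A*B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def elementwise_mul(A, B):
    """Element-by-element product, the analogue of Octave's A.*B."""
    return [[a * b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def elementwise_pow(A, p):
    """Element-by-element power, the analogue of Octave's A.^p."""
    return [[a ** p for a in row] for row in A]


A = [[1, 2], [3, 4]]
B = [[3, 4], [1, 2]]
print(matmul(A, B))           # [[5, 8], [13, 20]]
print(elementwise_mul(A, B))  # [[3, 8], [3, 8]]
print(elementwise_pow(A, 2))  # [[1, 4], [9, 16]]
```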

score 41 · Answer 2 · edited Jun 30 '15 at 00:03

The answer above is perfect, but you can also do:

H = (X*theta);
S = sum((H - y) .^ 2);
J = S / (2*m);

Rather than computing

(theta' * X')'

which multiplies the transposes and then transposes the result, you can directly calculate

(X * theta)

It works perfectly.

edited Jun 30 '15 at 00:03

StefanS

answered Mar 30 '15 at 15:48

caped114

1

Why do you need parens around `X*theta`? – sebnukem Apr 09 '15 at 04:15
2

You don't need them. I have a habit of adding parentheses just to avoid confusion in large expressions. – caped114 Apr 10 '15 at 05:03
8

Just to be clear, the above equality X*theta = (theta'*X')' holds because of two identities: (A')' = A and A' * B' = (BA)'. So (theta' * X') = (X * theta)', and transposing this gives ((X * theta)')', which is equal to X * theta. – StefanS Jun 29 '15 at 23:13
11

What I'm confused about is that in the equation for H(x), we have that H(x) = theta' * X, but it seems that we have to take the transpose of that when implementing it in code. Why? – rasen58 May 15 '16 at 04:11
1

I'm also very curious about the answer to rasen58's question, even though it was asked a long time ago. – David McHealy Nov 15 '16 at 20:10
15

@rasen58 If anyone still cares about this, I had the same issue when trying to implement this. What I discovered is that the cost function equation uses theta' * x. When we implement the function, we don't have x, we have the feature matrix X. x is a vector; X is a matrix where each row is one vector x transposed. So that's where the extra transpose operations come from. – iCodeSometime Jul 11 '17 at 00:49
1

@kennycoc Thank you for the clarification. ( I reached this page after googling "theta transpose x") :-) – v3gard Oct 07 '18 at 20:53
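The identity from the comments above, (theta' * X')' = X * theta, can be checked numerically with a small plain-Python sketch. The helpers `transpose` and `matmul` are hypothetical, X is the matrix from the question's unit test, and the integer theta is an illustrative choice so the comparison is exact.

```python
def transpose(A):
    """Analogue of Octave's A'."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Analogue of Octave's A*B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Each row of X is one training example x, written as a row (x transposed).
X = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]]
theta = [[1], [2], [3]]  # column vector; illustrative values, not from the thread

lhs = transpose(matmul(transpose(theta), transpose(X)))  # (theta' * X')'
rhs = matmul(X, theta)                                   # X * theta
print(lhs == rhs)  # True
print(rhs)         # [[14], [19], [24], [29]]
```

Each entry of the result is theta' * x for one example, which is why the per-example formula and the matrix form agree.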

score 15 · Answer 3 · edited Jun 20 '20 at 09:12

The line below returns the required cost value of 32.07 when computeCost is run once with θ initialized to zeros:

J = (1/(2*m)) * (sum(((X * theta) - y).^2));

and is similar to the original formula that is given below.
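For reference, the standard squared-error cost for linear regression, which the line above implements, can be written as follows. The notation is an assumption based on the usual conventions for this topic: m training examples, hypothesis h_θ(x) = θᵀx.

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta^{T} x
```

In matrix form, stacking the examples as rows of X, the vector of all hypotheses is X * theta, so the sum becomes sum((X * theta - y).^2).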

edited Jun 20 '20 at 09:12

Community

answered Dec 04 '15 at 11:41

user3352632

score 3 · Answer 4 · edited Aug 05 '15 at 21:52

It can also be done in one line, where m is the number of training examples:

J=(1/(2*m)) * ((((X * theta) - y).^2)'* ones(m,1));

edited Aug 05 '15 at 21:52

slfan

answered Aug 05 '15 at 21:35

prajnan2k

1

is it required to multiply with ones(m,1) ? – Sumit Kumar Saha Feb 01 '16 at 11:34

Jessica · Answer 5 · 2017-02-28T07:49:28.867

0

J = sum(((X*theta)-y).^2)/(2*m);
ans =  32.073

The answer above is perfect. I thought about the problem deeply for a day and am still unfamiliar with Octave, so let's just study together!

edited Feb 28 '17 at 07:49

answered Feb 28 '17 at 07:45

Jessica

1

Sure, with pleasure. It is based on the cost function and uses matrix multiplication, rather than explicit summation or looping. – Jessica Feb 28 '17 at 09:19
1

I am not sure who gave you the "-", but this is also the solution I came up with. It's cleaner and, I believe, more efficient. Got 100%. – Katarzyna Apr 07 '17 at 16:07

score 0 · Answer 6 · answered Feb 04 '19 at 17:36

If you want to use only matrix operations:

temp = (X * theta - y);        % h(x) - y
J = ((temp')*temp)/(2 * m);
clear temp;
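The inner-product form above, the `ones(m,1)` form from the earlier answer, and the plain `sum` form all compute the same sum of squared errors. A minimal plain-Python sketch of that equivalence follows; the helper name `residuals` is hypothetical, the data is the question's unit-test X and y, and theta is set to zeros purely for illustration.

```python
def residuals(X, y, theta):
    """Error vector e = X*theta - y, returned as a plain list."""
    return [sum(x_j * t_j for x_j, t_j in zip(row, theta)) - y_i
            for row, y_i in zip(X, y)]


X = [[1, 2, 3], [1, 3, 4], [1, 4, 5], [1, 5, 6]]
y = [7, 6, 5, 4]
theta = [0, 0, 0]  # illustrative choice, not a value from the answers
m = len(y)
e = residuals(X, y, theta)
ones = [1] * m

j_sum = sum(e_i ** 2 for e_i in e) / (2 * m)                           # sum(e.^2) / (2*m)
j_inner = sum(e_i * e_i for e_i in e) / (2 * m)                        # (e' * e) / (2*m)
j_ones = sum((e_i ** 2) * one for e_i, one in zip(e, ones)) / (2 * m)  # ((e.^2)' * ones(m,1)) / (2*m)
print(j_sum, j_inner, j_ones)  # 15.75 15.75 15.75
```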

answered Feb 04 '19 at 17:36

Konstantin Zyryanov

score 0 · Answer 7 · answered Mar 29 '20 at 21:03

This would work just fine for you -

J =  sum((X*theta - y).^2)*(1/(2*m))

This directly follows from the Cost Function Equation

answered Mar 29 '20 at 21:03

Rohit

score 0 · Answer 8 · answered May 15 '20 at 06:54

Python code for the same (using NumPy arrays):

import numpy as np

def computeCost(X, y, theta):
    m = y.size                 # number of training examples
    H = X.dot(theta)           # hypothesis h(x) for every example
    S = np.sum((H - y) ** 2)   # sum of squared errors
    J = S / (2 * m)
    return J

answered May 15 '20 at 06:54

Pradeep Bilaiya

What does H stand for? – Feb 03 '22 at 16:50

score -1 · Answer 9 · answered Jul 08 '19 at 16:13

function J = computeCost(X, y, theta)

m = length(y);

J = 0;

% Hypothesis h(x)
h = X * theta;

% Error function (h(x) - y) ^ 2
squaredError = (h-y).^2;

% Cost function
J = sum(squaredError)/(2*m);

end

answered Jul 08 '19 at 16:13

Shakir

Please don't post code only as an answer. This is not helpful. Please take your time to provide high quality answers. Note: "This answer was flagged as low-quality because of its length and content.". If you don't improve the quality of your answer, this post might get deleted. – BionicCode Jul 08 '19 at 21:33
@Zoe What is wrong? I just informed the author that his post was flagged as low-quality and probably will be deleted. Posting code without any explanation is not a good answer. I didn't flag it though. This was just meant to be a nice advice. – BionicCode Jul 08 '19 at 21:52

score -3 · Answer 10 · edited Dec 08 '15 at 05:15

I think we need to use iteration for a more general solution for the cost, rather than a single evaluation. Also, the result of 32.07 shown in the PDF may not be the answer the grader is looking for, because it is only one case out of many in the training data.

I think it should loop like this:

  for i = 1:iteration
    % gradient descent step; alpha is the learning rate
    theta = theta - (alpha/m) * (X' * (X*theta - y));
    % cost after this step
    J = (1/(2*m)) * sum((X*theta - y).^2);
  end

edited Dec 08 '15 at 05:15

A J

answered Dec 08 '15 at 04:50

Gautam Karmakar

1

Vectorizing your code is a better way of handling matrix operations than iterating over a matrix in a for loop. – Ani Jan 12 '17 at 19:58
