12

I'm trying my hand at simple regularized logistic regression in MATLAB, starting from these formulas:

The cost function:

J(theta) = (1/m)*sum( -y_i*log(h(x_i)) - (1-y_i)*log(1-h(x_i)) ) + (lambda/(2*m))*sum(theta_j^2)

The gradient:

∂J(theta)/∂theta_j = (1/m)*sum( (h(x_i)-y_i)*x_i,j )                          if j = 0

∂J(theta)/∂theta_j = (1/m)*sum( (h(x_i)-y_i)*x_i,j ) + (lambda/m)*theta_j     if j >= 1

This is not MATLAB code, just the formulas.

So far I've done this:

function [J, grad] = costFunctionReg(theta, X, y, lambda)

J = 0;
grad = zeros(size(theta));

temp_theta = [];

%cost function

%get the regularization term

for jj = 2:length(theta)

    temp_theta(jj) = theta(jj)^2;
end

theta_reg = lambda/(2*m)*sum(temp_theta);

temp_sum =[];

%for the sum in the cost function

for ii =1:m

   temp_sum(ii) = -y(ii)*log(sigmoid(theta'*X(ii,:)'))-(1-y(ii))*log(1-sigmoid(theta'*X(ii,:)'));

end

tempo = sum(temp_sum);

J = (1/m)*tempo+theta_reg;

%regulatization
%theta 0

reg_theta0 = 0;

for jj=1:m
 reg_theta0(jj) = (sigmoid(theta'*X(m,:)') -y(jj))*X(jj,1)
end    

reg_theta0 = (1/m)*sum(reg_theta0)

grad_temp(1) = reg_theta0

%for the rest of thetas

reg_theta  = [];
thetas_sum = 0;

for ii=2:size(theta)
    for kk =1:m
        reg_theta(kk) = (sigmoid(theta'*X(m,:)') - y(kk))*X(kk,ii)
    end
    thetas_sum(ii) = (1/m)*sum(reg_theta)+(lambda/m)*theta(ii)
    reg_theta = []
end

for i=1:size(theta)

    if i == 1
        grad(i) = grad_temp(i)
    else
        grad(i) = thetas_sum(i)
    end
end
end

The cost function is giving correct results, but I have no idea why the gradient (one step) is not: the cost gives J = 0.6931, which is correct, but the gradient gives grad = 0.3603 -0.1476 0.0320, which is not. (In the cost, the regularization sum starts at index 2 because theta(1) is not supposed to be regularized.) Any help? I guess there is something wrong with the code, but after 4 days I can't see it. Thanks.

Pedro.Alonso

4 Answers


Vectorized:

function [J, grad] = costFunctionReg(theta, X, y, lambda)

hx = sigmoid(X * theta);
m = length(X);

J = (sum(-y' * log(hx) - (1 - y')*log(1 - hx)) / m) + lambda * sum(theta(2:end).^2) / (2*m);
grad =((hx - y)' * X / m)' + lambda .* theta .* [0; ones(length(theta)-1, 1)] ./ m ;

end
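For reference, a minimal sketch of how this is typically called (assuming `X` already contains the leading column of ones, `sigmoid` is on the path, and the variable names below are only illustrative):

initial_theta = zeros(size(X, 2), 1);                 % one parameter per column of X
lambda = 1;                                           % example regularization strength
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t) costFunctionReg(t, X, y, lambda), initial_theta, options);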
Franck Dernoncourt
  • Can you please explain the significance of adding the regularization term `+lambda * sum(theta(2:end).^2) / (2*m)` in `J`? Since we pass `initial_theta = zeros(size(X, 2), 1);`, the term evaluates to 0, so what's the point? The cost function would remain the same... – Incpetor Aug 10 '14 at 15:12
  • @Inceptor361 `theta` is all zeros the first time you call `costFunctionReg`, but after the first iteration the values are going to change. – Franck Dernoncourt Aug 10 '14 at 16:45
  • 4
    just to point out that `J = (sum(...` here first `sum` is redundant as its' argument already has dimension {1x1}, which is just a number. – ratijas Aug 08 '15 at 13:17
  • 1
    am I right in thinking that in [ ((hx - y)' * X / m)' + lambda .* theta .* [0; ones(length(theta)-1, 1)] ./ m ] the operator .* is only needed between theta and [0,ones(... )] as otherwise, simple * is enough? – Charbel Apr 07 '17 at 22:00
  • @FranckDernoncourt FYI, someone has [asked a new question](https://stackoverflow.com/questions/64030007/regularized-logistic-regresion-with-vectorization) about this answer. – TylerH Sep 23 '20 at 14:38

I used more variables, so you can see clearly what comes from the regular formula and what comes from the regularization cost that is added. Additionally, it is good practice to use vectorization instead of loops in MATLAB/Octave; it gives a more optimized solution.

function [J, grad] = costFunctionReg(theta, X, y, lambda)

    m = length(y); % number of training examples

    %Hypotheses
    hx = sigmoid(X * theta);

    %%The cost without regularization
    J_partial = (-y' * log(hx) - (1 - y)' * log(1 - hx)) ./ m;


    %%Regularization Cost Added
    J_regularization = (lambda/(2*m)) * sum(theta(2:end).^2);

    %%Cost when we add regularization
    J = J_partial + J_regularization;

    %Grad without regularization
    grad_partial = (1/m) * (X' * (hx -y));

    %%Grad Cost Added
    grad_regularization = (lambda/m) .* theta(2:end);

    grad_regularization = [0; grad_regularization];

    grad = grad_partial + grad_regularization;
end
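
The prepended zero is what keeps `theta(1)` (the intercept) out of the regularization. Purely as a stylistic alternative (a sketch, assuming `theta`, `lambda`, and `m` are in scope as above), the same term can be built with a 0/1 mask:

% same regularization gradient, built with a mask instead of prepending a zero
mask = [0; ones(length(theta) - 1, 1)];   % zero for the intercept, one for every other parameter
grad_regularization = (lambda/m) .* (theta .* mask);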
Fernando Cardenas

Finally got it. After rewriting it for about the 4th time, this is the correct code:

function [J, grad] = costFunctionReg(theta, X, y, lambda)

m = length(y); % number of training examples
J = 0;
grad = zeros(size(theta));

temp_theta = [];

for jj = 2:length(theta)

    temp_theta(jj) = theta(jj)^2;
end

theta_reg = lambda/(2*m)*sum(temp_theta);

temp_sum =[];

for ii =1:m

   temp_sum(ii) = -y(ii)*log(sigmoid(theta'*X(ii,:)'))-(1-y(ii))*log(1-sigmoid(theta'*X(ii,:)'));

end

tempo = sum(temp_sum);

J = (1/m)*tempo+theta_reg;

%regularization
%theta 0

reg_theta0 = 0;

for i = 1:m
    reg_theta0(i) = (sigmoid(theta'*X(i,:)') - y(i))*X(i,1);
end

theta_temp(1) = (1/m)*sum(reg_theta0);

grad(1) = theta_temp(1);

sum_thetas = [];
thetas_sum = [];

for j = 2:length(theta)
    for i = 1:m
        sum_thetas(i) = (sigmoid(theta'*X(i,:)') - y(i))*X(i,j);
    end

    thetas_sum(j) = (1/m)*sum(sum_thetas) + (lambda/m)*theta(j);
    sum_thetas = [];
end

for z = 2:length(theta)
    grad(z) = thetas_sum(z);
end


% =============================================================

end

If it helps anyone, great, and any comments on how I could do it better are welcome. :)
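
A numerical gradient check would also have caught this faster. Here is a sketch (assuming `costFunctionReg`, `theta`, `X`, `y`, and `lambda` are already defined in the workspace):

% compare the analytic gradient with a central finite-difference estimate
epsilon = 1e-4;
numgrad = zeros(size(theta));
for j = 1:length(theta)
    e = zeros(size(theta));
    e(j) = epsilon;
    numgrad(j) = (costFunctionReg(theta + e, X, y, lambda) - ...
                  costFunctionReg(theta - e, X, y, lambda)) / (2*epsilon);
end
[~, grad_analytic] = costFunctionReg(theta, X, y, lambda);
disp(max(abs(numgrad - grad_analytic)))   % should be tiny, e.g. below 1e-9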

Pedro.Alonso
  • 1
    Thanks, could you exlpain? 1. Why we are skiping theta(1) here for cost J? 2. Why we are ignoring lambda/m*theta for grad(1)? –  Nov 26 '15 at 03:42
  • If I'm remembering correctly, looking at the code, theta(1) is not being skipped; it is being calculated separately, which I saw as easier. As for the second question, I'm not sure what I was trying to accomplish there. – Pedro.Alonso Nov 27 '15 at 17:10
  • 1
    I believe grad(1) is skipped from regularisation because it corresponds to the weights for column of `1`s that you're adding to the data – Ciprian Tomoiagă Oct 26 '16 at 09:49

Here is an answer that eliminates the loops:

m = length(y); % number of training examples

predictions = sigmoid(X*theta);
reg_term = (lambda/(2*m)) * sum(theta(2:end).^2);
calcErrors = -y.*log(predictions) - (1 -y).*log(1-predictions);
J = (1/m)*sum(calcErrors)+reg_term;

% prepend a 0 column to our reg_term matrix so we can use simple matrix addition
reg_term = [0 (lambda*theta(2:end)/m)'];
grad = sum(X.*(predictions - y)) / m + reg_term;
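
Note that `X.*(predictions - y)` relies on implicit broadcasting (Octave, or MATLAB R2016b and newer), and `grad` ends up as a row vector because `sum` collapses over the training examples. If the caller expects a column vector, as in the other answers, one possible final line is:

grad = grad(:);   % reshape the gradient into a column vector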
Rick