
I've written a vectorized implementation of regularized gradient descent for logistic regression in Python using NumPy. I use a numerical check to verify that my implementation is correct. The numerical check passes for my linear regression GD implementation, but the logistic version fails, and I cannot figure out why. Any help would be appreciated. So here goes:

Those are my methods for calculating cost and gradient (update function calculates gradient and updates the parameters):

@staticmethod
def _hypothesis(parameters, features):
    return Activation.sigmoid(features.dot(parameters))

@staticmethod
def _cost_function(parameters, features, targets):
    m = features.shape[0]
    return np.sum(-targets * (np.log(LogisticRegression._hypothesis(parameters, features)) - (1 - targets) * (
        np.log(1 - LogisticRegression._hypothesis(parameters, features))))) / m

@staticmethod
def _update_function(parameters, features, targets, extra_param):
    regularization_vector = extra_param.get("regularization_vector", 0)
    alpha = extra_param.get("alpha", 0.001)
    m = features.shape[0]

    return parameters - alpha / m * (
        features.T.dot(LogisticRegression._hypothesis(parameters, features) - targets)) + \
           (regularization_vector / m) * parameters

The cost function doesn't have regularization included, but the test I do is with a regularization vector equal to zero so it does not matter. How I am testing:

def numerical_check(features, parameters, targets, cost_function, update_function, extra_param, delta):
    gradients = - update_function(parameters, features, targets, extra_param)

    parameters_minus = np.copy(parameters)
    parameters_plus = np.copy(parameters)
    parameters_minus[0, 0] = parameters_minus[0, 0] + delta
    parameters_plus[0, 0] = parameters_plus[0, 0] - delta

    approximate_gradient = - (cost_function(parameters_plus, features, targets) -
                              cost_function(parameters_minus, features, targets)) / (2 * delta) / parameters.shape[0]

    return abs(gradients[0, 0] - approximate_gradient) <= delta

Basically, I am manually calculating the gradient by shifting the first parameter a delta amount to the left and to the right, and then comparing it with the gradient I get from the update function. I am using initial parameters equal to zero, so the updated parameter I receive is equal to the gradient divided by the number of features. Also, alpha is equal to one. Unfortunately, I am getting different values from the two methods and I cannot figure out why. Any advice on how to troubleshoot this problem would be really appreciated.
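As a point of comparison, here is a minimal standalone sketch of a central-difference gradient check where both sides compute a pure gradient, with no learning rate or update step in between. All names here (`sigmoid`, `cost`, `analytic_gradient`, `numerical_gradient`) and the toy data are made up for illustration; they stand in for the `Activation` / `LogisticRegression` methods above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(parameters, features, targets):
    # Unregularized cross-entropy cost, averaged over the m examples.
    h = sigmoid(features.dot(parameters))
    m = features.shape[0]
    return -np.sum(targets * np.log(h) + (1 - targets) * np.log(1 - h)) / m

def analytic_gradient(parameters, features, targets):
    # Closed-form gradient of the cross-entropy cost: X^T (h - y) / m.
    h = sigmoid(features.dot(parameters))
    return features.T.dot(h - targets) / features.shape[0]

def numerical_gradient(parameters, features, targets, eps=1e-6):
    # Central difference, one parameter at a time; no extra scaling.
    grad = np.zeros_like(parameters)
    for i in range(parameters.shape[0]):
        plus = parameters.copy()
        minus = parameters.copy()
        plus[i, 0] += eps
        minus[i, 0] -= eps
        grad[i, 0] = (cost(plus, features, targets)
                      - cost(minus, features, targets)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(size=(20, 1)) > 0.5).astype(float)
theta = rng.normal(size=(3, 1))

# The two gradients should agree to roughly eps**2 precision.
print(np.max(np.abs(analytic_gradient(theta, X, y) - numerical_gradient(theta, X, y))))
```

Note that the numerical estimate here is not divided by `parameters.shape[0]`: the central difference of the cost already is the gradient component.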

MitakaJ9
    Should the last line not be `return abs(gradients[0, 0]...`? – Paul Brodersen Feb 08 '21 at 15:54
  • Also, you really should not re-use `delta` in the check in the last line. A negative `delta` is perfectly valid but the last `delta` in the last line has to be positive (and small). – Paul Brodersen Feb 08 '21 at 15:57
  • Thanks, Paul, yes it's actually `gradients[0, 0]`, this was an error during copying. Yes, I will rethink using delta here. – MitakaJ9 Feb 08 '21 at 17:17
  • It looks like there is a mutable default somewhere we do not see. Incidentally, if `_update_function` belongs to `LogisticRegression`, it could be a [classmethod](https://stackoverflow.com/questions/136097/difference-between-staticmethod-and-classmethod) since it resorts to `~._hypothesis`. – keepAlive Feb 08 '21 at 22:41

2 Answers


There is an error in your cost function, caused by misplaced brackets. I've fixed it:

def _cost_function(parameters, features, targets):
    m = features.shape[0]
    
    return -np.sum(
        (    targets) * (np.log(    LogisticRegression._hypothesis(parameters, features)))
      + (1 - targets) * (np.log(1 - LogisticRegression._hypothesis(parameters, features)))
    ) / m

Try writing your code cleanly; it helps to detect errors like these.
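A quick way to see that the bracketing really changes the result is to evaluate both versions on toy data (the data and variable names below are made up). In the original bracketing, `(1 - targets) * log(1 - h)` sits inside the bracket that is multiplied by `targets`, so for binary targets the factor `y * (1 - y)` is always zero and the second term silently drops out:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))
theta = rng.normal(size=(2, 1))
# Deterministic binary labels with both classes present.
y = np.array([[1.], [0.], [1.], [1.], [0.], [0.], [1.], [0.], [1.], [0.]])
h = sigmoid(X.dot(theta))
m = X.shape[0]

# Original bracketing: -sum(y * (log(h) - (1 - y) * log(1 - h))) / m
# expands to -sum(y*log(h)) / m because y*(1-y) == 0 for binary y.
broken = np.sum(-y * (np.log(h) - (1 - y) * np.log(1 - h))) / m

# Corrected cross-entropy: -sum(y*log(h) + (1-y)*log(1-h)) / m
fixed = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m

print(broken, fixed)
```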

hammi

I think I spotted a possible error in your code, tell me if this is true.

In your numerical_check function you are calling the update_function to initialize the gradient. However, your _update_function above doesn't actually return the gradients; it returns the updated value of the parameters.

That is, notice that the return statement of your _update_function is this:

return parameters - alpha / m * (
    features.T.dot(LogisticRegression._hypothesis(parameters, features) - targets)) + \
       (regularization_vector / m) * parameters

What I would advise, and what I do in my own ML algorithms, is to write a separate function for calculating gradients, e.g.:

def _gradient(features, parameters, targets):
    m = features.shape[0]
    return features.T.dot(LogisticRegression._hypothesis(parameters, features) - targets) / m

And then change your numerical_check function to initialize the gradient as follows:

gradient = _gradient(features, parameters, targets)
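To illustrate why the separate function matters: with zero initial parameters and alpha equal to one, negating the update does happen to recover the gradient, but with any other alpha (or a nonzero regularization term) it silently returns a scaled value. A small sketch (the `gradient`/`update` names and toy data are made up for this demonstration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(features, parameters, targets):
    # Pure gradient of the cross-entropy cost: X^T (h - y) / m.
    h = sigmoid(features.dot(parameters))
    return features.T.dot(h - targets) / features.shape[0]

def update(parameters, features, targets, alpha):
    # One unregularized gradient-descent step.
    return parameters - alpha * gradient(features, parameters, targets)

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 2))
y = np.array([[1.], [0.], [1.], [0.], [1.], [1.], [0.], [0.],
              [1.], [0.], [1.], [0.], [1.], [0.], [1.]])
theta0 = np.zeros((2, 1))

g = gradient(X, theta0, y)
# With zero parameters and alpha=1, negating the update recovers the gradient...
print(np.allclose(-update(theta0, X, y, alpha=1.0), g))   # True
# ...but with any other alpha it returns a scaled gradient instead.
print(np.allclose(-update(theta0, X, y, alpha=0.1), g))   # False
```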

I hope this solves your problem.

Suraj Upadhyay
  • Thanks, I will edit it to separate the gradient from the update. But in the current case that's not the issue: as the initial parameters are zeros, the fact that we are updating a parameter does not really matter; we still get the gradient at the end. – MitakaJ9 Feb 12 '21 at 12:25
  • Parameters being zero still doesn't nullify the effect of regularization and multiplication by `alpha`, so in the end you have to make a separate function to get the `gradient`. And if this helps, please upvote and mark it as correct, lol. – Suraj Upadhyay Feb 12 '21 at 17:54