
I'm working on an assignment that requires implementing SGD (stochastic gradient descent) manually in Python. I'm stuck on the function that computes the derivative dw.

import numpy as np 
import pandas as pd 
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50000, n_features=15, n_informative=10, n_redundant=5,
                           n_classes=2, weights=[0.7], class_sep=0.7, random_state=15)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=15)

def initialize_weights(dim):
    w=np.zeros_like(dim)
    b=0
    return w,b
dim=X_train[0] 
w,b = initialize_weights(dim)
print('w =',(w))
print('b =',str(b))

import math
def sigmoid(z):
    ''' In this function, we will return sigmoid of z'''
    # compute sigmoid(z) and return
    test_neg_int = -z
    sig_z = 1/(1+(math.exp(test_neg_int)))

    return sig_z

import math
def logloss(y_true,y_pred):
    '''In this function, we will compute log loss '''
    n = len(y_true)
    loss = -(1.0/n)*sum([y_true[i]*math.log(y_pred[i], 10) + (1.0-y_true[i])*math.log(1.0-y_pred[i], 10)
                         for i in range(len(y_true))])
    return loss

def gradient_dw(x,y,w,b,alpha,N):
    '''In this function, we will compute the gradient w.r.t. w '''
    for n in range(0, len(x)):
        dw = []
        # y=0, x = 15 array values, w = 15 array values of 0, b=0, alpha=0.0001, N=len(X_train)=37500
        lambda_val = 0.01
        d = x[n]*((y-alpha*((w.T)*x[n]+b)) - ((lambda_val*w)/N))
        dw.append(d)
    print(dw)

def grader_dw(x,y,w,b,alpha,N):
    grad_dw=gradient_dw(x,y,w,b,alpha,N)
    assert(np.sum(grad_dw)==2.613689585)
    return True
grad_x=np.array([-2.07864835,  3.31604252, -0.79104357, -3.87045546, -1.14783286,
   -2.81434437, -0.86771071, -0.04073287,  0.84827878,  1.99451725,
    3.67152472,  0.01451875,  2.01062888,  0.07373904, -5.54586092])
grad_y=0
grad_w,grad_b=initialize_weights(grad_x)
alpha=0.0001
N=len(X_train)
grader_dw(grad_x,grad_y,grad_w,grad_b,alpha,N)

The result I'm getting:

[array([-0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0.,
     -0., -0.])]
  ---------------------------------------------------------------------------
 AssertionError                            Traceback (most recent call last)
<ipython-input-168-a3ed60706dc2> in <module>
     10 alpha=0.0001
     11 N=len(X_train)
---> 12 grader_dw(grad_x,grad_y,grad_w,grad_b,alpha,N)

<ipython-input-168-a3ed60706dc2> in grader_dw(x, y, w, b, alpha, N)
      1 def grader_dw(x,y,w,b,alpha,N):
      2     grad_dw=gradient_dw(x,y,w,b,alpha,N)
----> 3     assert(np.sum(grad_dw)==2.613689585)
      4     return True
      5 grad_x=np.array([-2.07864835,  3.31604252, -0.79104357, -3.87045546, -1.14783286,

AssertionError: 

Expected result:

True

Could you please tell me if my understanding of the gradient_dw function is wrong? I'm trying to apply this formula:

dw^(t) = x_n * (y_n − σ((w^(t))ᵀ · x_n + b^(t))) − (λ * w^(t)) / N

I'm trying to compute the gradient w.r.t. w in the gradient_dw function so that I can use it later in the main code. What I don't understand is this: w is an array of 0s and y = 0, so when we apply the dw(t) formula and return dw, we should get an array of 0s. Why, then, does the grader say assert(np.sum(grad_dw)==2.613689585)? How could we possibly get 2.613689585?

4 Answers


Try this:

try:
   assert()
except AssertionError:
   return True
– Tomerikoo

You are approaching this wrong:

  1. In stochastic gradient descent we iterate through the 'n' training points (since the batch size is 1), not through the 'd' dimensions. Here you are iterating through the 'd' dimensions of a single point.

  2. grad_x=np.array([-2.07864835, 3.31604252, -0.79104357, -3.87045546, -1.14783286, -2.81434437, -0.86771071, -0.04073287, 0.84827878, 1.99451725, 3.67152472, 0.01451875, 2.01062888, 0.07373904, -5.54586092]) is a single point with 15 dimensions.

So modify your function as below and it will work.

    def gradient_dw(x,y,w,b,alpha,N):
        '''In this function, we will compute the gradient w.r.t. w '''
        # one point x: take the dot product w.T x (a scalar) and pass it through sigmoid
        dw = x*(y - sigmoid(np.dot(w.T, x) + b)) - (alpha*w)/N
        return dw
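A quick sanity check of why the grader expects 2.613689585 (a sketch, assuming the corrected gradient_dw above): with w all zeros and b = 0, sigmoid(wᵀx + b) = sigmoid(0) = 0.5, so dw reduces to -0.5 * x, and -0.5 * sum(grad_x) is exactly the asserted value.

    import math
    import numpy as np

    def sigmoid(z):
        # plain sigmoid; z is a scalar here (w.T x + b)
        return 1.0 / (1.0 + math.exp(-z))

    def gradient_dw(x, y, w, b, alpha, N):
        # per-point gradient of the regularized log loss w.r.t. w
        return x * (y - sigmoid(np.dot(w.T, x) + b)) - (alpha * w) / N

    grad_x = np.array([-2.07864835,  3.31604252, -0.79104357, -3.87045546, -1.14783286,
                       -2.81434437, -0.86771071, -0.04073287,  0.84827878,  1.99451725,
                        3.67152472,  0.01451875,  2.01062888,  0.07373904, -5.54586092])
    w, b = np.zeros_like(grad_x), 0
    dw = gradient_dw(grad_x, 0, w, b, alpha=0.0001, N=37500)
    print(round(np.sum(dw), 9))   # 2.613689585, i.e. -0.5 * grad_x.sum()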
– ashis
def gradient_dw(x,y,w,b,alpha,N):

    # use np.dot so w.T x is a scalar before it is passed to sigmoid
    dw = x*(y - sigmoid(np.dot(w.T, x) + b)) - (alpha/N)*w
    return dw
    While this code may answer the question, it would be better to include some context, explaining how it works and when to use it. Code-only answers are not useful in the long run. – 7uc1f3r Jan 08 '21 at 13:05
  • @7uc1f3r Sorry for the inconvenience. Question is how to make SGD classifier without using any library. This code calculates the gradient. – Vihaan Shah May 21 '21 at 07:10
  • I don't see that anywhere in the question – Mad Physicist May 23 '21 at 14:17

This is the solution:

def gradient_dw(x,y,w,b,alpha,N):

    # the bias b is added after the dot product, not inside it
    dw = x*(y - sigmoid(np.dot(w, x) + b)) - ((alpha*w)/N)
    return dw
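For context, here is a minimal sketch of how this per-point gradient could be plugged into the main SGD loop the assignment asks for. The names gradient_db, train_sgd and the values eta0 and epochs are assumptions for illustration, not from the original post:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def gradient_dw(x, y, w, b, alpha, N):
        # per-point gradient w.r.t. the weight vector w
        return x * (y - sigmoid(np.dot(w, x) + b)) - (alpha * w) / N

    def gradient_db(x, y, w, b):
        # per-point gradient w.r.t. the bias b (assumed helper)
        return y - sigmoid(np.dot(w, x) + b)

    def train_sgd(X, y, alpha=0.0001, eta0=0.0001, epochs=10):
        N = len(X)
        w, b = np.zeros(X.shape[1]), 0.0
        for _ in range(epochs):
            # batch size 1: update w and b once per training point
            for i in range(N):
                w = w + eta0 * gradient_dw(X[i], y[i], w, b, alpha, N)
                b = b + eta0 * gradient_db(X[i], y[i], w, b)
        return w, b

    # usage: w, b = train_sgd(X_train, y_train)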
  • can you help me in this https://stackoverflow.com/questions/63009169/using-sgd-without-using-sklearn-logloss-increasing-with-every-epoch – Zesty Dragon Jul 21 '20 at 18:47