My network only achieves around 80% accuracy, but the reported best score is around 85%. I'm using the same input data and the same initialization. I don't know what's wrong, so I tried to check my gradients and implemented what is recommended for gradient checking: http://ufldl.stanford.edu/tutorial/supervised/DebuggingGradientChecking/

But I'm not sure if my implementation is correct:

    public void gradientchecking(double[] theta) {
        System.out.println("Gradient Checking started");
        // costfunction returns cost and gradients
        IPair<Double, double[]> org = costfunction(theta);
        double[] theta_pos = new double[theta.length];
        double[] theta_neg = new double[theta.length];
        for (int i = 0; i < theta.length; i++) {
            theta_pos[i] = theta[i];
            theta_neg[i] = theta[i];
        }

        double mu = 1e-5;
        for (int k = 0; k < 20; k++) {
            theta_pos[k] = theta_pos[k] + mu;
            theta_neg[k] = theta_neg[k] - mu;
            IPair<Double, double[]> pos = costfunction(theta_pos);
            IPair<Double, double[]> neg = costfunction(theta_neg);
            System.out.println("Org: " + org.getSecond()[k] + " check: " + ((pos.getSecond()[k] - neg.getSecond()[k]) / (2 * mu)));
            theta_pos[k] = theta_pos[k] - mu;
            theta_neg[k] = theta_neg[k] + mu;
        }
    }

I got the following result with a freshly initialized theta:

Gradient Checking started
Cost: 1.1287071297725055 | Wrong: 124 | start: Thu Jul 30 22:57:08 CEST 2015 |end: Thu Jul 30 22:57:18 CEST 2015
Cost: 1.128707130295382 | Wrong: 124 | start: Thu Jul 30 22:57:18 CEST 2015 |end: Thu Jul 30 22:57:28 CEST 2015
Cost: 1.1287071292496391 | Wrong: 124 | start: Thu Jul 30 22:57:28 CEST 2015 |end: Thu Jul 30 22:57:38 CEST 2015
Org: 5.2287135944026004E-5 check:1.0184607936733826E-4
Cost: 1.1287071299252593 | Wrong: 124 | start: Thu Jul 30 22:57:38 CEST 2015 |end: Thu Jul 30 22:57:47 CEST 2015
Cost: 1.1287071296197628 | Wrong: 124 | start: Thu Jul 30 22:57:47 CEST 2015 |end: Thu Jul 30 22:57:56 CEST 2015
Org: 1.5274823511207024E-5 check:1.141254586229615E-4
Cost: 1.1287071299063134 | Wrong: 124 | start: Thu Jul 30 22:57:56 CEST 2015 |end: Thu Jul 30 22:58:05 CEST 2015
Cost: 1.1287071296387077 | Wrong: 124 | start: Thu Jul 30 22:58:05 CEST 2015 |end: Thu Jul 30 22:58:14 CEST 2015
Org: 1.3380293717695182E-5 check:1.0008639478696018E-4
Cost: 1.1287071297943114 | Wrong: 124 | start: Thu Jul 30 22:58:14 CEST 2015 |end: Thu Jul 30 22:58:23 CEST 2015
Cost: 1.1287071297507094 | Wrong: 124 | start: Thu Jul 30 22:58:23 CEST 2015 |end: Thu Jul 30 22:58:32 CEST 2015
Org: 2.1800899147740388E-6 check:9.980780136716263E-5

That indicates an error either in my gradient calculation or in the gradientchecking() method itself. I'm not sure which; can somebody help me?

user3352632

1 Answer

In Java arrays are reference types.

    int[] arr = {8, 7, 6, 5, 4, 3, 2, 1, 8};
    int[] b = arr;
    b[0] = -10;
    for (int i : arr) {
        System.out.print(' ');
        System.out.print(i);
    }

outputs -10 7 6 5 4 3 2 1 8

So what I mean is that you are creating the arrays incorrectly:

    double[] theta_pos = theta;
    double[] theta_neg = theta;

They are just references to theta, so changing their contents changes theta as well, and the +mu followed by -mu cancels out to 0. Use the clone() method to copy the arrays:

    double[] theta_pos = theta.clone();
    double[] theta_neg = theta.clone();

But remember that clone() may not work as you expect in some cases; with primitive (non-reference) element types it works perfectly. See: Does calling clone() on an array also clone its contents?
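The aliasing vs. copying distinction can be shown directly. A minimal sketch (the variable names here are illustrative, not from the question):

```java
// Demo: assignment aliases an array, clone() copies primitive elements.
public class CloneDemo {
    public static void main(String[] args) {
        double[] theta = {0.1, 0.2, 0.3};

        double[] alias = theta;          // same array object: writes show through
        alias[1] = -1.0;
        System.out.println(theta[1]);    // theta[1] was changed via the alias

        double[] copy = theta.clone();   // independent array of primitives
        copy[0] += 1e-5;                 // perturb the copy only
        System.out.println(theta[0]);    // original element is unchanged
        System.out.println(copy[0]);
    }
}
```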

Ibraim Ganiev
  • I changed it accordingly, but the results are still not as expected – user3352632 Jul 30 '15 at 20:59
  • Do you always have 20 elements in the second loop? Better to use "k < theta.length". Also try 1e-4 instead of 1e-5, and you can assign theta_pos[k] = theta[k] and theta_neg[k] = theta[k] at the end of the second for loop instead. Failing that, maybe there is an error in the gradient function itself. – Ibraim Ganiev Jul 30 '15 at 21:35
  • Also, do you know the difference between the cost function and its gradient? I think you are comparing the wrong quantities: you need to compare the k-th partial derivative of the cost function (the finite difference of the cost values, not of the gradient entries) at theta against your analytic gradient value. – Ibraim Ganiev Jul 30 '15 at 21:47
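The last comment points at the core issue: the check should compare the numeric derivative of the cost (the first element of the pair) against the analytic gradient entry, not differences of gradient entries. A self-contained sketch of that corrected check, where the quadratic cost and the Result class below are stand-ins for the asker's actual costfunction/IPair:

```java
// Hypothetical corrected gradient check; costfunction and Result here are
// toy stand-ins, not the asker's real implementation.
public class GradientCheck {
    static class Result {
        final double cost;
        final double[] grad;
        Result(double cost, double[] grad) { this.cost = cost; this.grad = grad; }
    }

    // Toy cost J(theta) = 0.5 * sum_i theta_i^2, whose gradient is theta itself.
    static Result costfunction(double[] theta) {
        double cost = 0.0;
        double[] grad = new double[theta.length];
        for (int i = 0; i < theta.length; i++) {
            cost += 0.5 * theta[i] * theta[i];
            grad[i] = theta[i];
        }
        return new Result(cost, grad);
    }

    public static void main(String[] args) {
        double[] theta = {0.3, -0.7, 1.2};
        double mu = 1e-5;
        Result analytic = costfunction(theta);
        for (int k = 0; k < theta.length; k++) {
            double[] pos = theta.clone();   // independent copies, not aliases
            double[] neg = theta.clone();
            pos[k] += mu;
            neg[k] -= mu;
            // Compare the numeric derivative of the COST (getFirst() in the
            // asker's code) with the analytic gradient entry (getSecond()[k]).
            double numeric = (costfunction(pos).cost - costfunction(neg).cost) / (2 * mu);
            System.out.println("analytic: " + analytic.grad[k] + " numeric: " + numeric);
        }
    }
}
```

With the toy quadratic cost, the numeric and analytic values agree to several decimal places, which is the behavior a correct gradient check should show.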