
I have implemented Logistic Regression with Gradient Descent in Java. It doesn't seem to work well (it does not classify records properly; the predicted probability of y=1 is very high for almost every record). I don't know whether my implementation is correct. I have gone through the code several times and I am unable to find any bug. I have been following Andrew Ng's machine learning tutorials on Coursera. My Java implementation has 3 classes, namely:

  1. DataSet.java : To read the data set
  2. Instance.java : Has two members: double[] x and double label
  3. Logistic.java : This is the main class that implements Logistic Regression with Gradient Descent.

This is my cost function:

J(Θ) = -(1/m) Σ_{i=1}^{m} [ y^(i) log( h_Θ(x^(i)) ) + (1 - y^(i)) log( 1 - h_Θ(x^(i)) ) ]
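
For reference, the cost function could be computed in Java roughly like the sketch below (this is only a sanity-check helper, not part of the classes listed above; it assumes the same classify(x) method from the code further down, which returns h_Θ(x)). Printing it every few iterations makes it easy to see whether J(Θ) is actually decreasing:

private double cost(List<Instance> instances) {
    double sum = 0.0;
    for (Instance instance : instances) {
        double h = classify(instance.getX());   // h_theta(x)
        double y = instance.getLabel();
        sum += y * Math.log(h) + (1 - y) * Math.log(1 - h);
    }
    return -sum / instances.size();             // (-1/m) * sum
}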

For the above Cost function, this is my Gradient Descent algorithm:

Repeat (

Θ_j := Θ_j - α Σ_{i=1}^{m} ( h_Θ(x^(i)) - y^(i) ) x_j^(i)

(simultaneously update all Θ_j)

)

import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.List;

public class Logistic {

    /** the learning rate */
    private double alpha;

    /** the weight to learn */
    private double[] theta;

    /** the number of iterations */
    private int ITERATIONS = 3000;

    public Logistic(int n) {
        this.alpha = 0.0001;
        theta = new double[n];
    }

    private double sigmoid(double z) {
        return (1 / (1 + Math.exp(-z)));
    }

    public void train(List<Instance> instances) {

        double[] temp = new double[3];

        // Gradient descent algorithm for minimizing theta
        for (int i = 1; i <= ITERATIONS; i++) {
            for (int j = 0; j < 3; j++) {
                temp[j] = theta[j] - (alpha * sum(j, instances));
            }

            // simultaneous update of all theta
            for (int j = 0; j < 3; j++) {
                theta[j] = temp[j];
            }
            System.out.println(Arrays.toString(theta));
        }
    }

    // Partial derivative of the cost with respect to theta[j], averaged over all instances
    private double sum(int j, List<Instance> instances) {
        double[] x;
        double prediction, sum = 0, y;

        for (int i = 0; i < instances.size(); i++) {
            x = instances.get(i).getX();
            y = instances.get(i).getLabel();
            prediction = classify(x);
            sum += (prediction - y) * x[j];
        }
        return (sum / instances.size());
    }

    private double classify(double[] x) {
        double logit = 0.0;
        for (int i = 0; i < theta.length; i++) {
            logit += theta[i] * x[i];
        }
        return sigmoid(logit);
    }


    public static void main(String... args) throws FileNotFoundException {

        // DataSet is a class with a static method readDataSet which reads the data set.
        // Instance is a class with two members: double[] x and double label.
        // x contains the features and label is y.
        List<Instance> instances = DataSet.readDataSet("data.txt");

        // 3 : number of theta parameters corresponding to the features x
        // x0 is always 1
        Logistic logistic = new Logistic(3);
        logistic.train(instances);

        // Test data
        double[] x = new double[3];
        x[0] = 1;
        x[1] = 45;
        x[2] = 85;

        System.out.println("Prob: " + logistic.classify(x));
    }
}

Can anyone tell me what I am doing wrong? Thanks in advance! :)

  • I think you need to begin by deciding whether you have a Java problem or a machine learning problem. Does your Java program implement the intended function, regardless of whether it is the right function? You should be able to tell that from unit tests. – Patricia Shanahan Mar 03 '15 at 05:09
  • You implemented gradient *ascent*, not descent. Also, you need to divide the sum by the number of instances you processed; that's the reason why your weights are exploding. – Thomas Jungblut Mar 03 '15 at 10:49
  • @Thomas Sorry for that. I was trying out different things and I forgot to change it back to minus. I did the edit. Even if it's minus, it's not working as expected. – Srinidhi Shankar Mar 03 '15 at 10:53
  • But you did read my previous comment until the end, right? :P – Thomas Jungblut Mar 03 '15 at 11:05
  • Yes, I did. But the gradient descent algorithm I mentioned above doesn't involve dividing. Do you want me to divide the entire sum by the total number of records processed, which is instances.size() in my case? – Srinidhi Shankar Mar 03 '15 at 11:24
  • Of course it does involve dividing `(- 1/m )`. – Thomas Jungblut Mar 03 '15 at 11:32
  • I agree that the cost function involves the (-1/m) term. But after applying gradient descent, that is, after taking the derivative of the cost function with respect to Θ, it doesn't have (-1/m): Θ_j := Θ_j - α Σ_{i=1}^{m} ( h_Θ(x^(i)) - y^(i) ) x_j^(i) – Srinidhi Shankar Mar 03 '15 at 11:46
  • Maybe you want to have a look at some more idiomatic Python code: http://stackoverflow.com/questions/17784587/gradient-descent-using-python-and-numpy/17796231#17796231 – Thomas Jungblut Mar 03 '15 at 11:57
  • I divided the sum by m. That is, in the function sum, I returned (sum/instances.size()). But now all the records are being classified as positive. There is no accuracy at all. – Srinidhi Shankar Mar 03 '15 at 12:48

1 Answer


As I am studying logistic regression, I took the time to review your code in detail.

TLDR

In fact, it appears the algorithm is correct.

The reason you had so many false negatives or false positives is, I think, because of the hyperparameters you chose.

The model was under-trained, so the hypothesis was under-fitting.

Details

I had to create the DataSet and Instance classes because you did not publish them, and set up a training data set and a test data set based on the Cryotherapy dataset. See http://archive.ics.uci.edu/ml/datasets/Cryotherapy+Dataset+.
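
For reference, here is a minimal sketch of what those two classes could look like. This is only my reconstruction, not your code; it assumes a comma-separated data.txt whose last column is the label, with the bias feature x0 = 1 prepended:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class Instance {
    private final double[] x;
    private final double label;

    Instance(double[] x, double label) {
        this.x = x;
        this.label = label;
    }

    public double[] getX() { return x; }
    public double getLabel() { return label; }
}

class DataSet {
    // Reads one instance per line: feature values separated by commas, label last.
    public static List<Instance> readDataSet(String file) throws FileNotFoundException {
        List<Instance> dataset = new ArrayList<>();
        try (Scanner scanner = new Scanner(new File(file))) {
            while (scanner.hasNextLine()) {
                String[] columns = scanner.nextLine().split(",");
                double[] x = new double[columns.length];   // bias slot + (columns.length - 1) features
                x[0] = 1.0;                                // x0 is always 1 (bias term)
                for (int i = 0; i < columns.length - 1; i++) {
                    x[i + 1] = Double.parseDouble(columns[i]);
                }
                double label = Double.parseDouble(columns[columns.length - 1]);
                dataset.add(new Instance(x, label));
            }
        }
        return dataset;
    }
}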

Then, using your same exact code (for the logistic regression part) and by choosing an alpha rate of 0.001 and a number of iterations of 100000, I got a precision rate of 80.64516129032258 percent on the test data set, which is not so bad.
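
One way such a precision figure can be computed is by thresholding the predicted probability at 0.5 and counting correct predictions on the test set. A rough sketch (the method name is illustrative, and it assumes the method is added inside the Logistic class so it can call classify) might be:

// Hypothetical helper: percentage of test instances classified correctly
// when the predicted probability is thresholded at 0.5.
public double accuracy(List<Instance> testSet) {
    int correct = 0;
    for (Instance instance : testSet) {
        double predicted = classify(instance.getX()) >= 0.5 ? 1.0 : 0.0;
        if (predicted == instance.getLabel()) {
            correct++;
        }
    }
    return 100.0 * correct / testSet.size();
}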

I tried to get a better precision rate by manually tweaking those hyperparameters, but could not obtain any better result.

At this point, an enhancement would be to implement regularization, I suppose.
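
A rough sketch of what that could look like, reusing the existing sum(j, instances) gradient inside the Logistic class (lambda is a hypothetical regularization strength; the bias term theta[0], i.e. j == 0, is conventionally left unpenalized):

// Hypothetical L2-regularized gradient for theta[j]:
// the unregularized (averaged) gradient plus (lambda/m) * theta[j] for j > 0.
private double regularizedGradient(int j, List<Instance> instances, double lambda) {
    double gradient = sum(j, instances);   // existing averaged gradient
    if (j > 0) {
        gradient += (lambda / instances.size()) * theta[j];
    }
    return gradient;
}

The train method would then use regularizedGradient instead of sum when computing temp[j].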

Gradient descent formula

In Andrew Ng's video about the cost function and gradient descent, it is correct that the 1/m term is omitted. A possible explanation is that the 1/m term is absorbed into the alpha term. Or maybe it's just an oversight. See https://www.youtube.com/watch?v=TTdcc21Ko9A&index=36&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&t=6m53s at 6m53s.

But if you watch Andrew Ng's video about regularization and logistic regression you'll notice that the term 1/m is clearly present in the formula. See https://www.youtube.com/watch?v=IXPgm1e0IOo&index=42&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN&t=2m19s at 2m19s.
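
For completeness, with the 1/m term written out explicitly, the update rule becomes:

Θ_j := Θ_j - (α/m) Σ_{i=1}^{m} ( h_Θ(x^(i)) - y^(i) ) x_j^(i)

which is the same as the version without 1/m, only with the learning rate rescaled by a factor of 1/m; both lead to the same minimum for an appropriately chosen α.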
