
I tried to run the code from the GPflow classification notebook on self-generated data, to check whether the model performs any classification at all: https://gpflow.readthedocs.io/en/master/notebooks/basics/classification.html

So I created X and Y as input data.

```python
import numpy as np

X = np.array([-0.0259, -0.3579, -0.289, 0.0356, 0.0147, 0.0234]).reshape(-1, 1)
Y = np.array([0, 0, 0, 1, 1, 1]).reshape(-1, 1)
```

The values in X and Y follow a simple binary rule: a negative value in X corresponds to 0 in Y, and a positive value in X should be classified as 1 in Y.
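The same labels can be generated directly from the sign rule, which also makes it easy to scale the toy data up later. A small NumPy sketch; storing the labels as floats rather than ints is an assumption made for this illustration:

```python
import numpy as np

# Same inputs as in the question
X = np.array([-0.0259, -0.3579, -0.289, 0.0356, 0.0147, 0.0234]).reshape(-1, 1)

# Encode the rule "negative X -> class 0, positive X -> class 1".
# Float labels are an assumption of this sketch.
Y = (X > 0).astype(np.float64)

print(Y.ravel())  # [0. 0. 0. 1. 1. 1.]
```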

Then I created a model and trained it:

```python
import gpflow

Per = gpflow.kernels.Periodic(gpflow.kernels.SquaredExponential())  # periodic wrapper around an SE base kernel
model_Per = gpflow.models.VGP((X, Y), likelihood=gpflow.likelihoods.Bernoulli(), kernel=Per)
```

I tried to predict the class Y for the same X that was used to train the model, just to see whether I get the right result.

```python
Ypred, VARpred = model_Per.predict_y(X)  # predictive mean and variance of Y
```

For `Ypred` I get the output:

```
<tf.Tensor: shape=(6, 1), dtype=float64, numpy=
array([[0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5],
       [0.5]])>
```

For `VARpred`:

```
<tf.Tensor: shape=(6, 1), dtype=float64, numpy=
array([[0.25],
       [0.25],
       [0.25],
       [0.25],
       [0.25],
       [0.25]])>
```

I tried changing the kernel, combining kernels, optimizing with Scipy before predicting, and changing the data, but the output for the mean and variance was always the same. With this data set I was expecting Ypred = Y.

What am I doing wrong when creating this classification model?

joel
Vadim5
  • Can you show how you trained it with Scipy? – joel Jul 04 '20 at 01:24
  • I used `opt = gpflow.optimizers.Scipy(); opt.minimize(model_Per.training_loss, variables=model_Per.trainable_variables)` – Vadim5 Jul 04 '20 at 05:57
  • This is the standard optimizer used in the docs notebook. – Vadim5 Jul 04 '20 at 05:58
  • After optimization I got `` – Vadim5 Jul 04 '20 at 05:58
  • It's better than 0.5 for each X in the list, but it's not 1 or 0. Is there any other method to perform classification on 1-dimensional data? Or is there another optimization method you could recommend? – Vadim5 Jul 04 '20 at 06:01
  • I would like to test the classification on data with 800 or even 8000 elements for X and Y. Is there a better way to do this in GPflow? – Vadim5 Jul 04 '20 at 06:03

1 Answer

You have to actually optimise your model; creating it does not fit it. Once you optimise it, the results look very reasonable. I would not expect a GP model to predict exactly p = 1: that would mean a 0.0% probability of ever observing a 0 at this point, which I would only believe if I had seen an infinite amount of data all saying 1.

For the Bernoulli likelihood you are using, the variance is deterministically related to the mean: if y ~ Bernoulli(p), so that Mean[y] = p, then Var[y] = p * (1 - p). In your case the mean is p = 0.5, so the variance is 0.5 * (1 - 0.5) = 0.25.
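A short pure-Python sketch of how the two outputs of `predict_y` relate, assuming the probit inverse link that GPflow's `Bernoulli` likelihood uses by default. For a Gaussian latent f ~ N(mu, var), averaging the probit link gives p = Phi(mu / sqrt(1 + var)), and the variance then follows from p * (1 - p). The helper names `normal_cdf` and `predict_bernoulli` are made up for this illustration, and thresholding at 0.5 is a common convention rather than something the model does for you:

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predict_bernoulli(mu: float, var: float):
    """Mean and variance of y when f ~ N(mu, var) and P(y = 1 | f) = Phi(f).

    Averaging the probit link over the Gaussian gives p = Phi(mu / sqrt(1 + var));
    y is then Bernoulli(p), so its variance is p * (1 - p).
    """
    p = normal_cdf(mu / math.sqrt(1.0 + var))
    return p, p * (1.0 - p)


# Zero latent mean (the state before optimisation): p = 0.5 and variance 0.25,
# whatever the latent variance is.
print(predict_bernoulli(0.0, 1.0))  # (0.5, 0.25)

# A fitted model moves the latent mean away from 0, so p moves towards 0 or 1,
# but for a finite latent mean it stays strictly between them.
for mu in (-2.0, 2.0):
    print(mu, predict_bernoulli(mu, 1.0))

# Hard 0/1 labels, if needed, come from thresholding p.
probs = [0.2, 0.45, 0.7, 0.9]
print([int(p > 0.5) for p in probs])  # [0, 0, 1, 1]
```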

STJ