
I'm using doc2vec embeddings of amino acid sequences to try to predict kinetic rates.

I've tried both standardising and not standardising my input vectors (X), but unless I also standardise my output variable (the kinetic rates), my GP model predicts very similar values for all the test inputs (between 4.87 and 4.9).

Are you supposed to standardise your output values or is there something wrong with my model?

I'm using the GPy package in Python.

This is my code:

```python
import numpy as np
import GPy

# GP regression for word vectors
def Gp_regression(Xtrain, Ytrain, Xtest, Ytest):
    kernel = GPy.kern.RBF(input_dim=64, variance=1, lengthscale=1)
    m = GPy.models.GPRegression(Xtrain, Ytrain, kernel=kernel, noise_var=1e-10)
    m.optimize_restarts(num_restarts=10)

    # Replaces the Xtest argument with an uninitialised (1, 64) array
    Xtest = np.ndarray(shape=(1, 64))
    # predict returns a (mean, variance) tuple
    mean, variance = m.predict(Xtest)

    return mean
```
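Whether to standardise Y can be illustrated with a small self-contained sketch. It uses a hand-written numpy GP (not GPy) with a zero-mean prior and fixed, unoptimised RBF hyperparameters, plus hypothetical random 64-dimensional inputs standing in for the doc2vec vectors. With a zero-mean prior, the posterior mean falls back towards 0 wherever test points are far from the training data, which suits un-centred targets (such as log rates between 3.06 and 7.18) poorly. Subtracting the mean and dividing by the standard deviation before fitting, then undoing that on the predictions, makes the fallback value the training mean instead. As far as I know, GPy's `GPRegression` also uses a zero mean function unless one is supplied, and recent versions accept a `normalizer` argument that standardises Y internally; check the documentation for your installed version. This sketch is not meant to reproduce the exact 4.87-4.9 values.

```python
import numpy as np


def rbf_kernel(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential (RBF) kernel matrix between the rows of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def gp_predict_mean(Xtrain, ytrain, Xtest, noise_var=1e-2, **kern):
    """Posterior mean of a zero-mean GP: K(X*, X) (K(X, X) + noise * I)^-1 y."""
    K = rbf_kernel(Xtrain, Xtrain, **kern) + noise_var * np.eye(len(Xtrain))
    K_star = rbf_kernel(Xtest, Xtrain, **kern)
    return K_star @ np.linalg.solve(K, ytrain)


def gp_predict_standardised(Xtrain, ytrain, Xtest, **kwargs):
    """Standardise y, fit the zero-mean GP, then map predictions back."""
    mu, sd = ytrain.mean(), ytrain.std()
    z = (ytrain - mu) / sd
    return gp_predict_mean(Xtrain, z, Xtest, **kwargs) * sd + mu


# Hypothetical stand-in data (not the real doc2vec vectors or rates)
rng = np.random.default_rng(0)
Xtrain = rng.normal(size=(50, 64))
ytrain = rng.uniform(3.06, 7.18, size=50)
Xtest = rng.normal(size=(5, 64))

pred_raw = gp_predict_mean(Xtrain, ytrain, Xtest)
pred_std = gp_predict_standardised(Xtrain, ytrain, Xtest)
print(pred_raw)  # near 0: far from the data, the zero prior mean dominates
print(pred_std)  # near ytrain.mean(): the fallback is now the training mean
```

In this toy setup, random 64-dimensional test points are so far from the training points (relative to a lengthscale of 1) that the kernel between them is essentially zero, so each model returns its prior mean.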
gehbiszumeis
  • Wow, that's one incredibly domain-specific question, with very few details to help non-stats people! Including some of the model and data would help, but I expect they are both large and difficult to post. There's so much you've left out of the question that it's difficult to say much – Sam Mason Jul 11 '19 at 15:03
  • Sorry, I wasn't sure how much detail to give; I've never posted on here before haha. I have vectors of dimension 64, and I'm trying to use GP regression to predict the log kinetic rate corresponding to each vector (values between 3.06 and 7.18). If I standardise the log kinetic rates, my model predicts pretty well, but if I don't standardise them it predicts pretty much the same value for all the input vectors. I haven't done much stats for a long time, so I'm pretty new to the whole idea of GP regression and I'm not entirely sure what could lead to the model predicting the same values. – Hayley Smith Jul 11 '19 at 15:24
  • why are you replacing the `Xtest` variable with an uninitialized `ndarray`? – Sam Mason Jul 11 '19 at 15:59
  • Thank you! I definitely shouldn't be doing that! I'm ending up giving the model the same vector over and over which is why the predicted values are all the same. – Hayley Smith Jul 12 '19 at 11:35
  • So I changed that so it's definitely taking different vectors, but the issue is still the same. When I print my model, the objective value is positive (Objective: 87.35531276527225). Every example I've seen online has a negative objective value; is that a problem? – Hayley Smith Jul 12 '19 at 12:38
  • when you say "Objective" do you mean the log-likelihood? whether it's positive or negative shouldn't matter and will depend on the data and model. [GPML](http://www.gaussianprocess.org/gpml/chapters/RW.pdf) used to be the standard text on this but it might be a bit theoretical for your tastes, there are probably more applied ones in your area – Sam Mason Jul 12 '19 at 12:51
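To illustrate the point in the last comment, here is a minimal numpy sketch (an independent re-derivation, not GPy's code) of the standard GP log marginal likelihood with an RBF kernel, on hypothetical toy data. Multiplying the targets by a constant c and the kernel and noise variances by c² describes the same model in different units, yet the log-likelihood shifts by -n log c, so its absolute value and sign depend on the scale of y rather than on how good the fit is. As far as I know, the "Objective" GPy prints is the negative log marginal likelihood (plus any log-prior terms), so the same argument applies to it.

```python
import numpy as np


def rbf_kernel(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential (RBF) kernel matrix between the rows of A and B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)


def log_marginal_likelihood(X, y, variance, lengthscale, noise_var):
    """log p(y | X) = -1/2 y^T K^-1 y - 1/2 log|K| - n/2 log(2 pi)."""
    K = rbf_kernel(X, X, variance, lengthscale) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^-1 y
    # -1/2 log|K| = -sum(log(diag(L)))
    return (
        -0.5 * y @ alpha
        - np.sum(np.log(np.diag(L)))
        - 0.5 * len(X) * np.log(2.0 * np.pi)
    )


# Hypothetical toy data
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)

c = 100.0  # e.g. the same targets expressed in different units
lml = log_marginal_likelihood(X, y, variance=1.0, lengthscale=1.0, noise_var=0.01)
lml_scaled = log_marginal_likelihood(
    X, c * y, variance=c**2, lengthscale=1.0, noise_var=0.01 * c**2
)
print(lml, lml_scaled, lml - lml_scaled)  # the gap equals n * log(c)
```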
