
I am trying to fit some random data to a GP with an RBF kernel, using the GPy package. When I change the active dimensions, I get `LinAlgError: not positive definite, even with jitter`. The error occurs only in a conda environment; when I install the packages with pip, I never run into it. Has anyone come across this?

import numpy as np
import GPy
import random

def func(x):
    return np.sum(np.power(x, 5) - np.power(x, 3))

# 20 random samples, each with 10 dimensions
random.seed(2)
random_sample = [[random.uniform(0,3.4) for i in range(10)] for j in range(20)]

# use the first random sample as the initial observation
y = np.array([func(random_sample[0])])
X = np.array([random_sample[0]])
y.shape = (1, 1)
X.shape = (1, 10)

# different set of dimensions
set_dim = [[np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])],
           [np.array([0, 1]), np.array([2, 3]), np.array([4, 5]), np.array([6, 7]), np.array([8, 9])],
           [np.array([0, 1, 2, 3, 4]), np.array([5, 6, 7, 8, 9])],
           [np.array([0, 1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]]


for i in range(len(set_dim)):
    # new kernel based on active dims
    k = GPy.kern.Add([GPy.kern.RBF(input_dim=len(set_dim[i][x]), active_dims=set_dim[i][x]) for x in range(len(set_dim[i]))])
    
    # increase data set with the next random sample
    y = np.concatenate((y, np.array([[func(random_sample[i+1])]])))
    X = np.concatenate((X, np.array([random_sample[i+1]])))

    model = GPy.models.GPRegression(X, y, k)
    model.optimize()

The output of `conda list` for gpy, scipy and numpy: [image: output of conda list for gpy, scipy and numpy]

The paths of the above packages: [image: paths]

gehbiszumeis
Katerina

1 Answer


Possible Channel-Mixing Issue

Package builds from different channels (e.g., anaconda versus conda-forge) are sometimes incompatible. When I've encountered this, compiled symbols were referenced across packages, and the build stacks on the two channels used different symbol names, so mixing the builds led to missing symbols.

I can report that using the exact same package versions as OP, but prioritizing the Conda Forge channel builds, gives me reliable behavior. While not conclusive, this would be consistent with the issue somehow coming from the mixing of the Conda Forge build of GPy with otherwise Anaconda builds of dependencies (e.g., numpy, scipy). Specifically suggestive is the fact that I have the exact same GPy build and that module is where the error originates. At the same time, there is nothing in the error that immediately suggests this is a channel mixing issue.

Workaround

In practice, I avoid channel mixing issues by always using YAML definitions to create my environments. This is a helpful practice because it encourages one to explicitly state the channel priority as part of the definition and it makes Conda aware of your preference from the outset. The following environment definition works for me:

gpy_cf.yaml

name: gpy_cf
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.6
  - gpy=1.9.6
  - numpy=1.16.2
  - scipy=1.2.1

and then creating and activating the environment with

conda env create -f gpy_cf.yaml
conda activate gpy_cf

Unless you really do need these exact versions, I would remove whatever version constraints are unnecessary (at the very least, drop the patch versions).


Broken Version

For the record, this is the version that I can replicate the error with:

gpy_mixed.yaml

name: gpy_mixed
channels:
  - defaults
  - conda-forge
dependencies:
  - python=3.6
  - conda-forge::gpy=1.9.6
  - numpy=1.16.2
  - scipy=1.2.1

In this case, we force gpy to come from Conda Forge and let everything else source from the Anaconda (defaults) channel, similar to the configuration found in OP.
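One way to check whether an existing environment mixes channels like this is to group the packages of interest by the `channel` field that `conda list --json` reports. The following is only a rough sketch: the helper names `group_by_channel` and `conda_packages` are made up for illustration, the sample records are hand-written rather than real output, and the exact channel labels (e.g. `pkgs/main` for defaults) are an assumption that may differ with your conda version.

```python
import json
import subprocess
from collections import defaultdict


def group_by_channel(packages, names):
    """Map channel -> sorted package names, restricted to `names`.

    `packages` is a list of dicts with at least "name" and "channel" keys,
    which is the shape `conda list --json` is assumed to produce.
    """
    wanted = {n.lower() for n in names}
    channels = defaultdict(list)
    for pkg in packages:
        if pkg.get("name", "").lower() in wanted:
            channels[pkg.get("channel", "unknown")].append(pkg["name"])
    return {ch: sorted(pkgs) for ch, pkgs in channels.items()}


def conda_packages(env_name):
    """Return the parsed output of `conda list --json` for one environment."""
    out = subprocess.run(
        ["conda", "list", "--name", env_name, "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


# Hand-written records for illustration only (not real `conda list` output).
sample = [
    {"name": "gpy", "version": "1.9.6", "channel": "conda-forge"},
    {"name": "numpy", "version": "1.16.2", "channel": "pkgs/main"},
    {"name": "scipy", "version": "1.2.1", "channel": "pkgs/main"},
    {"name": "python", "version": "3.6", "channel": "pkgs/main"},
]
report = group_by_channel(sample, ["gpy", "numpy", "scipy"])
mixed = len(report) > 1  # more than one channel among the packages of interest
print(report, "mixed" if mixed else "single channel")
```

On a real environment, `group_by_channel(conda_packages("gpy_mixed"), ["gpy", "numpy", "scipy"])` would replace the sample data. More than one key in the result means the packages come from different channels.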

merv
  • Have you tried running it a couple of times? I did what you suggested, and it worked, but when I tried to rerun it, I got the same error. – Katerina Jan 26 '21 at 13:25
  • @Katerina yes, I had it loop 100 times. The mixed one crashes, the Conda Forge one does not. – merv Jan 26 '21 at 13:52
  • @Katerina when using the **gpy_cf.yaml**, can you verify that all the packages are coming from Conda Forge? (I.e., check `conda list`) – merv Jan 26 '21 at 17:36
  • Hi @merv, I have done what you suggested, but scipy's channel is not conda-forge even if I install it independently using `conda install -c conda-forge scipy` – Katerina Jan 27 '21 at 15:39
  • You may need to set the priority configuration variable: `conda config --set channel_priority 'strict'`. However, make sure to review documentation (`conda config --describe channel_priority`) to see if you really want this setting for your configuration. Also, the `--channels|-c` argument only adds the channel, but does not require using it. To tell Conda to install a package from a specific channel, the command would be `conda install conda-forge::scipy`. – merv Jan 27 '21 at 15:51
  • I've managed to install all packages from conda forge, but unfortunately, it didn't solve the error. – Katerina Jan 28 '21 at 15:23