I use JupyterLab to preprocess a large set of text documents with spaCy. Everything works, but I've noticed a huge speed difference depending on which conda kernel / virtual environment I use: one is about 10x slower than the other.

Both environments have the same versions of spaCy and NumPy installed, and both use the same Python version (3.9.15):

numpy                   1.23.4          py39h14f4228_0
spacy                   3.3.1           py39h79cecc1_0

So I cannot tell where the speed difference comes from. Maybe it's caused by another package that spaCy requires?

I also converted the notebooks into .py scripts and ran them from the console, with the same result: in one virtual environment the code runs about 10x slower.
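One way to narrow down a difference like this is to compare the full package lists of the two environments rather than only the packages you expect to matter. The following is a minimal sketch, assuming the output of `conda list` for each environment has been saved as text; the function names `parse_conda_list` and `diff_environments` are made up for illustration.

```python
def parse_conda_list(text):
    """Map package name -> version from the text output of `conda list`."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the commented header that conda prints.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            packages[fields[0]] = fields[1]
    return packages


def diff_environments(fast_text, slow_text):
    """Return (only in fast, only in slow, differing versions)."""
    fast = parse_conda_list(fast_text)
    slow = parse_conda_list(slow_text)
    only_fast = sorted(set(fast) - set(slow))
    only_slow = sorted(set(slow) - set(fast))
    version_mismatch = {
        name: (fast[name], slow[name])
        for name in sorted(set(fast) & set(slow))
        if fast[name] != slow[name]
    }
    return only_fast, only_slow, version_mismatch
```

The two inputs could be produced with `conda list -n <env> > <file>` for each environment and read back as strings. Packages that appear in only one environment are the first candidates to look at.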

Wiktor Stribiżew
Christian
  • Can you check what BLAS/LAPACK libraries are installed in each environment? Differences there might matter (though not usually a whole order of magnitude). E.g., [having `mkl` vs `netlib`](https://stackoverflow.com/a/70241376/570918). – merv Dec 17 '22 at 06:23
  • Both environments have `mkl 2021.4.0 h06a4308_64` installed, in case this clarifies things. I assume `mkl` gets pulled in when installing `numpy` or something. I created the "slow" environment just 2 days ago and I'm sure I didn't install it explicitly. – Christian Dec 17 '22 at 07:18
  • I looked through `conda list` again and noticed that `cupy` was not yet installed in the "slow" environment, and I indeed use a GPU. Now I see the same performance using either environment. – Christian Dec 17 '22 at 07:30

1 Answer

The "slow" environment was missing cupy. After installing it, spaCy shows the same performance in both environments.
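To check an environment for this condition, something like the following could be run inside each one. It is only a sketch: `gpu_status` is a hypothetical helper name, and it relies on `spacy.prefer_gpu()`, which returns `True` when spaCy was able to activate a GPU and `False` otherwise.

```python
import importlib.util


def gpu_status():
    """Report whether cupy is importable and whether spaCy can activate a GPU."""
    status = {
        "cupy_installed": importlib.util.find_spec("cupy") is not None,
        "spacy_installed": importlib.util.find_spec("spacy") is not None,
        "spacy_gpu_active": None,  # None means spaCy was not available to ask
    }
    if status["spacy_installed"]:
        import spacy

        # prefer_gpu() falls back to the CPU and returns False if no GPU
        # could be activated, for example because cupy is missing.
        status["spacy_gpu_active"] = spacy.prefer_gpu()
    return status
```

Calling `print(gpu_status())` in both environments should show whether one of them is silently running on the CPU.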

Christian