
I have prepared an inference pipeline for a Kaggle competition, and it has to be executed without an internet connection.

I'm trying to use different versions of transformers, but I ran into some issues with the installation.

Kaggle's default transformers version is 4.26.1. I start by installing a different version of transformers (4.18.0.dev0) like this:

!pip install ./packages/sacremoses-0.0.53
!pip install /directory/to/packages/transformers-4.18.0.dev0-py3-none-any.whl --find-links /directory/to/packages

It installs transformers-4.18.0.dev0 without any problem. I use this version of the package and run inference with some models. Then I want to use another package, open_clip_torch-2.16.0, which is compatible with transformers-4.27.3, so I install both like this:

!pip install /directory/to/packages/transformers-4.27.3-py3-none-any.whl --no-index --find-links /directory/to/packages
!pip install /directory/to/packages/open_clip_torch-2.16.0-py3-none-any.whl --no-index --find-links /directory/to/packages/

pip reports Successfully installed transformers-4.27.3 and open_clip_torch-2.16.0.

!pip list | grep transformers outputs transformers 4.27.3, but when I do

import transformers
transformers.__version__

the version is '4.18.0.dev0'. I can't use open_clip because of this. Some of the code breaks because it uses the old version of transformers even though I installed a newer one. How can I resolve this issue?

alvas
gunesevitan

2 Answers


When you initially import a module in a Python environment, it is cached in sys.modules. Subsequent imports are resolved from that cache rather than read from disk, which is why you are not seeing the new version of the module being loaded.

import sys
import transformers

# sys.modules holds the cached module object that `import` returns
sys.modules['transformers'].__version__
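To see the caching behaviour in isolation, here is a self-contained sketch using a throwaway module in place of transformers:

```python
import os
import sys
import tempfile

# Write a tiny module to disk and import it.
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "demo_mod.py"), "w") as f:
    f.write("__version__ = '1.0'\n")
sys.path.insert(0, tmpdir)

import demo_mod
print(demo_mod.__version__)  # 1.0

# Overwrite the file on disk, simulating a pip upgrade.
with open(os.path.join(tmpdir, "demo_mod.py"), "w") as f:
    f.write("__version__ = '2.0'\n")

# The second import is served from sys.modules, not from disk.
import demo_mod
print(demo_mod.__version__)  # still 1.0
```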

A possible solution is to attempt to reload the module using importlib.reload.

import importlib
importlib.reload(transformers)
sys.modules['transformers'].__version__

Read the documentation so that you are aware of the caveats of using this method.
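The comments below suggest that reload() can raise an error here, while a plain re-import after clearing the cache succeeds. A sketch of that approach, which drops the package and every cached submodule from sys.modules before importing again (force_fresh_import is a hypothetical helper name, not part of any library):

```python
import importlib
import sys

def force_fresh_import(pkg_name):
    # Drop the package and all of its cached submodules so the
    # next import reads the newly installed files from disk.
    for name in list(sys.modules):
        if name == pkg_name or name.startswith(pkg_name + "."):
            del sys.modules[name]
    return importlib.import_module(pkg_name)

# After installing the new wheel:
# transformers = force_fresh_import("transformers")
```

Note that objects created from the old module (model instances, configs) keep referencing the old code; only fresh imports see the new version.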

Dan Nagle
    Yep, this worked... so it was about cache after all. However, this worked in a weird way. When I called importlib.reload(transformers), it throws an error and doesn't reload but I can do import transformers again and it is the new version. – gunesevitan Mar 29 '23 at 07:12
  • This solved the problem in the question so I'm accepting this as an answer but when I install open_clip after doing this and import it, it still uses the old transformers version. I tried reloading open_clip as well, but it didn't work. – gunesevitan Mar 29 '23 at 09:04

Following https://www.kaggle.com/code/samuelepino/pip-installing-packages-with-no-internet

  1. From https://pypi.org/project/transformers/#files, download transformers-4.27.3-py3-none-any.whl

  2. Upload .whl file to Kaggle as dataset

  3. ! pip install -U transformers --no-index --find-links=/kaggle/input/transformers-wheels

  4. Restart the kernel's runtime with one of these tricks: https://stackoverflow.com/a/37993787/610569 or https://realpython.com/lessons/reloading-module/; from the comments, it looks like importlib.reload() works.

  5. Check the transformers version
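Step 5 can be scripted by comparing the version pip installed on disk against the version the interpreter actually imported; if they differ, the kernel is still serving the cached module. A minimal sketch (version_mismatch is a hypothetical helper name):

```python
import importlib
from importlib import metadata

def version_mismatch(dist_name, module_name=None):
    """Return (installed, imported, differs) for a distribution.

    `installed` is what `pip list` reports, read from the package
    metadata on disk; `imported` is what the running interpreter
    sees. If they differ, the cached module is stale.
    """
    module_name = module_name or dist_name
    installed = metadata.version(dist_name)
    mod = importlib.import_module(module_name)
    imported = getattr(mod, "__version__", installed)
    return installed, imported, installed != imported

# e.g. version_mismatch("transformers") after installing the new wheel
```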

alvas
  • This won't work either because I'm trying to ensemble predictions of multiple models. When I restart the kernel, I lose everything. I'll try without restarting the kernel. – gunesevitan Mar 29 '23 at 06:26
  • The use case is you have one model that needs transformers v4.18 and another model that needs v4.27? Is that right? How are you ensembling? Ensembling by combining output predictions (labels) or the model weights/hidden layer? – alvas Mar 29 '23 at 06:28
  • I think it's hard to debug and help you resolve the issue, without you sharing the code and the output you need to ensemble. Esp. when you want to do 2 different models' inferences in one kernel. – alvas Mar 29 '23 at 06:29
  • Does this work for you? (i) writing the first model's output, (ii) then install the new transformers, (iii) then restart the kernel in a cell, (iv) load the 1st model's output, (v) then run the 2nd model? If not, then you'll definitely have to share the code for how/what you're ensembling to debug your issue. – alvas Mar 29 '23 at 06:36
  • Yes, one model needs transformers 4.18 and another model needs 4.27. You are right it's hard to debug. The problem isn't about the installation though. I can see that 4.27 is installed without any problem. I can see transformers 4.27 from pip list. The problem is when I import transformers after installing 4.27, it still imports 4.18. – gunesevitan Mar 29 '23 at 06:37
  • Take a look at https://www.kaggle.com/code/alvations/installing-tranformers-w-o-internet?scriptVersionId=123786476 and see if it works for you. It's the most "powerful" module reload available and if that still fails, then you've to rethink how you want to ensemble or resolve dependencies conflicts for different models. – alvas Mar 29 '23 at 06:45
  • Unfortunately, that didn't work either. Maybe it could be related to those specific versions. – gunesevitan Mar 29 '23 at 06:50
  • Hmmm, I think it works for simple outputs/variables, see https://www.kaggle.com/code/alvations/installing-tranformers-w-o-internet?scriptVersionId=123787125 but when it comes to more complex outputs e.g. layer outputs/tensors, I'm not sure if the variables are kept. – alvas Mar 29 '23 at 06:51
  • The version of transformers didn't change when I did that. – gunesevitan Mar 29 '23 at 06:52
  • Ah, in the automatic runs, it's behaving differently from running it manually. – alvas Mar 29 '23 at 06:54
  • How about this? https://www.kaggle.com/code/alvations/installing-tranformers-w-o-internet?scriptVersionId=123787971 The difference is implicitly setting the reload flag – alvas Mar 29 '23 at 06:59
  • That didn't work either. Reloading transformers from the other answer worked though. – gunesevitan Mar 29 '23 at 07:25