
I have a Python package that includes large PyTorch model checkpoints. I include them in my setup.py via package_data:

package_data = {'mypackage': ['model_weights/*', 'model_weights/sequential_models*']},

The problem is that whenever I install from source with pip install mypackage/ --no-cache-dir, I get a MemoryError. Running with --verbose shows that it happens while the wheel is being built:

creating '/tmp/pip-wheel-bs29bp6a/tmpp0itbxn1/mypackage-1.0-py3-none-any.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'mypackage/model_weights/distilled_model.pt'
adding 'mypackage-1.0.dist-info/RECORD'
Traceback (most recent call last):
...
File "/zhome/1d/8/153438/miniconda3/envs/testenv/lib/python3.9/zipfile.py", line 1127, in write
      data = self._compressor.compress(data)
MemoryError
Building wheel for mypackage (PEP 517) ... error
ERROR: Failed building wheel for mypackage

I really only want the installation to copy over the files in model_weights/ to the installation directory. Including them in the wheel appears to be impossible.

Is there a way to suppress this step when running pip install? The package will only ever be distributed as source, never on PyPI, because the model_weights files are far too large anyway.

Raphael
fteufel
  • I think that, from an architecture standpoint, it would be better not to include the models in the source, as many other ML and NLP packages do. They contain the logic but not the data to act upon; the data needs to be fetched by the users themselves later when using the package. – The Fool Nov 02 '21 at 12:06
  • Any idea how to best implement that? I don't have the means to host them for caching on demand, that's why I include them with the package source for download. – fteufel Nov 02 '21 at 12:13
  • I'm not sure if it works for you, but in the past I've hosted large files like pre-trained model weights in a GitHub repo and also on Google Drive. – Ravi Mashru Nov 02 '21 at 12:58
  • I would not recommend it, and there is probably a better way (it may not even work anymore), but it might help in the short term: https://stackoverflow.com/a/58290113 – sinoroc Nov 02 '21 at 13:50
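
The on-demand approach suggested in the comments could be sketched roughly as follows. This is only an illustration under assumptions: the URL, the cache directory, and the function name ensure_weights are hypothetical and not part of the original package, and the weights would need to be hosted somewhere reachable over HTTP(S).

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical values: replace with wherever the checkpoints are actually hosted.
WEIGHTS_URL = "https://example.com/mypackage/distilled_model.pt"
CACHE_DIR = Path.home() / ".cache" / "mypackage"


def ensure_weights(url: str = WEIGHTS_URL, cache_dir: Path = CACHE_DIR) -> Path:
    """Return a local path to the weights file, downloading it on first use."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        tmp = target.with_name(target.name + ".part")
        # Stream to disk in chunks so the whole file is never held in memory.
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            shutil.copyfileobj(response, out)
        tmp.replace(target)  # rename only after the download has completed
    return target
```

The package code would then call ensure_weights() before loading the checkpoint, so the weights never pass through the wheel-building step at all.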

1 Answer


You can run

$ pip install mypackage/ --no-cache-dir --no-binary=mypackage

to skip wheel building. This assumes that mypackage is actually your distribution name, i.e. the value you pass as name to the setup() function.
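
If the install is driven from a script rather than typed by hand, the same command can be assembled in Python. This is a minimal sketch; the helper names build_install_command and install are made up for illustration, and whether --no-binary actually skips the wheel step may depend on the pip version in use.

```python
import subprocess
import sys


def build_install_command(source_dir: str, dist_name: str) -> list:
    """Assemble the pip invocation shown above as an argument list."""
    return [
        sys.executable, "-m", "pip", "install", source_dir,
        "--no-cache-dir",
        f"--no-binary={dist_name}",  # must match the name passed to setup()
    ]


def install(source_dir: str, dist_name: str) -> None:
    """Run pip in a subprocess; raises CalledProcessError if pip fails."""
    subprocess.run(build_install_command(source_dir, dist_name), check=True)
```

Calling install("mypackage/", "mypackage") would run the command with the same interpreter that executes the script, which helps keep the install inside the active environment.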

hoefling