
How is it possible to reduce the size of a Python virtual environment?

This might be:

  • Removing the *.pyc files
  • Removing packages from site-packages (but which ones can be removed?)

What else can be removed or stripped down? Or are there other ways?

The use case is, for example, uploading the virtualenv to a server with limited space (e.g. an AWS Lambda function with its 512 MB limit).

Rene B.
  • "Removing packages from site_packages but which one can be removed?" => the ones you don't use (directly or as a dependency of another one). But if you correctly maintains your requirements file and venv, you should already only have required packages here. – bruno desthuilliers Jul 09 '19 at 14:18
  • @brunodesthuilliers Agreed. But there are also packages in site_packages that are installed by default, such as the `future` package, which I never installed explicitly and don't use in the code. How can I find out which of these are used by the code? – Rene B. Jul 09 '19 at 14:20
  • "Removing the *.pyc files" => it's useless, they will be recreated on the first import - assuming your venv directory is writable of course (and if it isn't, your app's startup time will be quite longer as the runtime will have to compile them on the fly for each process). – bruno desthuilliers Jul 09 '19 at 14:20
  • Deleting .pyc files doesn't make sense for anything other than transferring files: they will be recreated upon the first code run. And for packages, it comes down to deleting some files and testing the app - we can't know. – ipaleka Jul 09 '19 at 14:21
  • The fact _you_ don't use a package doesn't mean it's not used by another of your packages... As for detecting stale packages, you may want to have a look at pipdeptree (https://pypi.org/project/pipdeptree/) – bruno desthuilliers Jul 09 '19 at 14:24
  • I found something like this; maybe it's possible to find the used packages/files during runtime: https://medium.com/@mojodna/slimming-down-lambda-deployment-zips-b3f6083a1dff – Rene B. Jul 09 '19 at 14:24
  • If you want to shrink the Lambda package zip, you can use an AWS Lambda Layer. You can configure your Lambda function to pull in additional code and content in the form of layers. A layer is a ZIP archive that contains libraries, a custom runtime, or other dependencies. With layers, you can use libraries in your function without needing to include them in your deployment package. Layers let you keep your deployment package small, which makes development easier, and you can avoid errors that can occur when you install and package dependencies with your function code. – Nirmal Jul 09 '19 at 17:16
  • Do you have boto3, botocore, etc. removed from your zip? They are already available in AWS Lambda. – Jan Giacomelli Jul 12 '19 at 12:34
  • @Nirmal If I am not wrong, adding new layers does not increase the space available to a Lambda function, as the limit includes the layers - so 512 MB including the layers. So this is not a solution. – Rene B. Jul 16 '19 at 07:17
  • @giaco yes this is also a good point! I did this already. – Rene B. Jul 16 '19 at 07:18
  • May I ask which libraries you use to end up with such a large zip? We managed to stay within the limits although we are using OpenCV, scikit, numpy, tesseract, ... But we decided to use the stdlib as much as possible; for example, we are using urllib3 instead of requests. Have you maybe added libraries such as coverage, pylint, ...? – Jan Giacomelli Jul 16 '19 at 12:17
  • @giaco We are using spacy, which is very big, in combination with mlflow, scikit-learn, etc. In spacy it might be possible to remove "lang", which is around 240 MB, but it would be good to know what further possibilities there are. – Rene B. Jul 16 '19 at 13:35
  • You could pack your dependencies into a separate zip, download it from S3 on a cold start of the Lambda (code before your handler function definition), extract it, and add the destination to the path - you can store up to 512 MB in the /tmp/ directory (see the sketch after these comments). Do you really need all of these libraries in a single Lambda? Could you maybe split your process into more steps (Lambdas), each of which would need only part of the libraries? Then you could orchestrate them as Step Functions or use SNS, for example. – Jan Giacomelli Jul 18 '19 at 10:29
  • @giaco The first part is interesting - it is how zappa does it. The second approach also sounds interesting and would be possible, but it might hurt the response time if two Lambda functions need to be called and both need to load external files. – Rene B. Jul 18 '19 at 10:50
  • @ReneB It all depends on your use case. But from what I have seen so far, the AWS limits forced us to build smarter systems. You can contact me if you need any help. – Jan Giacomelli Jul 19 '19 at 06:43
  • @ReneB Here is a cool blog post on reducing the space usage of packages: https://towardsdatascience.com/how-to-shrink-numpy-scipy-pandas-and-matplotlib-for-your-data-product-4ec8d7e86ee4 – ASHu2 Dec 18 '19 at 05:00
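
A minimal sketch of that download-on-cold-start idea, assuming the dependencies were zipped up and uploaded to S3 beforehand (the bucket name, object key and the spacy import are placeholders, not taken from the question). The download runs at module level, i.e. once per cold start, and /tmp offers up to 512 MB of scratch space:

import os
import sys
import zipfile

import boto3  # already available in the AWS Lambda Python runtime

DEPS_BUCKET = "my-deps-bucket"        # hypothetical bucket name
DEPS_KEY = "lambda/dependencies.zip"  # hypothetical object key
DEPS_DIR = "/tmp/deps"

if not os.path.isdir(DEPS_DIR):
    # Module-level code runs once per cold start; warm invocations skip this.
    boto3.client("s3").download_file(DEPS_BUCKET, DEPS_KEY, "/tmp/dependencies.zip")
    with zipfile.ZipFile("/tmp/dependencies.zip") as archive:
        archive.extractall(DEPS_DIR)

sys.path.insert(0, DEPS_DIR)

def handler(event, context):
    import spacy  # resolved from /tmp/deps; placeholder for your real imports
    return {"statusCode": 200}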

3 Answers


If a module has a .pyc file, you can remove the corresponding .py file; just be aware that tracebacks from that code will no longer show source lines, which will most likely make any error/exception logging you have harder to work with.
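
Note that in Python 3 a sourceless install only works if the .pyc ends up where the .py used to be, not inside __pycache__/; compileall can produce that legacy layout, and the byte code is tied to the Python minor version you compile with. A rough sketch of the whole step (the site-packages path is an assumption and needs adjusting to your virtualenv and Python version):

import compileall
import pathlib
import shutil

# Hypothetical location - adjust to your virtualenv and Python version.
site_packages = pathlib.Path("venv/lib/python3.7/site-packages")

# legacy=True writes foo.pyc next to foo.py instead of into __pycache__/,
# so the modules stay importable once the sources are gone.
compileall.compile_dir(str(site_packages), quiet=1, legacy=True)

for py_file in site_packages.rglob("*.py"):
    if py_file.with_suffix(".pyc").exists():
        py_file.unlink()          # drop the source, keep the byte code

for cache_dir in site_packages.rglob("__pycache__"):
    shutil.rmtree(cache_dir, ignore_errors=True)  # now-redundant caches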

Apart from that there is no universal way of reducing the size of a virtualenv - it will be highly dependent on the packages you have installed, and you will most likely have to resort to trial and error or reading source code to figure out exactly what you can remove.

The best you can do is look for the packages that take up the most disk space and investigate those further. On a *nix system with the standard coreutils available, you can run the following command:

du -ha /path/to/virtualenv | sort -h | tail -20
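
A rough, cross-platform alternative is to sum up the sizes per top-level entry in site-packages from Python itself (just a sketch; the path is an assumption and needs adjusting):

import pathlib

# Hypothetical location - adjust to your virtualenv and Python version.
site_packages = pathlib.Path("venv/lib/python3.7/site-packages")

def tree_size(path):
    # Size of a file, or of everything below a directory, in bytes.
    if path.is_file():
        return path.stat().st_size
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())

# Print the 20 largest top-level packages/files, biggest last.
for size, name in sorted((tree_size(p), p.name) for p in site_packages.iterdir())[-20:]:
    print(f"{size / 1024 / 1024:8.1f} MB  {name}")
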
Andreas

After you have installed all packages, you could try to remove the packages in the virtualenv that are only there for installing packages:

rm -r pip*
rm -r pkg_resources*
rm -r setuptools*

Depending on which packages you have installed, the result might still work as desired, since most packages won't have runtime dependencies on these three. Use at your own risk.
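
The same cleanup can be done from Python against the virtualenv's site-packages directory, which is where these three live (a sketch; the path is an assumption, and pip's console scripts in the venv's bin/ directory are left untouched):

import pathlib
import shutil

# Hypothetical location - adjust to your virtualenv and Python version.
site_packages = pathlib.Path("venv/lib/python3.7/site-packages")

# Installer machinery only; most packages never import these at runtime.
for pattern in ("pip*", "setuptools*", "pkg_resources*"):
    for path in site_packages.glob(pattern):
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()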

Maarten Derickx

When you create your virtualenv, you can tell it to use the system site-packages. If you install all required packages globally on the system, the virtualenv you then create will be essentially empty.

$ pip install package1 package2 ...
$ virtualenv --system-site-packages venv
$ source venv/bin/activate
(venv) $ # now you can use package1, package2, ...

With this method you can still override a package: if you install a package inside your virtualenv, that copy will be used instead of whatever is on the system.
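
To check which copy actually gets picked up, you can look at where the import resolves from (a sketch; `package1` is just the placeholder name from above):

import importlib.util

# "package1" is a placeholder - use a package you actually installed.
spec = importlib.util.find_spec("package1")
print(spec.origin if spec else "package1 is not importable")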

Peter Hull
blueteeth
  • This would mean "outsourcing" the packages to the system. However, I would need to relocate the virtualenv to another system that might not have the packages installed and where no root permissions are available. – Rene B. Dec 22 '19 at 18:45
  • Relocating the virtualenv to another system won't work anyway, because virtualenvs contain system specific paths. – blueteeth Dec 22 '19 at 18:46
  • It's no problem to relocate a virtualenv with `virtualenv --relocatable my-venv`; I did it many times. Check it out here: https://stackoverflow.com/questions/32407365/can-i-move-a-virtualenv – Rene B. Dec 22 '19 at 19:06