20

I am trying to create an exe file using PyInstaller 3.2.1. As a test, I tried to build an exe from the following code:

    import pandas as pd
    print('hello world')

After a considerable amount of time (15+ minutes) I ended up with a dist folder of 620 MB and a build folder of 150 MB. I work on Windows with Python 3.5.2 |Anaconda custom (64-bit). It may be worth noting that the mkl files in the dist folder account for almost 300 MB. I run PyInstaller with 'pyinstaller.exe foo.py'. I tried --exclude-module to exclude some dependencies, but still ended up with huge files. Using onefile or onedir makes no difference.

I am aware that the exe must contain some important files, but is it normal for it to be almost 1 GB? I can provide the warning log or anything else that would help solve this.

P.S. In parallel, my coworker created an exe from the same sample script and ended up with less than 100 MB; the difference is that he is not using Anaconda. Could that be the cause?

Any help will be appreciated.

AMC
dylan_fan

8 Answers

22

PyInstaller creates a big executable from conda packages and a small executable from pip packages. From this simple Python code:

    from pandas import DataFrame as df
    print('h')

I obtain a 203 MB executable using conda packages and a 30 MB executable using pip packages. Still, conda is a nice replacement for plain virtualenv: I can develop with conda and Jupyter and then create a script 'mycode.py' (a Jupyter notebook can be downloaded as a .py file into myfolder).

My final solution is as follows. If you do not have it, install Miniconda, then open Anaconda Prompt from the Windows Start Menu and run:

    cd myfolder
    conda create -n exe python=3
    activate exe
    pip install pandas pyinstaller pypiwin32
    echo hiddenimports = ['pandas._libs.tslibs.timedeltas'] > %CONDA_PREFIX%\Lib\site-packages\PyInstaller\hooks\hook-pandas.py
    pyinstaller -F mycode.py

This creates a new environment 'exe'. pypiwin32 is needed by pyinstaller but is not installed automatically, and hook-pandas.py is needed to build with pandas. Also, importing only a submodule does not reduce the size of the executable file, so I do not need this:

    from pandas import DataFrame as df

and can just use the usual code:

    import pandas as pd

Also, errors can occur when paths contain national (non-English) letters, so it is best to use a user account with an English name for the development tools.
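The `echo` line in the steps above can also be done from Python, which sidesteps shell quoting. This is only a minimal sketch: the function name `write_pandas_hook` and its `overwrite` guard are illustrative and not part of PyInstaller, and the hooks directory is assumed to be passed in explicitly.

```python
from pathlib import Path

# Same one-line hook body that the echo command writes.
HOOK_SOURCE = "hiddenimports = ['pandas._libs.tslibs.timedeltas']\n"


def write_pandas_hook(hooks_dir, overwrite=False):
    """Write hook-pandas.py into hooks_dir and return its path.

    hooks_dir is assumed to be the PyInstaller hooks folder of the build
    environment (Lib/site-packages/PyInstaller/hooks under the env prefix).
    """
    hook_path = Path(hooks_dir) / "hook-pandas.py"
    # Assumption: an existing hook may already do the right thing,
    # so it is left untouched unless overwrite=True is passed.
    if hook_path.exists() and not overwrite:
        return hook_path
    hook_path.write_text(HOOK_SOURCE, encoding="utf-8")
    return hook_path
```

Unlike `echo ... >`, this does not silently replace a hook file that is already present.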

Community
abicorios
  • Thanks a bunch, this took my 600+ MB dist folder down to 80 MB! I then downloaded UPX from https://github.com/upx/upx/releases and supplied the folder path as `--upx-dir=C:\upx394w`, which decreased things even further to 28 MB. So, 600 MB to 28 MB. Not bad! – Nikhil VJ Apr 24 '18 at 18:59
  • Filed an issue with conda regarding this: https://github.com/conda-forge/numpy-feedstock/issues/84 – Nikhil VJ Apr 28 '18 at 15:33
  • A recommendation for folks who still want to use conda for installing packages: use `conda install -c conda-forge numpy` to avoid `mkl`; it will use an `OpenBLAS` package in its place. This is also recommended for distribution, as mkl isn't fully open-license. See https://github.com/conda-forge/numpy-feedstock/issues/84 – Nikhil VJ Apr 29 '18 at 01:54
  • @nikhilvj I'm a bit confused by the conda-forge solution. When I install numpy with "conda-forge" I get OpenBLAS, but later when I install pandas (still with conda-forge) it installs mkl. What is the right way to keep OpenBLAS and pandas? – rhaskett May 24 '18 at 19:07
  • @rhaskett OK, please file a request on https://github.com/conda-forge/pandas-feedstock/issues then, and ask them for a way to install pandas without mkl. Reference the numpy issue I linked here. If that fails too, just go with `pip` as this answer recommends. – Nikhil VJ May 25 '18 at 04:42
  • Not a big deal, but I thought I was missing something obvious. – rhaskett May 25 '18 at 19:15
  • Can you update this answer? Do you mind if I do so myself? – AMC Apr 16 '20 at 00:33
11

This is probably because the Anaconda version of numpy is built using mkl.

If you want to reduce the size of the distributable, you could build in a separate virtual environment with the packages installed through pip instead of conda.
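To check how much of a build mkl really accounts for, the dist folder can be measured directly. The sketch below is an illustration only: `size_report` is a made-up helper name, and matching on the substring "mkl" in the file name is an assumption about how the bundled libraries are named.

```python
from pathlib import Path


def size_report(dist_dir, needle="mkl"):
    """Return (total_bytes, matching_bytes) for all files under dist_dir.

    A file counts as matching when `needle` occurs in its lower-cased name.
    """
    total = 0
    matching = 0
    for path in Path(dist_dir).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            total += size
            if needle in path.name.lower():
                matching += size
    return total, matching
```

Calling `size_report("dist")` after a build returns both byte counts, so the share taken by mkl can be compared between a conda-based and a pip-based environment.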

Maarten Fabré
  • I was thinking about virtualenv. Have you had any experience with building an exe while using virtualenv? – dylan_fan May 10 '17 at 08:51
  • I looked into building an exe myself some time ago, but I couldn't get it to work easily, and sharing what I had with non-Python colleagues became less urgent, so my experience is only negative. But I didn't put much effort into it due to other priorities at the time. – Maarten Fabré May 10 '17 at 08:53
  • Got it. Unfortunately I have to share the script, and installing a Python environment for each user is not going to be possible. *sigh* I will probably give virtualenv a shot. – dylan_fan May 10 '17 at 08:58
  • Let us know how it went. I'm still interested in this, but don't have too much time for it at the moment. – Maarten Fabré May 10 '17 at 09:07
  • I dug in a bit regarding the mystery `mkl` package and have a hunch it has been included by conda by accident. It's not even a free-license package and doesn't have any description on PyPI. I've filed an issue with conda regarding this: https://github.com/conda-forge/numpy-feedstock/issues/84 – Nikhil VJ Apr 28 '18 at 15:35
  • Workaround given by a conda-forge dev: `conda install -c conda-forge numpy`. Avoids `mkl`, uses an `OpenBLAS` package in its place. – Nikhil VJ Apr 29 '18 at 02:18
  • Related: https://stackoverflow.com/a/52899702 – djvg Feb 14 '22 at 09:47
8

Here's a way to keep using conda and still avoid mkl. Install numpy before installing pandas, with this alternate command:

    conda install -c conda-forge numpy

This avoids mkl and uses an OpenBLAS package in its place. A full explanation is in this issue on the conda-forge/numpy-feedstock GitHub repo: https://github.com/conda-forge/numpy-feedstock/issues/84
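Since the comments report that mkl can come back when pandas is installed afterwards, it may help to check what the environment actually contains before building. The helper below is a hedged sketch: `blas_packages` is a hypothetical name, and it only assumes that `conda list` prints one package per line as name, version, build and channel columns, with comment lines starting with `#`.

```python
def blas_packages(conda_list_output, names=("mkl", "openblas")):
    """Return {package: version} for BLAS-related entries in `conda list` text."""
    found = {}
    for line in conda_list_output.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines printed by conda.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, version = fields[0], fields[1]
        if any(key in name.lower() for key in names):
            found[name] = version
    return found
```

Pasting the output of `conda list` into this function after each install step shows whether an mkl package has appeared in the environment.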

Nikhil VJ
  • This should be the accepted answer, see also [link](https://github.com/conda-forge/numpy-feedstock/issues/84). – balletpiraat Oct 10 '18 at 11:35
  • this still includes mkl package – erotavlas Oct 29 '19 at 13:23
  • @erotavlas too bad. Maybe one of the other solutions posted on https://github.com/conda-forge/numpy-feedstock/issues/84 by others can help? – Nikhil VJ Oct 29 '19 at 13:41
  • @NikhilVJ I solved it afterwards by using pip install in my conda environment. I guess the conda version includes mkl instead of openblas. – erotavlas Oct 29 '19 at 15:37
  • @erotavlas _this still includes mkl package_ It shouldn't, can you provide some more information? _I solved it after by using pip install in my conda environment_ Be careful, using pip in a Conda environment has its quirks. _I guess the conda version includes mkl instead of openblas_ Conda is just a package manager, there is no such thing as "the conda version". – AMC Apr 16 '20 at 00:29
4

A simple solution while working with Anaconda:

- Make a new environment inside Anaconda Navigator. (The new environment is free of the large number of packages that cause the problem.)

- Open a terminal and use pip install to add the packages you need. (Make sure you are in the new environment.)

- Run pyinstaller.

I reduced my .exe from 300 MB to 30 MB.
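The same three steps can be scripted. Note that this sketch swaps in the standard-library `venv` module in place of Anaconda Navigator, so it is a variation on the answer rather than the author's exact method. The names `env_script` and `build_in_clean_env` are hypothetical.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_script(env_dir, name):
    """Path of an executable inside a venv (Scripts on Windows, bin elsewhere)."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / (name + ".exe")
    return env_dir / "bin" / name


def build_in_clean_env(env_dir, script, packages):
    """Create a fresh venv, pip-install the packages, then run PyInstaller."""
    # Step 1: a new, empty environment that contains only pip.
    venv.EnvBuilder(with_pip=True).create(env_dir)
    python = str(env_script(env_dir, "python"))
    # Step 2: install PyInstaller plus only the packages the script needs.
    subprocess.run([python, "-m", "pip", "install", "pyinstaller", *packages],
                   check=True)
    # Step 3: build with the environment's own interpreter.
    subprocess.run([python, "-m", "PyInstaller", "--onefile", str(script)],
                   check=True)
```

For example, `build_in_clean_env("build_env", "foo.py", ["pandas"])` would download the packages and build inside `build_env`, leaving the Anaconda base environment untouched.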

Matt Pengelly
JSBY
3

I have the Anaconda 3.5.5 build for Python on Windows 10 and was also getting excessively large executables using the Anaconda distribution.

I was able to correct this by doing the following:

  1. First, create a virtual environment (forums suggest virtualenv, but this gave me problems, so I used venv instead):

    python -m venv C:/Python/NewEnv

This creates a virtual environment inside C:/Python/NewEnv with base Python, pip and setuptools.

  2. Next, switch to the newly created environment:

    C:/Python/NewEnv/Scripts/activate

You'll know that the environment is different because your command prompt will be prefixed with the new environment name (NewEnv).

  3. Install numpy first, then scipy, then pandas:

    pip install numpy==1.13.3
    pip install scipy==1.1.0
    pip install pandas==0.18.1
    pip install pypiwin32==223
    pip install pyinstaller==3.2

I had to use these versions; I tried different ones, but any later version of pandas gave me further issues.

  4. Once these have been installed, you can compile your program:

    C:/Python/NewEnv/Scripts/pyinstaller --onefile program.py

  5. This will create a .spec file. With this version of pandas and pyinstaller you'll need to modify it to add hidden imports, otherwise loading pandas from the executable will fail. (I'm not sure if there's a pyinstaller command that just creates the spec file, but if there is, use that instead; see Amendment #1.)

There will be a hidden imports line inside the newly created .spec file:

    hiddenimports=[],

Change this to add pandas._libs.tslibs.timedeltas:

    hiddenimports=['pandas._libs.tslibs.timedeltas'],

  6. Then you can compile your program again against the .spec file:

    C:/Python/NewEnv/Scripts/pyinstaller --onefile program.spec

Note that pyinstaller writes its output into whichever directory you are in, so change to the directory you want before running it.

Amendment #1: I see that it's possible to add a hook-pandas.py to the PyInstaller hooks. So after you install pyinstaller in the new environment, run:

    echo hiddenimports = ['pandas._libs.tslibs.timedeltas'] > C:\Python\NewEnv\Lib\site-packages\PyInstaller\hooks\hook-pandas.py
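The manual spec edit described in this answer can also be scripted. On the question raised above about generating only the spec file: PyInstaller does ship a `pyi-makespec` command that writes the spec without building. The sketch below is an illustration under stated assumptions: `add_hidden_import` is a hypothetical helper, and it only handles the empty `hiddenimports=[]` form that a freshly generated spec contains.

```python
import re


def add_hidden_import(spec_text, module):
    """Return spec_text with `module` placed in an empty hiddenimports list.

    A spec that already lists hidden imports is returned unchanged.
    """
    pattern = re.compile(r"hiddenimports\s*=\s*\[\s*\]")
    replacement = "hiddenimports=[{!r}]".format(module)
    # A function replacement avoids any backslash processing by re.sub.
    return pattern.sub(lambda _match: replacement, spec_text, count=1)
```

Reading program.spec, passing its text through `add_hidden_import(text, 'pandas._libs.tslibs.timedeltas')` and writing it back gives the same line as the hand edit. The `--hidden-import` command-line option of pyinstaller is another way to get the same effect without touching any file.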
Mitchell_h
1

I had a similar problem and found a solution. I used Windows Terminal Preview. This program lets you open various shells, such as Windows PowerShell (and, by the way, Ubuntu Linux too. Also worth noting: you can have many terminals installed in this program and even open several at once. Very cool stuff).

Inside Windows PowerShell in Windows Terminal Preview I installed all the necessary libraries (for example re, pandas, numpy, etc.), then I changed to the directory containing my file and tried this command:

    pyinstaller --onefile -w 'filename.py'

...but the output exe didn't work. For some reason, the console said that one library was missing (one that I had installed earlier). I found the solution by mimicking auto-py-to-exe. The command used by this GUI is:

    pyinstaller --noconfirm --onedir --console "C:/Users/something/filename.py"

And this one works well. I reduced the size of my output exe program from 911 MB to 82.9 MB!

BTW, 911 MB was the size of the output made by auto-py-to-exe.

I wonder how it is possible that no one has yet created a tool that reads the code, checks which libraries the code uses, and packs only those. In my case, auto-py-to-exe probably loaded all the libraries that I had ever installed. That would explain the size of the output folder.

Some suggest using virtualenv (https://virtualenv.pypa.io/en/stable/), but in my opinion this library is very difficult to use, at least for me.
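The idea above of reading the code to see which libraries it uses can be sketched with the standard-library `ast` module. PyInstaller does analyse the script's imports itself; the size tends to come from everything those imports pull in from the environment. The function name `imported_modules` is illustrative.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported by `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Relative imports (level > 0) point inside the script's own
            # package, so only absolute imports are recorded.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return sorted(names)
```

This only lists the direct imports of one file. It does not follow what pandas or numpy import in turn, which is where most of the bundled files come from.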

Paweł Pedryc
0

I created an executable file within a virtual environment, but that did not reduce the app size. According to the closed issue "QST: Pandas without MKL?", 'pandas does not use mkl directly, your issue is with pyinstaller.' I then tried to make a standalone application using py2app (py2exe is the equivalent for Windows). As a result, the app takes 156 MB, in contrast to 923 MB when using pyinstaller.

Piter
-1

You need a pure Python environment, not Anaconda, because Anaconda comes with too many packages you don't need.

Install a new Python environment on another PC with as few packages as possible!

Then try pyinstaller again. With this method, pyinstaller reduced the file from 200M to 8M.

PS: If some packages are missing, you can add them with pip install ...
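To see how many packages an environment really carries before building, the installed distributions can be listed from Python itself. This is a small sketch assuming Python 3.8 or later, where `importlib.metadata` is available. The function name `installed_packages` is made up for the example.

```python
from importlib import metadata


def installed_packages():
    """Return sorted names of the distributions visible to this interpreter."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Broken installs can lack a name; skip those entries.
        if name:
            names.add(name)
    return sorted(names, key=str.lower)
```

Running `len(installed_packages())` in the Anaconda base environment and again in a fresh environment gives a quick comparison of how much is installed in each.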

Merlin
captain
  • Hi there, you can achieve the very same thing by just creating a virtual environment. I completely agree that we want to keep the number of packages as low as possible! – dylan_fan Jun 05 '18 at 13:16
  • It is strong language to say Anaconda provides so many useless packages. If you want a minimal environment then you should use Miniconda, or create your own minimal environment from Anaconda and start from there: `conda create -n minipy37 python=3.7` as an example. – IanSR Aug 28 '19 at 10:03
  • _You need pure python environment, No Anaconda._ Those aren't the only two alternatives, there's Miniconda, for example. – AMC Apr 16 '20 at 00:31