I can't help wondering why the ELF produced from a Python script is so big compared to the original source code. Let's take a look at the simplest program, hello world.

user@linux:~/Python$ cat hello.py    
print('Hello, World!')
user@linux:~/Python$ 

Converting it to an ELF using PyInstaller:

user@linux:~/Python$ pyinstaller -F hello.py 
48 INFO: PyInstaller: 3.4
49 INFO: Python: 3.6.7
50 INFO: Platform: Linux-4.15.0-38-generic-x86_64-with-Ubuntu-18.04-bionic
50 INFO: wrote /home/user/Python/hello.spec
53 INFO: UPX is not available.
54 INFO: Extending PYTHONPATH with paths
['/home/user/Python', '/home/user/Python']
55 INFO: checking Analysis
60 INFO: Building because _python_version changed
60 INFO: Initializing module dependency graph...
62 INFO: Initializing module graph hooks...
64 INFO: Analyzing base_library.zip ...
3061 INFO: running Analysis Analysis-00.toc
3096 INFO: Caching module hooks...
3100 INFO: Analyzing /home/user/Python/hello.py
3103 INFO: Loading module hooks...
3104 INFO: Loading module hook "hook-encodings.py"...
3169 INFO: Loading module hook "hook-pydoc.py"...
3170 INFO: Loading module hook "hook-xml.py"...
3388 INFO: Looking for ctypes DLLs
3388 INFO: Analyzing run-time hooks ...
3394 INFO: Looking for dynamic libraries
3632 INFO: Looking for eggs
3633 INFO: Python library not in binary dependencies. Doing additional searching...
3684 INFO: Using Python library /usr/lib/x86_64-linux-gnu/libpython3.6m.so.1.0
3695 INFO: Warnings written to /home/user/Python/build/hello/warn-hello.txt
3717 INFO: Graph cross-reference written to /home/user/Python/build/hello/xref-hello.html
3722 INFO: checking PYZ
3725 INFO: Building because toc changed
3725 INFO: Building PYZ (ZlibArchive) /home/user/Python/build/hello/PYZ-00.pyz
4053 INFO: Building PYZ (ZlibArchive) /home/user/Python/build/hello/PYZ-00.pyz completed successfully.
4059 INFO: checking PKG
4064 INFO: Building because toc changed
4064 INFO: Building PKG (CArchive) PKG-00.pkg
6474 INFO: Building PKG (CArchive) PKG-00.pkg completed successfully.
6476 INFO: Bootloader /home/user/.local/lib/python3.6/site-packages/PyInstaller/bootloader/Linux-64bit/run
6477 INFO: checking EXE
6479 INFO: Rebuilding EXE-00.toc because hello missing
6480 INFO: Building EXE from EXE-00.toc
6481 INFO: Appending archive to ELF section in EXE /home/user/Python/dist/hello
6516 INFO: Building EXE from EXE-00.toc completed successfully.
user@linux:~/Python$ 

Running the new ELF:

user@linux:~/Python/dist$ ./hello 
Hello, World!
user@linux:~/Python/dist$ 

user@linux:~/Python$ ls -lh hello.py   
-rw-rw-r-- 1 user user 23 Dis  27 21:43 hello.py
user@linux:~/Python$ 

user@linux:~/Python/dist$ ls -lh hello 
-rwxr-xr-x 1 user user 5.3M Dis  27 21:48 hello
user@linux:~/Python/dist$ 

As you can see, the original code is only 23 bytes, while the ELF is way bigger: 5.3M!

Let's look at another example with C.

user@linux:~/C$ cat hello.c  
#include<stdio.h>

int main()
{
    printf("Hello C World\n");
}
user@linux:~/C$ 

user@linux:~/C$ gcc hello.c -o helloC
user@linux:~/C$ 

user@linux:~/C$ ls -l helloC
-rwxrwxr-x 1 user user 8304 Dis  27 21:53 helloC
user@linux:~/C$ 

user@linux:~/C$ ./helloC
Hello C World
user@linux:~/C$ 

user@linux:~/C$ ls -l hello.c
-rw-rw-r-- 1 user user 65 Dis  27 21:52 hello.c
user@linux:~/C$ 

user@linux:~/C$ ls -lh helloC
-rwxrwxr-x 1 user user 8.2K Dis  27 21:53 helloC
user@linux:~/C$ 

Comparison

Python code size = 23 bytes
Python ELF size = 5.3M

C code size = 65 bytes
C ELF size = 8.2K

Is there a way to make the size smaller?

  • Because the Python "executable" includes the entire Python interpreter and stdlib. The C example is just, basically, many ones and zeros. – DeepSpace Dec 27 '18 at 14:10
  • "Source code size" means nothing at all. – Jongware Dec 27 '18 at 14:11
  • @usr2564301: Well, the final executable is larger by whatever the 23 bytes compresses down to when zipped. :-) But yeah, unless your program is huge, the scripts themselves are likely to pale in comparison to the fixed overhead that is "The entire Python interpreter". – ShadowRanger Dec 27 '18 at 14:12
  • @ShadowRanger: also, my experience with C is that it's quite the opposite. I have seen multiple megabytes of source code being compiled to quite smallish executables. – Jongware Dec 27 '18 at 14:24
  • There's loads of "semi-duplicates" for this question: https://stackoverflow.com/questions/18548447/what-are-some-general-tips-to-reduce-the-file-size-for-a-pyinstaller-generated-e, https://stackoverflow.com/questions/44681356/reduce-pyinstaller-executable-size, https://stackoverflow.com/questions/19249624/pyinstaller-very-big-file-size (and many more) – DavidW Dec 27 '18 at 14:25
  • Your comparison is already flawed to begin with; both executables are dynamically linking `libc`, which provides a huge amount of support code (particularly for I/O). Linked in statically, that support code would add ~860 KB to your executable size in both cases to make them truly standalone (aside from the OS to back system calls), which would leave the two programs within an order of magnitude in size (~874KB for C, ~6.1MB for PyInstaller). PyInstaller output still being bigger is reasonable (the whole Python interpreter can do a ton of stuff, even if you're not using those features). – ShadowRanger Dec 27 '18 at 14:50
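
The arithmetic in the last comment can be checked with a short sketch. The sizes are the ones quoted in the question and in the comment; treating K and M as powers of 1024 is an assumption (that is how `ls -lh` reports them), and the static figures are the commenter's rough estimates, not measurements made here.

```python
KIB = 1024
MIB = 1024 * KIB

# Sizes as reported in the question (both dynamically linked against libc).
c_dynamic = 8304            # bytes, from `ls -l helloC`
py_dynamic = 5.3 * MIB      # from `ls -lh hello`

# Rough sizes quoted in the comment, with libc linked in statically.
c_static = 874 * KIB
py_static = 6.1 * MIB

ratio_dynamic = py_dynamic / c_dynamic
ratio_static = py_static / c_static

print(f"dynamic: PyInstaller output is ~{ratio_dynamic:.0f}x the C binary")
print(f"static:  PyInstaller output is ~{ratio_static:.1f}x the C binary")
```

With the shared support code counted on both sides, the gap shrinks from several hundred times to less than ten times.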

2 Answers

Because Python does NOT compile to machine code.

The ELF created by PyInstaller is simply your code packed up with all the necessary Python runtime files. It's not in any way comparable to a binary compiled from C, which contains machine code and links dynamically against shared libraries (libc.so, for example).
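
To see what "does not compile to machine code" means in practice, here is a minimal sketch using only the standard library. It compiles the same one-line script to a code object and prints its bytecode. The file name `hello.py` passed to `compile` is only a label here; nothing is read from disk.

```python
import dis
import marshal

SOURCE = "print('Hello, World!')\n"

# Compile the source to a code object: this is bytecode, not machine code.
code = compile(SOURCE, "hello.py", "exec")

# Serialized bytecode is roughly what a .pyc file stores after its header.
payload = marshal.dumps(code)

print(f"source:   {len(SOURCE.encode())} bytes")
print(f"bytecode: {len(payload)} bytes")

# Human-readable listing of the instructions the interpreter will execute.
dis.dis(code)
```

The listing shows interpreter instructions, not CPU instructions, so something still has to ship alongside them to execute them.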

iBug
  • +1 but one could argue that "modern" Python "compiles" to bytecode (i.e. [pyc](https://stackoverflow.com/questions/2998215/if-python-is-interpreted-what-are-pyc-files)). Perhaps the correct terminology would be "Python does not compile to **machine code**, hence the Python "executable" needs to bring the entire interpreter with it unlike the C example which is basically just ones and zeros" – DeepSpace Dec 27 '18 at 14:15
  • @DeepSpace Everything in a computer is ones and zeros. – iBug Dec 30 '18 at 02:33

PyInstaller, py2exe and pretty much any other project that "converts" Python files to executables isn't really converting anything. It just packs the full Python interpreter (4.4 MB alone on my machine), your project and all the dependencies it requires (all compiled to bytecode, which the interpreter runs) into a single self-extracting executable, so it's normal for the result to be at least as big as a (compressed) Python installation.
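
You can get a rough feel for that fixed overhead on your own machine with a sketch like the one below. It only inspects the running interpreter, not PyInstaller itself, and the numbers will differ between systems: on some distributions `sys.executable` is a small launcher linked against a separate libpython, so take the size as an indication only.

```python
import os
import sys

# Size of the interpreter binary that launched this script. On some systems
# this is a small launcher linked against a separate libpython, so treat the
# number as a rough indication only.
interpreter_bytes = os.path.getsize(sys.executable)
print(f"{sys.executable}: {interpreter_bytes / 1e6:.1f} MB")

# Modules imported so far. Both `os` and `sys` are normally loaded during
# interpreter startup anyway, so this list reflects what a bare interpreter
# needs before it runs a single line of user code.
startup_modules = sorted(sys.modules)
print(f"{len(startup_modules)} modules loaded at startup")
print(", ".join(startup_modules[:10]), "...")
```

Even for a one-line script the interpreter has already pulled in modules such as `encodings`, which is consistent with the hooks that show up in the PyInstaller log in the question.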

Pretty much anything besides the Python interpreter itself and big native dependencies (think numpy, scipy, PyQt) counts for next to nothing in the final executable size. You may have a 10 KLOC Python project and, as long as you don't pull in any other external dependency, you'll find that the final executable size isn't significantly affected.

gcc compiling a C file, by contrast, creates an actual executable containing just the imports and the machine code necessary to invoke printf: 15 bytes of literal string, a handful of bytes to set up a stack frame and actually call printf, and all the rest is ELF headers, import tables and various linker junk (even just running strip -s on it shaves off 2 KB).
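
The 15-byte figure for the string literal is easy to verify. A small sketch, assuming the literal is stored exactly as written in hello.c plus the terminating NUL that C adds to every string literal:

```python
# The string literal from hello.c, as bytes.
literal = b"Hello C World\n"

# C stores string literals with a terminating NUL byte.
stored = literal + b"\x00"

print(len(literal))  # 14
print(len(stored))   # 15
```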

Matteo Italia