2

I'm planning to create a huge executable directory and install it on some devices.

Imagine that later on I discover a bug in one of my Python modules. Is there any way to transfer/copy only the modified byte code and replace the original byte code with the new one?

The reason I want to do this is that bandwidth is very expensive in my context, and I'd like to patch the code remotely.

Example: I have a project with two files. prog.py (with the following three lines):

import mod1
if __name__ == "__main__":
    mod1.hello()

mod1.py (with the following two lines):

def hello():
    print("hello old world")

Now I use PYTHONHASHSEED=2 pyinstaller prog.py to create my directory, which I copy to my device.

Now I modify mod1.py:

def hello():
    print("hello new world")

Then I recompile with PYTHONHASHSEED=2 pyinstaller prog.py. The full directory (tarred and gzipped) has a size of about 10M. The file dist/prog/prog has a size of about 1M.

With pyi-archive_viewer I can extract PYZ-00.pyz from my executable dist/prog/prog. In PYZ-00.pyz I can find and extract mod1, which takes up only 133 bytes.

Now if I copy that file to my device, how can I update the old dist/prog/prog so that it contains the new PYZ-00.pyz:mod1 byte code?

What code could I use to decompose the executable, and what code could I use to reassemble it after replacing one specific file (module)?

Alternative: move the .pyc files to a zip file. Startup performance is not that crucial. I could also live with an alternative solution where no PYZ file is created and added to the executable, but where the dist directory contains a zip file with all the .pyc files.
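A minimal sketch of what the zip alternative could look like, using only the standard library and independent of PyInstaller. Python's zipimport can load sourceless .pyc files from a zip archive that is on sys.path. The helper build_pyc_zip and the module name zipdemo_mod are hypothetical, and the sketch only handles plain modules (a package would additionally need its pkg/__init__.pyc entry).

```python
import importlib
import os
import py_compile
import sys
import tempfile
import zipfile


def build_pyc_zip(sources, zip_path):
    """Compile each (module_name, source_path) pair and store the bytecode
    in zip_path under the legacy name '<package>/<module>.pyc'.

    Hypothetical helper: plain modules only, no __init__ handling.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path, "w") as archive:
            for modname, src in sources:
                cfile = os.path.join(tmp, modname + ".pyc")
                py_compile.compile(src, cfile=cfile, doraise=True)
                archive.write(cfile, modname.replace(".", "/") + ".pyc")


# Demo with a throwaway module (all names are placeholders).
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "zipdemo_mod.py")
with open(src_path, "w") as handle:
    handle.write("def hello():\n    return 'hello new world'\n")

zip_path = os.path.join(workdir, "modules.zip")
build_pyc_zip([("zipdemo_mod", src_path)], zip_path)
os.remove(src_path)  # only the bytecode inside the zip remains

# The one change needed in the top-level program: put the zip on sys.path.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()
zipdemo_mod = importlib.import_module("zipdemo_mod")
print(zipdemo_mod.hello())
```

With this layout a patch would still mean rewriting the zip on the device, although only the changed .pyc would have to be transferred.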

Another alternative: copy the .pyc files into the application directory. This would result in __file__ having exactly the same value as in PYZ mode. Performance-wise this is probably not that nice and it creates a lot of files, but if incremental updates are crucial it is perhaps one option.

gelonida

2 Answers

1

This is quite a complex problem, but I think this may be at least part of what you're looking for.

Based on your example, I changed prog.py so that it imports normally when running from source, but loads the modules directly from .pyc files when frozen with PyInstaller.

import sys

def import_pyc(name):
    import types
    import marshal
    
    pyversion = f"{sys.version_info.major}{sys.version_info.minor}"
    filename = f"{name}.cpython-{pyversion}.pyc"
    
    with open(filename, "rb") as pyc_file:
        # pyc files have 16 bytes reserved at the start in python 3.7+
        # due to https://www.python.org/dev/peps/pep-0552/
        # may change again in the future
        pyc_file.seek(16) 
        code_obj = marshal.load(pyc_file)

    module = types.ModuleType(name)
    exec(code_obj, module.__dict__)

    globals()[name] = module

def import_py(name):
    import importlib
    
    globals()[name] = importlib.import_module(name)
    
def import2(name):
    if getattr(sys, "frozen", False):
        import_pyc(name)
    else:
        import_py(name)


import2("mod1")

if __name__ == "__main__":
    mod1.hello()

This is heavily based on the wonderful answer here.
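To see the 16-byte header handling in isolation, here is a small round trip that compiles a throwaway module and loads it back from the .pyc file. It is a sketch rather than part of the solution above: load_pyc is a hypothetical variant of import_pyc that returns the module instead of writing to globals(), and the magic-number check is an extra safeguard I added.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile
import types

# Python 3.7+ (PEP 552): 4 bytes magic, 4 bytes flags,
# then 8 bytes holding either mtime + size or a source hash.
PYC_HEADER_SIZE = 16


def load_pyc(name, pyc_path):
    """Load a module straight from a .pyc file (hypothetical helper)."""
    with open(pyc_path, "rb") as pyc_file:
        header = pyc_file.read(PYC_HEADER_SIZE)
        # Bytecode is only valid for the interpreter version that wrote it.
        if header[:4] != importlib.util.MAGIC_NUMBER:
            raise ImportError(f"{pyc_path} was compiled by a different Python")
        code_obj = marshal.load(pyc_file)
    module = types.ModuleType(name)
    module.__file__ = pyc_path
    exec(code_obj, module.__dict__)
    return module


# Demo: compile the example module from the question, then load the bytecode.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "mod1.py")
with open(src_path, "w") as handle:
    handle.write("def hello():\n    return 'hello old world'\n")
pyc_path = os.path.join(workdir, "mod1.pyc")
py_compile.compile(src_path, cfile=pyc_path, doraise=True)

mod1 = load_pyc("mod1", pyc_path)
print(mod1.hello())
```

The magic-number check matters for remote patching: a .pyc built with a different Python version than the one bundled on the device would otherwise fail in less obvious ways.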

This means that mod1.py is not packaged by PyInstaller, so you will have to include mod1.cpython-38.pyc as a data file.

One convenient way to do this is the command PyInstaller --add-data "__pycache__/*;." prog.py (use a colon instead of the semicolon if you're not on Windows). This copies everything in the __pycache__ folder, i.e. the .pyc files of all your imported modules, into the resulting dist/prog folder. Note that if you run this multiple times, PyInstaller puts a pyc for the main python folder in __pycache__, so that will get bundled on subsequent runs.

Depending on how you bundle and run your project, you will probably run into problems where the current working directory is off, which will result in a FileNotFoundError when you try to load the pycs. I can't give you a silver bullet for finding the right path, since it depends on how you end up doing things, but two expressions I commonly use to find the absolute path that should serve as the base directory are os.path.dirname(sys.executable) and os.path.dirname(os.path.abspath(__file__)).
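As a rough sketch of how the two expressions could be combined so that the .pyc lookup no longer depends on the current working directory (app_base_dir and pyc_path are hypothetical helper names, and whether the executable's directory is the right base depends on how you bundle):

```python
import os
import sys


def app_base_dir(script_file):
    """Return the directory the .pyc files are expected to live in.

    Frozen (PyInstaller sets sys.frozen): directory of the executable.
    From source: directory of the given script file.
    """
    if getattr(sys, "frozen", False):
        return os.path.dirname(os.path.abspath(sys.executable))
    return os.path.dirname(os.path.abspath(script_file))


def pyc_path(name, script_file):
    """Build an absolute path such as <base>/mod1.cpython-38.pyc."""
    pyversion = f"{sys.version_info.major}{sys.version_info.minor}"
    return os.path.join(app_base_dir(script_file),
                        f"{name}.cpython-{pyversion}.pyc")


# Example call; inside prog.py you would pass __file__ instead.
print(pyc_path("mod1", os.path.join("some", "dir", "prog.py")))
```

In import_pyc the open(filename, "rb") call would then use this absolute path rather than the bare file name.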

Starbuck5
  • Yes, this is a rather partial solution and not easy to implement for an existing code base with hundreds of files, as every file would have to be modified. It also doesn't show how to move the .pyc files out of the .PYZ file and into the directory. I will post an answer which I'm not really happy about, but which seems to provide a working solution, though it neither patches the .PYZ file nor uses a zip file. – gelonida Aug 12 '21 at 09:19
  • With this, the pyc files never end up in the PYZ archive in the first place, since PyInstaller doesn't know about the imports. Although I'm not sure how this would work with more complicated import mechanics, like modules, which is probably relevant for that many files. – Starbuck5 Aug 12 '21 at 09:50
  • Yep I see. I have an existing project with several hundreds of files and many third party dependencies, so this approach is not really viable. I depend on `Pyinstaller`'s analysis phase to know what has to be packaged. My alternative solution, that I just came up with yesterday is far from nice, but requires no changes in the source code (or only one change in the top level program if I'd like to use zip files) or if I'd like to put all code into a dedicated sub directory – gelonida Aug 12 '21 at 10:00
  • Now a Stack Overflow specific question. I'd like to accept my answer as it will probably be more helpful for the ones going through this question in the future. However I'd like to attribute the bounty to you as yours is the only answer and I'd like to reward your effort. What are the SO mechanics to achieve this? – gelonida Aug 12 '21 at 10:07
  • Ah, I missed this earlier. I'm not sure really - I've never done a bounty before. I appreciate the thought, but if it doesn't easily work don't worry about it. – Starbuck5 Aug 12 '21 at 10:47
  • Found on: https://meta.stackexchange.com/questions/16065/how-does-the-bounty-system-work *As of June 2010, the bounty system is decoupled from accepting an answer.* So I awarded you the bounty and will for the moment accept my answer – gelonida Aug 12 '21 at 11:24
  • Just out of curiosity. Is my own answer understandable? – gelonida Aug 12 '21 at 11:26
0

This solution is capable neither of 'patching' a .PYZ file nor of putting all .pyc files into a zip file.

But it is the only viable solution I have found so far that works for huge projects with loads of third-party dependencies.

The idea is to remove all (or most) files from the .PYZ file and copy the corresponding .pyc files into the working directory.

I will enhance and elaborate this answer over time. I'm still experimenting:

I achieve this by modifying the spec file:

  • determine the directory MYDIR in which the spec file is located
  • create a directory MYDIR/src to which all the files from a.pure will be copied
  • copy all files from a.pure to MYDIR/src (with subdirectories corresponding to the module's name; module mypackage.mod.common would, for example, be stored in MYDIR/src/mypackage/mod/common.py)
  • iterate through the files, compile each one to a .pyc file and remove the .py file afterwards
  • create a PYZ file which contains only the files that were not copied (in my test case, no .pyc file is kept in the PYZ)
  • create the exe with the modified PYZ
  • collect all files that should be collected, plus all files from MYDIR/src (e.g. with a.datas + Tree("src"))

Spec file changes: at the beginning

import os
import sys
MYDIR = os.path.realpath(SPECPATH)
sys.path.append(MYDIR)
import mypyinsthelpers  # allows to reuse the code in multiple projects

Then, after the (unmodified) a = Analysis(... section, I add:

to_rmv_from_pyc = mypyinsthelpers.mk_copy_n_compile(a.pure, MYDIR)

# modified creation of pyz
pyz = PYZ(a.pure - to_rmv_from_pyc, a.zipped_data,
             cipher=block_cipher)

I will detail the function mypyinsthelpers.mk_copy_n_compile further down.

Change the collect phase:

Instead of

coll = COLLECT(exe,
               a.binaries,
               a.zipfiles,
               a.datas,
...

I write:

coll = COLLECT(exe,
               a.binaries,
               a.zipfiles,
               a.datas + Tree("src"),
...

And here is the definition of mypyinsthelpers.mk_copy_n_compile():

import compileall
import os
import shutil
from pathlib import Path


def mk_copy_n_compile(toc, src_tree):
    """
    - copy source files to a destination directory
    - compile them as pyc
    - delete source
    """
    dst_base_path = os.path.join(src_tree, "src")
    to_rm = []
    # copy files to destination tree
    for entry in toc:
        modname, src, typ = entry
        assert typ == "PYMODULE"
        assert src.endswith(".py") or src.endswith(".pyw")
        # TODO: might add logic to skip some files (keep them in the PYZ)
        to_rm.append(entry)

        if src.endswith("__init__.py"):
            modname += ".__init__"

        m_split = modname.split(".")
        m_split[-1] += ".py"
        dst_dir = os.path.join(dst_base_path, *m_split[:-1])
        dst_path = os.path.join(dst_dir, m_split[-1])
        os.makedirs(dst_dir, exist_ok=True)
        print(entry[:2], dst_path)
        shutil.copy(src, dst_path)

    # now compile all files and remove the sources
    curdir = os.getcwd()
    os.chdir(dst_base_path)
    try:
        for path in Path(dst_base_path).glob("**/*.py"):
            # TODO: might add code to keep some files as source
            # legacy=True writes mod.pyc next to mod.py instead of __pycache__/
            compileall.compile_file(
                str(path.relative_to(dst_base_path)), quiet=1, legacy=True)
            path.unlink()
    finally:
        # restore the working directory even if compilation fails
        os.chdir(curdir)
    return to_rm
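
The reason for legacy=True is that Python only imports a .pyc without its source when the file sits directly next to where the .py would have been, not inside __pycache__. The following stand-alone sketch (no PyInstaller involved; write_and_compile and patchdemo_mod are hypothetical names) shows a sourceless import from a directory and then a "patch" that replaces a single .pyc:

```python
import compileall
import importlib
import os
import sys
import tempfile


def write_and_compile(directory, modname, source):
    """Write <modname>.py, compile it to a legacy <modname>.pyc next to it,
    then delete the source so only the bytecode is shipped."""
    py_path = os.path.join(directory, modname + ".py")
    with open(py_path, "w") as handle:
        handle.write(source)
    ok = compileall.compile_file(py_path, quiet=1, legacy=True, force=True)
    os.remove(py_path)
    return bool(ok)


# Initial "installation": the directory holds nothing but the .pyc.
app_dir = tempfile.mkdtemp()
write_and_compile(app_dir, "patchdemo_mod",
                  "def hello():\n    return 'hello old world'\n")
sys.path.insert(0, app_dir)
importlib.invalidate_caches()
patchdemo_mod = importlib.import_module("patchdemo_mod")
old_result = patchdemo_mod.hello()

# "Patch": only this one small .pyc would have to travel to the device.
write_and_compile(app_dir, "patchdemo_mod",
                  "def hello():\n    return 'hello new world'\n")
importlib.invalidate_caches()
patchdemo_mod = importlib.reload(patchdemo_mod)  # a restart on a real device
new_result = patchdemo_mod.hello()

print(old_result, "->", new_result)
```

On a device the replacement .pyc has to be compiled with the same Python version as the one bundled in the executable.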
gelonida