I have a Python application in a directory dir. This directory has a __main__.py file and several data files that are read by the application using open(...,'r'). Without editing the code, is it possible to bundle the code and data files into a single zip file and execute it using something like python app.pyz?

  • My goal is to share the file and data easily.
  • Running the application using python dir works fine.
  • If I make a zip file using python -m zipfile -c app.pyz dir/*, the resulting application will run but cannot read the files. This makes sense.
  • I can ask the customers to unzip the compressed folder before running, or I could embed the files as strings within the code. That said, I'm curious whether this can be avoided.

Can I bundle code and data into one file?

RyeGrain
  • Does this answer your question? [How can I make a Python script standalone executable to run without ANY dependency?](https://stackoverflow.com/questions/5458048/how-can-i-make-a-python-script-standalone-executable-to-run-without-any-dependen) – Kraigolas Jan 19 '22 at 02:22
  • You can have a **build** script, that will assemble a file containing the data you want into a python file, that can be imported with Python. But if the data is large, I would not recommend it. Instead, you could read the original files directly from inside the zipfile. – Lenormju Jan 19 '22 at 15:11
  • Thank you both. Both answers are more work than is reasonable in this case. Place this into an answer and I'll mark it as accepted. – RyeGrain Jan 21 '22 at 00:17
  • Did you ever figure this issue out? I'm having the same problem. After I add my program to the pyz file, I can't access my json config because it's not a python file and there's no file system. I think there's a way to do it using importlib.resources, but I haven't found a working example. Each example I find requires having a package to do it, but my script is simple and everything is at the root (hence dunder package is empty) – u84six Aug 31 '22 at 21:34

1 Answer

As of Python 3.9 you can use importlib.resources.files() from the standard library. The importlib.resources module uses Python's import machinery to resolve the paths of data files as though they were modules inside a package.

  • Create a new package inside dir. Let's call it data. Make sure it has an __init__.py.

  • Add your data files to data. Let's say you added a text file text.txt and a binary file binary.dat.
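
As a rough sketch of the layout described above, the following script builds a throwaway dir with a data package, bundles it with the standard zipapp module, and runs it both as an archive and as a plain directory. The file contents and the helper names build_demo and run are made up for illustration; only the dir/data/text.txt layout comes from the steps above.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

# Hypothetical __main__.py: prints the bundled text file.
MAIN_SOURCE = '''\
import importlib.resources
text = importlib.resources.files("data").joinpath("text.txt").read_text(encoding="utf-8")
print(text)
'''


def build_demo(root: pathlib.Path) -> pathlib.Path:
    """Create dir/ with a data package inside it and bundle it as app.pyz."""
    app_dir = root / "dir"
    data_dir = app_dir / "data"
    data_dir.mkdir(parents=True)
    (app_dir / "__main__.py").write_text(MAIN_SOURCE, encoding="utf-8")
    (data_dir / "__init__.py").write_text("", encoding="utf-8")  # makes data a package
    (data_dir / "text.txt").write_text("hello from the archive", encoding="utf-8")
    target = root / "app.pyz"
    zipapp.create_archive(app_dir, target)
    return target


def run(path: pathlib.Path) -> str:
    """Run a directory or archive with the current interpreter and return its output."""
    result = subprocess.run(
        [sys.executable, str(path)], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    archive = build_demo(root)
    from_archive = run(archive)        # python app.pyz
    from_directory = run(root / "dir")  # python dir
```

The same archive can be produced from the command line with python -m zipapp dir -o app.pyz.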

Now from your __main__.py script, or any part of your code that can import the data package, you can access files inside that package like so:

  • To read text.txt to memory as a string:
txt_file = importlib.resources.files("data").joinpath("text.txt").read_text(encoding="utf-8")
  • To read binary.dat to memory as bytes:
bin_file = importlib.resources.files("data").joinpath("binary.dat").read_bytes()
  • To open any file:
path = importlib.resources.files("data").joinpath("text.txt")
with path.open("rt", encoding="utf-8") as file:
    lines = file.readlines()
# As streams:
textio_stream = importlib.resources.files("data").joinpath("text.txt").open("rt", encoding="utf-8")
bytesio_stream = importlib.resources.files("data").joinpath("binary.dat").open("rb")
  • If something requires an actual real file on the filesystem, or you simply want to wrap zipapp compatibility over existing code (e.g. with open()) without having to modify it:
# Old, incompatible with zipfiles.
file_path = "data/text.txt"
with open(file_path, "rt", encoding="utf-8") as file:
    lines = file.readlines()
# New, compatible with zipfiles.
file_path = importlib.resources.files("data").joinpath("text.txt")

# If the file is inside a zipfile, as_file extracts it to a temporary file and
# deletes that file once the context manager exits. Otherwise, it yields the real path.
with importlib.resources.as_file(file_path) as path:
    with open(path, "rt", encoding="utf-8") as file:
        lines = file.readlines()
# Since it is a context manager, you can even store it like this:
file_path = importlib.resources.files("data").joinpath("text.txt")
real_path = importlib.resources.as_file(file_path)

with real_path as path:
    with open(path, "rt", encoding="utf-8") as file:
        lines = file.readlines()
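
To see the temporary-file behaviour in isolation, here is a minimal sketch that puts a small zip on sys.path and reads a resource from it through as_file. The package name demo_data, the archive name, and the helper read_with_real_path are assumptions made for this example only.

```python
import importlib.resources
import os
import pathlib
import sys
import tempfile
import zipfile


def read_with_real_path(package: str, name: str):
    """Read a resource through as_file and report whether the path was real."""
    resource = importlib.resources.files(package).joinpath(name)
    with importlib.resources.as_file(resource) as path:
        existed_inside = os.path.isfile(path)  # a real file while the block is open
        with open(path, "rt", encoding="utf-8") as file:
            lines = file.readlines()
    # For a zipped resource the temporary copy is removed on exit.
    return lines, existed_inside, os.path.exists(path)


with tempfile.TemporaryDirectory() as tmp:
    archive = pathlib.Path(tmp) / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_data/__init__.py", "")  # hypothetical package
        zf.writestr("demo_data/text.txt", "line one\nline two\n")
    sys.path.insert(0, str(archive))  # make the zipped package importable
    try:
        lines, existed_inside, exists_after = read_with_real_path("demo_data", "text.txt")
    finally:
        sys.path.remove(str(archive))
```

Because the extracted copy only lives for the duration of the with block, anything that needs the path has to finish its work before the block exits.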

The Traversable objects returned from importlib.resources functions can be mixed with Path objects using as_posix, since joinpath requires POSIX separators (forward slashes):

file_path = pathlib.Path("subdirectory", "text.txt")
txt_file = importlib.resources.files("data").joinpath(file_path.as_posix()).read_text(encoding="utf-8")
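
The reason as_posix matters is easiest to see with a Windows-style path, where the native string form uses backslashes. This small sketch uses pathlib.PureWindowsPath so that it behaves the same on any operating system; the variable names are illustrative.

```python
import pathlib

# PureWindowsPath mimics what pathlib.Path produces on Windows.
windows_style = pathlib.PureWindowsPath("subdirectory", "text.txt")

native_form = str(windows_style)         # backslash-separated
posix_form = windows_style.as_posix()    # forward slashes, safe to pass to joinpath
```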

You can use slashes to grow a Traversable, just like pathlib.Path objects:

resources_root = importlib.resources.files("data")
text_path = resources_root / "text.txt"
bin_file = (resources_root / "subdirectory" / "bin.dat").read_bytes()

You can also import the data package like any other package, and use the module object directly. Subpackages are also supported. The only Python files inside the data tree are the __init__.py files of each subpackage:

# __main__.py
import importlib.resources
import data.config
import data.models.b

# Load binary file `file.dat` from `data.models.b`.
# Subpackages are being used as subdirectories.
bin_file = importlib.resources.files(data.models.b).joinpath("file.dat").read_bytes()
...

You technically only need to make your resource root directory a package. For maximum brevity:

# __main__.py
from importlib.resources import files
data = files("data")  # Resources root.

# In this example, `models` and `b` are regular directories:
bin_file = (data / "models" / "b" / "file.dat").read_bytes()
...

Note that importlib.resources and zipfiles in general support reading only, and you will get an exception if you try to write to any file-like object returned from the above functions. It might technically be possible to support modifying data files inside zips, but that is well out of scope here. If you want to write files, just open a file in the filesystem as normal.
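
One possible way to live with the read-only restriction is to treat the bundled file as a default and keep any modified copy on the real filesystem. The sketch below assumes a hypothetical demo_settings package with a default.txt file and a made-up load_settings helper; none of these names come from the original code.

```python
import importlib.resources
import pathlib
import sys
import tempfile
import zipfile


def load_settings(user_file: pathlib.Path) -> str:
    """Prefer the writable user copy; fall back to the read-only bundled default."""
    if user_file.exists():
        return user_file.read_text(encoding="utf-8")
    bundled = importlib.resources.files("demo_settings").joinpath("default.txt")
    return bundled.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    archive = root / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_settings/__init__.py", "")  # hypothetical package
        zf.writestr("demo_settings/default.txt", "volume=5")
    sys.path.insert(0, str(archive))
    try:
        user_file = root / "settings.txt"  # ordinary file outside the archive
        before = load_settings(user_file)
        user_file.write_text("volume=9", encoding="utf-8")  # writes go to the filesystem
        after = load_settings(user_file)
    finally:
        sys.path.remove(str(archive))
```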

Now your data files have become file-system agnostic and your program should work via zipapp and normal invocation just the same.

Dennis