In my local version of a package I am trying to distribute, I have the following code:

shutil.copytree(WWW_LOCATION, dir_path)

WWW_LOCATION is a subfolder of my python module containing some static files and folders:

dv
  \dv
     mytool.py
     \www_folder
       \somefolders_and_files
  setup.py
  MANIFEST.in
  README.md
  LICENSE
  setup.cfg

In my code, after execution, I need to copy this whole folder to a user-specified location together with some files generated on the fly. This works great locally, but I read that for distribution via PyPI, I have to take care since the files might get zipped.

This answer explains how to access resources (that is, read them in Python), but only a single file at a time. What is a safe way to copy the folder's contents to the specified location instead?

Thomas

1 Answer

The 'easier' solution is to set zip_safe=False in your package setup (setup.py or setup.cfg), so that your package is never installed as a zipped egg. Since most installations are done with pip, which never creates zipped egg installations, running into a zipped package installation is rare these days anyway.

You then only have to worry about someone manually zipping packages into a zipfile and adding that zipfile to sys.path, a use case you could choose not to support. That's a different form of zipped package: eggs are zipfiles holding one installable project, stored in a directory that's listed on sys.path, whereas a manually created archive is itself listed on sys.path. pkg_resources supports only eggs, not the manual form.
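If you decide not to support zipped installations, a small guard can turn a confusing copy failure into an explicit error. This is only a minimal sketch: copy_static_tree is a hypothetical helper name, not part of any library, and the error message is an assumption.

```python
import shutil
from pathlib import Path


def copy_static_tree(source_dir, destination):
    """Copy a package's static folder, failing clearly if it is not on disk.

    Hypothetical helper for illustration only.
    """
    source_dir = Path(source_dir)
    if not source_dir.is_dir():
        # When the package was imported from a zip archive, the computed
        # path points inside the archive and is not a real directory.
        raise RuntimeError(
            f"{source_dir} is not a directory on disk; "
            "zipped installations are not supported"
        )
    # copytree creates the destination and returns it.
    return shutil.copytree(source_dir, destination)
```

In the question's code this would replace the bare call, for example copy_static_tree(WWW_LOCATION, dir_path).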

If you do want to support a zipped egg, then for your specific case, it'll be easier to use the pkg_resources API for resource extraction, because while it may be 'slower', it also supports full directory trees. From the resource_filename() documentation:

If the named resource is a directory, then all resources within that directory (including subdirectories) are also extracted.

I'd use it like this:

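import pkg_resources
import shutil

try:
    # Extracts www_folder (and everything below it) first if the egg is zipped.
    www_location = pkg_resources.resource_filename("dv", "www_folder")
    shutil.copytree(www_location, dir_path)
finally:
    # Remove any files that were extracted temporarily.
    pkg_resources.cleanup_resources()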

Resources inside packages found in a zipfile directly added to sys.path can't be accessed via pkg_resources. For that you need the newer importlib.resources module (or its backport), but this API doesn't support arbitrary directory structures. The importlib.resources.path() function documentation states:

package is either a name or a module object which conforms to the Package requirements. resource is the name of the resource to open within package; it may not contain path separators and it may not have sub-resources (i.e. it cannot be a directory).

(bold emphasis mine).

While you can find directories within a package by using importlib.resources.contents(), you can't actually access the contents of those directories unless they are themselves Python packages (that is, they contain an __init__.py file). The implementation of importlib.resources.path() for traditional, non-zipped packages will still give you access to the directory, but you can't do the same when the package is contained in a .zip archive.

importlib.resources is the better, future-proof API to support. To do so, you could zip up the www_folder resource tree in your source and wheel distributions, then use with importlib.resources.path("dv", "www_folder.zip") as www_location: www_zip = zipfile.ZipFile(www_location) and extract the contents of that ZipFile object to the destination.
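Spelled out, the zipped-resource approach might look like the sketch below. It assumes the archive www_folder.zip has already been created at build time and shipped inside the package. The function name copy_zipped_resources is hypothetical, and error handling is left out.

```python
import importlib.resources
import zipfile
from pathlib import Path


def copy_zipped_resources(package, resource, destination):
    """Extract a zip archive shipped as a package resource into destination.

    Hypothetical helper: `package` and `resource` would be "dv" and
    "www_folder.zip" for the layout in the question.
    """
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    # path() yields a real filesystem path; if the package itself lives in
    # a zip archive, the resource is copied to a temporary file first.
    with importlib.resources.path(package, resource) as archive_path:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(destination)
    return destination
```

Because the archive is a single file resource, this works whether the package is installed as a plain directory or imported from a zipfile, for example copy_zipped_resources("dv", "www_folder.zip", dir_path).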

Martijn Pieters
  • Thank you for the extensive post - I have decided to go with `zip_safe=False` since in this case, the resource files are a mere 60kb and compressing them is really not necessary. – Thomas Sep 27 '19 at 12:40