231

Could you tell me how I can read a file that is inside my Python package?

My situation

A package that I load has a number of templates (text files used as strings) that I want to load from within the program. But how do I specify the path to such a file?

Imagine I want to read a file from:

package\templates\temp_file

Some kind of path manipulation? Package base path tracking?

Martin Thoma
ronszon
  • Related: [MANIFEST.in ignored on “python setup.py install” - no data files installed?](https://stackoverflow.com/q/3596979/674039) – wim Oct 15 '20 at 15:35

6 Answers

274

TL;DR: Use the standard library's importlib.resources module

If you don't need backward compatibility with Python < 3.9 (explained in detail in method 2, below), use this:

from importlib import resources as impresources
from . import templates

inp_file = (impresources.files(templates) / 'temp_file')
with inp_file.open("rt") as f:
    template = f.read()

Details

The traditional pkg_resources from setuptools is not recommended anymore because the new method:

  • is significantly more performant;
  • is safer, since using packages (instead of path strings) raises errors at import time if a name is misspelled;
  • is more intuitive because you don't have to "join" paths;
  • relies on Python's standard library only (no extra third-party dependency on setuptools).

I kept the traditional method listed first, to explain the differences from the new method when porting existing code (porting is also explained here).



Let's assume your templates are located in a folder nested inside your module's package:

  <your-package>
    +--<module-asking-the-file>
    +--templates/
          +--temp_file                         <-- We want this file.

Note 1: For sure, we should NOT fiddle with the __file__ attribute (e.g. code will break when served from a zip).

Note 2: If you are building this package, remember to declare your data files as package_data or data_files in your setup.py.
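As a rough illustration of Note 2, the following sketch shows what a package_data declaration might look like. The package name your_package, the version, and the helper function setup_kwargs are placeholders invented for this example; only the package_data and packages keywords are real setuptools options.

```python
def setup_kwargs():
    """Keyword arguments for setuptools.setup() (illustrative placeholders)."""
    return {
        "name": "your_package",          # hypothetical distribution name
        "version": "0.1.0",              # hypothetical version
        "packages": ["your_package"],
        # Ship every file under your_package/templates/ with the package.
        "package_data": {"your_package": ["templates/*"]},
    }


# In a real setup.py this would be followed by:
#     from setuptools import setup
#     setup(**setup_kwargs())
```

The glob is relative to the package directory, so templates/temp_file is picked up without listing it by name.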

1) Using pkg_resources from setuptools (slow)

You may use the pkg_resources package from the setuptools distribution, but that comes with a cost, performance-wise:

import pkg_resources

# Could be any dot-separated package/module name or a "Requirement"
resource_package = __name__
resource_path = '/'.join(('templates', 'temp_file'))  # Do not use os.path.join()
template = pkg_resources.resource_string(resource_package, resource_path)
# or for a file-like stream:
template = pkg_resources.resource_stream(resource_package, resource_path)

Tips:

  • This will read data even if your distribution is zipped, so you may set zip_safe=True in your setup.py, and/or use the long-awaited zipapp packer from Python 3.5 to create self-contained distributions.

  • Remember to add setuptools to your run-time requirements (e.g. in install_requires).

... and notice that according to the Setuptools/pkg_resources docs, you should not use os.path.join:

Basic Resource Access

Note that resource names must be /-separated paths and cannot be absolute (i.e. no leading /) or contain relative names like "..". Do not use os.path routines to manipulate resource paths, as they are not filesystem paths.
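To make the quoted rules concrete, here is a small sketch of a checker for resource names. The function is_valid_resource_name is hypothetical and not part of pkg_resources; rejecting backslashes is an assumption on my part, based on the advice not to use os.path routines.

```python
def is_valid_resource_name(name: str) -> bool:
    """Check a resource name against the rules quoted above (illustrative only)."""
    if not name or name.startswith("/"):
        return False  # empty or absolute names are not allowed
    if "\\" in name:
        return False  # assumption: OS-specific separators are rejected
    # Relative components such as ".." must not appear anywhere in the name.
    return ".." not in name.split("/")
```

For example, "templates/temp_file" passes, while "/templates/temp_file" and "../temp_file" do not.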

2) Python >= 3.7, or using the backported importlib_resources library

Use the standard library's importlib.resources module, which is more efficient than setuptools, above:

try:
    from importlib import resources as impresources
except ImportError:
    # Fall back to the `importlib_resources` backport on Python < 3.7.
    import importlib_resources as impresources

from . import templates  # relative-import the *package* containing the templates

try:
    inp_file = (impresources.files(templates) / 'temp_file')
    with inp_file.open("rb") as f:  # or "rt" as text file with universal newlines
        template = f.read()
except AttributeError:
    # Python < 3.9: fall back to the legacy API, deprecated in Python 3.11.
    template = impresources.read_text(templates, 'temp_file')
    # or for a file-like stream:
    template = impresources.open_text(templates, 'temp_file')

Attention:

Regarding the function read_text(package, resource):

  • The package can be either a string or a module.
  • The resource is NOT a path anymore, but just the filename of the resource to open, within an existing package; it may not contain path separators and it may not have sub-resources (i.e. it cannot be a directory).

For the example asked in the question, we must now:

  • make the <your_package>/templates/ into a proper package, by creating an empty __init__.py file in it,
  • so now we can use a simple (possibly relative) import statement (no more parsing package/module names),
  • and simply ask for resource_name = "temp_file" (no path).
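The three steps above can be tried end to end in a throwaway directory. This is only a sketch: the package name demo_pkg and both helper functions are made up for illustration, and files() requires Python 3.9 or later.

```python
import importlib
import sys
import tempfile
from importlib.resources import files  # Python 3.9+
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create demo_pkg/templates/ as a proper sub-package holding temp_file."""
    templates = root / "demo_pkg" / "templates"
    templates.mkdir(parents=True)
    (root / "demo_pkg" / "__init__.py").write_text("")
    (templates / "__init__.py").write_text("")  # the empty __init__.py from step 1
    (templates / "temp_file").write_text("Hello from temp_file\n")


def read_template() -> str:
    """Import the templates package and read the resource by name only."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            templates = importlib.import_module("demo_pkg.templates")
            return (files(templates) / "temp_file").read_text(encoding="utf-8")
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            sys.path.remove(tmp)
            for name in ("demo_pkg.templates", "demo_pkg"):
                sys.modules.pop(name, None)
```

In real code the temporary directory and sys.path handling disappear; only the import of the templates package and the files(...) / "temp_file" lookup remain.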

Tips:

  • To access a file inside your current module, set the package argument to __package__, e.g. impresources.read_text(__package__, 'temp_file') (thanks to @ben-mares).
  • Things become interesting when an actual filename is asked with path(), since now context-managers are used for temporarily-created files (read this).
  • Add the backported library, conditionally for older Pythons, with install_requires=["importlib_resources; python_version < '3.7'"] (check this if you package your project with setuptools<36.2.1).
  • Remember to remove the setuptools library from your runtime requirements if you migrated from the traditional method.
  • Remember to customize setup.py or MANIFEST to include any static files.
  • You may also set zip_safe=True in your setup.py.
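For the case where an actual filename is needed, a minimal sketch with as_file() (available alongside files() from Python 3.9) might look like this. The helper resource_size is hypothetical, and the stdlib json package in the usage note is only a stand-in for your own package.

```python
from importlib.resources import as_file, files  # both need Python 3.9+


def resource_size(package: str, resource: str) -> int:
    """Return the size in bytes of a packaged resource via a real filesystem path."""
    with as_file(files(package).joinpath(resource)) as path:
        # `path` is a pathlib.Path that is only guaranteed to be valid inside
        # this block; for zipped packages it may be a temporary extracted copy.
        return path.stat().st_size
```

For instance, resource_size("json", "__init__.py") measures a file from the standard library's json package. Inside the with block, str(path) can be handed to any API that insists on a filename.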
ankostis
  • 1
    str.join takes sequence resource_path = '/'.join(('templates', 'temp_file')) – Alex Punnen Nov 18 '16 at 11:18
  • 2
    I keep getting `NotImplementedError: Can't perform this operation for loaders without 'get_data()'` any ideas? – leoschet Jun 29 '18 at 00:58
  • 2
    Note that `importlib.resources` and `pkg_resources` are *not necessarily compatible*. `importlib.resources` works with zipfiles added to `sys.path`, setuptools and `pkg_resources` work with egg files, which are zipfiles stored in a directory that itself is added to `sys.path`. E.g. with `sys.path = [..., '.../foo', '.../bar.zip']`, eggs go in `.../foo`, but packages in `bar.zip` can also be imported. You cant use `pkg_resources` to extract data from packages in `bar.zip`. I haven't checked if setuptools registers the necessary loader for `importlib.resources` to work with eggs. – Martijn Pieters Sep 27 '19 at 12:50
  • 1
    Is additional setup.py configuration required if error `Package has no location` appears? – zygimantus Nov 11 '19 at 12:40
  • No, setup.py configs are needed in older pythons - your library seems to work, and that is why you get that error. – ankostis Nov 12 '19 at 13:04
  • 1
    for non-python files to be available in your package, use a MANIFEST.in file to list those, and use the instructions here to build it: https://python-packaging.readthedocs.io/en/latest/non-code-files.html . then use `filename = pkg_resources.resource_filename('', 'filename.ext')` to get at it – beep_check Feb 18 '20 at 19:09
  • @beep_check No. Using `pkg_resources` is not recommended anymore, as explained in the answer. Additionally, using `MANIFEST.in` is not needed, and use of `data_files` instead is preferred (see Note 2). – ankostis Feb 21 '20 at 13:55
  • @ankostis thank you for your reply. for distributed packages your method may be preferred. however for creating private packages quickly I prefer the older method, as it is more intuitive and faster to develop. – beep_check Feb 22 '20 at 21:45
  • @beep_check No, it is not. It is much safer to use packages (instead of path strings) because you get compile-time errors if you misspell them; also it is more intuitive because you don't have to join paths; finally it is faster to develop since you don't need an extra dependency (`setuptools`) but rely on Python's standard library. – ankostis Mar 03 '20 at 13:46
  • 10
    In case you want to access a file inside the current module (and not a submodule like `templates` as per the example), then you can set the `package` argument to `__package__`, e.g. `importlib.resources.read_text(__package__, 'temp_file')` – Ben Mares Jun 22 '20 at 07:10
  • How to load text files from an arbitrary package that's installed in site-packages, not the current module, where I have no control over their setup.py etc.? – chrisinmtown Mar 25 '21 at 13:20
  • 1
    Works great if I run script from package root but if I install the package I get `ImportError: cannot import name 'templates' from 'packagename'` at ` from . import templates` :( – Marek Apr 21 '21 at 07:49
  • sorry the problem was missing `include_package_data=True` in the setup! – Marek Apr 21 '21 at 08:04
  • You can also pass a string as the first argument to `open_text`. In @ronszon 's example that would be 'package.templates'. – jocassid Oct 15 '21 at 20:14
211

A packaging prelude:

Before you can even worry about reading resource files, the first step is to make sure that the data files are getting packaged into your distribution in the first place - it is easy to read them directly from the source tree, but the important part is making sure these resource files are accessible from code within an installed package.

Structure your project like this, putting data files into a subdirectory within the package:

.
├── package
│   ├── __init__.py
│   ├── templates
│   │   └── temp_file
│   ├── mymodule1.py
│   └── mymodule2.py
├── README.rst
├── MANIFEST.in
└── setup.py

You should pass include_package_data=True in the setup() call. The manifest file is only needed if you want to use setuptools/distutils and build source distributions. To make sure the templates/temp_file gets packaged for this example project structure, add a line like this into the manifest file:

recursive-include package *

Historical cruft note: Using a manifest file is not needed for modern build backends such as flit, poetry, which will include the package data files by default. So, if you're using pyproject.toml and you don't have a setup.py file then you can ignore all the stuff about MANIFEST.in.

Now, with packaging out of the way, onto the reading part...

Recommendation:

Use standard library pkgutil APIs. It's going to look like this in library code:

# within package/mymodule1.py, for example
import pkgutil

data = pkgutil.get_data(__name__, "templates/temp_file")

It works in zips. It works on Python 2 and Python 3. It doesn't require third-party dependencies. I'm not really aware of any downsides (if you are, then please comment on the answer).

Bad ways to avoid:

Bad way #1: using relative paths from a source file

This was previously described in the accepted answer. At best, it looks something like this:

from pathlib import Path

resource_path = Path(__file__).parent / "templates"
data = resource_path.joinpath("temp_file").read_bytes()

What's wrong with that? The assumption that you have files and subdirectories available is not correct. This approach doesn't work if executing code which is packed in a zip or a wheel, and it may be entirely out of the user's control whether or not your package gets extracted to a filesystem at all.
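The failure mode can be reproduced with a small experiment. The sketch below builds a zip archive containing a made-up package (zipped_pkg, a name chosen only for this demo), imports it from the archive, and compares the __file__-based path with pkgutil.get_data.

```python
import importlib
import pkgutil
import sys
import tempfile
import zipfile
from pathlib import Path


def compare_in_zip():
    """Return (does the __file__-based path exist?, bytes read via pkgutil)."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "bundle.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("zipped_pkg/__init__.py", "")
            zf.writestr("zipped_pkg/templates/temp_file", "hello from the zip")
        sys.path.insert(0, str(archive))
        try:
            importlib.invalidate_caches()
            pkg = importlib.import_module("zipped_pkg")
            # Points *inside* bundle.zip, so it is not a real filesystem path.
            via_file = Path(pkg.__file__).parent / "templates" / "temp_file"
            data = pkgutil.get_data("zipped_pkg", "templates/temp_file")
            return via_file.exists(), data
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            sys.path.remove(str(archive))
            sys.modules.pop("zipped_pkg", None)
```

The path derived from __file__ does not exist on disk, while the loader-based lookup still returns the file's contents.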

Bad way #2: using pkg_resources APIs

This is described in the top-voted answer. It looks something like this:

from pkg_resources import resource_string

data = resource_string(__name__, "templates/temp_file")

What's wrong with that? It adds a runtime dependency on setuptools, which should preferably be an install time dependency only. Importing and using pkg_resources can become really slow, as the code builds up a working set of all installed packages, even though you were only interested in your own package resources. That's not a big deal at install time (since installation is once-off), but it's ugly at runtime.

Bad way #3: using legacy importlib.resources APIs

This was previously the recommendation of the top-voted answer. It's in the standard library since Python 3.7. It looks like this:

from importlib.resources import read_binary

data = read_binary("package.templates", "temp_file")

What's wrong with that? Well, unfortunately, the implementation left some things to be desired and it was deprecated in Python 3.11. Using importlib.resources.read_binary, importlib.resources.read_text and friends will require you to add an empty file templates/__init__.py so that data files reside within a sub-package rather than in a subdirectory. It will also expose the package/templates subdirectory as an importable package.templates sub-package in its own right. This won't work with many existing packages which are already published using resource subdirectories instead of resource sub-packages, and it's inconvenient to add the __init__.py files everywhere, muddying the boundary between data and code.

This approach was deprecated in upstream importlib_resources in 2021, and in the stdlib from Python 3.11. bpo-45514 tracked the deprecation, and the "migrating from legacy" guide offers _legacy.py wrappers to aid with the transition.

Honorable mention: using the traversable importlib resources API

This had not been mentioned in the top-voted answer when I posted about it (2020), but the author has subsequently edited it into their answer (2023). importlib_resources is more than a simple backport of the Python 3.7+ importlib.resources code. It has traversable APIs for accessing resources with usage similar to pathlib:

import importlib_resources

my_resources = importlib_resources.files("package")
data = my_resources.joinpath("templates", "temp_file").read_bytes()

This works on Python 2 and 3, it works in zips, and it doesn't require spurious __init__.py files to be added in resource subdirectories. The only downside vs pkgutil that I can see is that the traversable APIs are only available in the stdlib importlib.resources from Python 3.9+, so there is still a third-party dependency needed to support older Python versions. If you only need to run on Python 3.9+ then use this approach, or you can add a compatibility layer and a conditional dependency on the backport for older Python versions:

# in your library code:
try:
    from importlib.resources import files
except ImportError:
    from importlib_resources import files

# in your setup.py or similar:
from setuptools import setup
setup(
    ...
    install_requires=[
        'importlib_resources; python_version < "3.9"',
    ]
)

Until Python 3.8 is end-of-life, my recommendation remains with stdlib pkgutil, to avoid the extra complexity of a conditional dependency.

Example project:

I've created an example project on GitHub and uploaded it to PyPI, which demonstrates all five approaches discussed above. Try it out with:

$ pip install resources-example
$ resources-example

See https://github.com/wimglenn/resources-example for more info.

wim
  • The top-voted answer DOES NOT suggest option (2). On the contrary, it explains why (2) is a bad idea from the past, and suggests (3). Actually, option (1) is a wrapper around `importlib.resources` (aka (2))in Python-3. – ankostis Apr 13 '20 at 09:07
  • @ankostis I guess it was edited at a later date, it originally recommended `pkg_resources`. Well, whether it recommend `pkg_resources` or `importlib.resources` doesn't matter, because they are both bad. – wim Apr 13 '20 at 10:47
  • 4
    It has been edited last May. But i guess it's easy to miss the explanations at the intro. Still, you advice people against the standard - that's a hard bullet to bite :-) – ankostis Apr 13 '20 at 18:05
  • @ankostis Yes. And I don't make such advice lightly. This "standard" was an incomplete API from the start (see [issue58](https://gitlab.com/python-devs/importlib_resources/issues/58) about that), and a testimony of that stillborn design: it's already under deprecation. See [*Deprecate legacy API in favor of traversable API (files)*](https://gitlab.com/python-devs/importlib_resources/issues/80). If the improved API makes it's way into stdlib at a later date I may change my recommendation and edit this answer. Until that time, I see no benefit over the `pkgutil` APIs, only significant downsides. – wim Apr 13 '20 at 18:19
  • The significant downsides are just that it "expose the package/templates subdirectory as an importable package.templates sub-package"? – ankostis Apr 17 '20 at 06:21
  • No, the significant downside is that it does not work at all for many existing packages. Most have not put `__init__.py` files into their data directories, e.g. `pytz` does not for the zone info files. And those packages are _already released_, the authors can't go back and rewrite history, and frankly there is not really a good reason for littering the data directories with `__init__.py` files anyway - that's just a shortcoming of importlib resources APIs. – wim Apr 17 '20 at 08:37
  • No solution works for "existing packages", without, at least, some kind of modification to the project sources. And the instructions of `importlib.resources` explicitly demand this "step", to add an `__init__.py` *if not already there*. All too often, the data-file is located within the folder of the code using it, so this steps is not even necessary (and hence no expose-subdir issue either). This cannot be the "downside", really. What am i missing here? Could it be that you are solving a *different problem*? This question is for data-files **inside my Python package** (not other ones). – ankostis Apr 18 '20 at 15:40
  • @ankostis What do you mean no solution works for existing packages? That's just mistaken. Stdlib pkgutil works with existing packages. Consider `pytz` that I mentioned earlier: resources are accessed e.g. with `pkgutil.get_data(package="pytz", resource="zoneinfo/Singapore")`. Can you show me how to access the same resource using `importlib.resources`? You can not add `__init__.py` to `pytz==2019.3` because the release is already [published on PyPI](https://pypi.org/project/pytz/2019.3/#files) and the release files are immutable on index (checksum for same version number may not be modified). – wim Apr 18 '20 at 18:46
  • 3
    @ankostis Let me turn the question on you instead, why would you recommend `importlib.resources` despite all these shortcomings with an incomplete API that's already [pending deprecation](https://gitlab.com/python-devs/importlib_resources/issues/80)? Newer is not necessarily better. Tell me **what advantages does it actually offer** over the stdlib pkgutil, which your answer does not make any mention about? – wim Apr 18 '20 at 18:55
  • 1
    This works well to get a file content but I need the filename or a File like object. I'm trying to do that: `logging.config.fileConfig(filename)`. – Adrien H Apr 20 '20 at 08:17
  • Interesting. Thank you @wim for the resource. I see the [Brett has his stakes](https://gitlab.com/python-devs/importlib_resources/issues/58#note_274264281) at the new traversal API, which is incompatible both with `pkgutil` & `importlib.resource`, located also on the `importlib` package. I noticed that nobody in that thread responded to your suggestion to use `pkgutil`. Just because of that, i wouldn't recommend it's use, and stay with the current "standard". Maybe because [`pkgutil.get_data()` doesn't always work](https://docs.python.org/3.9/library/pkgutil.html#pkgutil.get_data)? – ankostis Apr 20 '20 at 14:46
  • @ankostis importlib resources is not working with namespace packages either (see *[Does importlib_resources work for namespace packages?](https://gitlab.com/python-devs/importlib_resources/-/issues/20)* and *[Unable to retrieve resources from a namespace package](https://gitlab.com/python-devs/importlib_resources/-/issues/68)* about that), so using this as a reason to prefer importlib resources does not make a lot of sense. I'm still waiting to hear what advantages importlib.resources offers? – wim Apr 20 '20 at 15:56
  • Note that these modules are *both* standard lib, and pkgutil is much more widely used (if only because it's older), so I'm not sure on what grounds you're calling importlib the "current standard"? The opposite is true - it's the newcomer here. – wim Apr 20 '20 at 16:01
  • 1
    Dear @wim, [Brett Canon's last response](https://gitlab.com/python-devs/importlib_resources/-/issues/58#note_329352693) on the use of `pkgutil.get_data()` confirmed my gut feeling - it's an underdeveloped, to-be-deprecated API. That said, i agree with you, `importlib.resources` is not a much better alternative, but until PY3.10 resolves this, i stand by this choice, having learned that it is not just another "standard" recommended by the docs. – ankostis Apr 22 '20 at 19:20
  • 3
    @ankostis I would take Brett's comments with a grain of salt. `pkgutil` is not mentioned at all on the deprecation schedule of *[PEP 594 -- Removing dead batteries from the standard library](https://www.python.org/dev/peps/pep-0594/)*, and is unlikely to be removed without a good reason. It has been around since Python 2.3 and specified as part of the loader protocol in [PEP 302](https://www.python.org/dev/peps/pep-0302/#optional-extensions-to-the-importer-protocol). Using an "under-defined API" is not a very convincing reply, that could describe the majority of the Python standard library! – wim Apr 22 '20 at 20:04
  • 4
    Let me add: *I want to see importlib resources succeed, too!* I'm all for rigorously defined APIs. It's just that in its current state, it can not really be recommended. The API is still undergoing change, it's unusable for many existing packages, and only available in relatively recent Python releases. In practice it's worse than `pkgutil` in just about every way. Your "gut feeling" and [appeal to authority](https://www.logicallyfallacious.com/logicalfallacies/Appeal-to-Authority) is meaningless to me, if there are problems with `get_data` loaders then show evidence and practical examples. – wim Apr 22 '20 at 20:27
  • @wim When invoking a python script, `__name__` is `__main__`. If `mymodule1.py` is run as the main script, `__name__` won't evaluate to the package name. Furthermore `pkgutil.get_data()` is a wrapper for `importlib.abc.ResourceLoader.get_data()`, which is the same as what `importlib.resources` methods `read_binary()` and `read_text()` do. Furthermore, these functions provide context managers, which pkgutil doesn't. If the resource is in a zip file, a temporary file is created and cleaned up. `pkg_util` does not do this. – adam.hendry Aug 19 '20 at 20:07
  • @A.Hendry 1. Running modules within a package as scripts is considered an [anti-pattern](https://mail.python.org/pipermail/python-3000/2007-April/006793.html) in the first place. When working within a package with resources, instead of `python mypackage/mymodule1.py` it would be correct to do `python -m mypackage.mymodule1`. 2. That `importlib.resources` and `pkgutil` both use other parts of importlib machinery is true, but so what? I'm not sure what point you're actually trying to make there? – wim Aug 19 '20 at 21:01
  • 3. pkgutil is using stdlib `zipimport` which will open the resource with a context manager, it does not leave around any temporary file around that needs to be cleaned up. You're right that it doesn't provide a context manager interface, so if users needs a long-lived resource, e.g. a huge file with ability to seek within rather than one-shot get data, then that's indeed one area where pkgutil doesn't help. – wim Aug 19 '20 at 21:07
  • @wim 1. I just meant what happens when resources are loaded from a main program? This happens, e.g., with GUIs (loading images, credentials, templates, etc.). 2. Fair enough. Not a point, just stating they do about the same thing. – adam.hendry Aug 19 '20 at 23:00
  • @wim Also, I believe "The `-m` option searches `sys.path` for the module name and runs its content as `__main__`", so `__name__` still equals `"__main__"` in this instance (https://realpython.com/run-python-scripts/#how-to-run-python-scripts-using-the-command-line). – adam.hendry Aug 19 '20 at 23:07
  • @wim Thirdly, I don't believe running scripts in a package is an "anti-pattern". The link you provided is about a PEP to change `if __name__ == "__main__"` to `if __name__ == sys.main`, which is not the same as what I'm talking about. I'm talking about running the main program at the top of the package directory structure, which is very common. – adam.hendry Aug 19 '20 at 23:13
  • @wim Finally, I think a decided win for either package would be to use `timeit` and see which is fastest. – adam.hendry Aug 19 '20 at 23:18
  • @A.Hendry Yes `__name__` will still equal `"__main__"` in that case, but the difference is the module would be loaded with the `__spec__` set on it ([docs](https://docs.python.org/3/reference/import.html#main-spec)). That is the important difference in this case, because that's how `importlib.util` will resolve resources. If you don't like to use `__name__` for some reason, I suppose you can just hardcode the package name there instead. – wim Aug 20 '20 at 00:01
  • 1
    As for `timeit`, here are the result: https://gist.github.com/wimglenn/c3565a1c2d09bf2670b8c78088b2f02e pkgutil is almost twice as fast, at least for these small resources. Note that I have commented out the print calls from source code before timing, so that it's only timing the loaders. – wim Aug 20 '20 at 00:12
  • @wim These are all strong arguments in favor of keeping and maintaining `pkgutil` and putting `importlib.resources` to bed. 1. Faster (nearly 2x as so), 2. No need to add `__init__.py` everywhere (more pythonic, as it avoids the confusion of making resources packages (because they're not packages) and no need to update existing published libraries). I don't want to see `pkgutil` go away if it already works. What about it is "underdeveloped" as Brett stated? – adam.hendry Aug 20 '20 at 02:34
  • 1
    @wim You've convinced me. I just posted in favor of `pkgutil` here https://gitlab.com/python-devs/importlib_resources/-/issues/58. Hoping to get some more eyeballs and +1's on this. If you have suggestions, let me know. – adam.hendry Aug 20 '20 at 02:54
  • I don't think the speed is a big deal (they are both fast enough). I do think the `__init__.py` everywhere is a big deal. But the good news is they're planning to change that `__init__` limitation going forward. Not really sure what Brett means by under-defined api? It seems a meaningless complaint. Also don't understand this push to move things towards the new code - we've seen several problems that has caused (there was [this](https://bugs.python.org/issue40924) one recently, also RHEL hit some problem for `ensurepip`) and I've yet to learn any significant benefits it brings to the table. – wim Aug 20 '20 at 05:01
  • @wim How will the `__init__.py` limitation be changed? Also, is the push to new code in order to sever the connection to Python 2? – adam.hendry Aug 20 '20 at 16:29
  • @wim One of the arguments for converting from `pkg_resources` to `importlib` was overhead and speed. There may not be a huge speed difference between `importlib` and `pkgutil`, but I'm playing devil's advocate for naysayers who still want to move forward using `importlib.resources`. I too am against blind appeals to authority, and as it stand, I see only disadvantages to `importlib.resources` over `pkgutil`. – adam.hendry Aug 22 '20 at 21:13
  • @wim One last note: can we change `__name__` to `__package__` in the above? That should work in every instance, yes? – adam.hendry Aug 22 '20 at 21:32
  • @wim Actually, I'm realizing now that defeats the purpose, since we cannot use a package and would have to add `__init__.py` to make it into a package...So it seems `pkgutil` has the exact same limitation as `importlib`. Your solution doesn't work if the directory you are trying to access isn't a package. – adam.hendry Aug 22 '20 at 23:29
  • @A.Hendry Changing `__name__` to `__package__` does not offer any advantage that I can think of. When executing as a script (`python mypackage/mymodule1.py`) the `__name__` will be `"__main__"` and the `__package__` will be `None`. So, no, it will not work in that instance. And `pkgutil` does not have the `__init__.py` limitation that `importlib.resources` currently has, with `pkgutil` the resources can be located in plain old subdirectories (review https://github.com/wimglenn/resources-example to convince yourself). – wim Aug 23 '20 at 01:09
  • 1
    @ankostis I've added another option using `importlib_resources.files` which is not quite battle-tested as the old pkgutil workhorse, but looks promising. – wim Aug 26 '20 at 17:57
  • @wim just to confirm that the Python 3.9 standard-lib `importlib.resources` will indeed support accessing files beyond packages, and link to [the corresponding issue](https://gitlab.com/python-devs/importlib_resources/-/issues/58#note_402625336). – ankostis Sep 04 '20 at 08:46
  • @ankostis I know. The author of the comment you’re linking to is also me! :) – wim Sep 04 '20 at 13:48
  • 1
    @wim Is the one downside to `pkgutil.get_data` that it won't work if you call the file as a script such that `__name__` is `"__main__"`? I noticed that `pkg_resources.resource_filename()` finds the file correctly even when the file is run as a script, but `pkgutil` gives me an error `ValueError: __main__.__spec__ is None`. – Nathaniel Ruiz Jun 28 '21 at 19:37
  • 1
    @NathanielRuiz You're right - I was able to reproduce that with [resources-example](https://github.com/wimglenn/resources-example/tree/master/myapp), the `pkgutil` code (example2.py) was unable to find resources when executed directly as a script. The other 4 approaches still worked. Note that executing submodules of a package as scripts directly [has been called an antipattern by Guido](https://mail.python.org/pipermail/python-3000/2007-April/006793.html), so I'm not really sure if it *should* work in this case.. :) – wim Jun 28 '21 at 20:22
  • 1
    I feel this should be the accepted answer instead - requiring a 3rd party library which doesn't support some old version of Python 3 and Python 2 is simply excluded from being supported doesn't sound very convenient, to say at least. – ibic Dec 04 '21 at 03:16
  • Something else I liked in this solution is that it works both for the packaged version and when code is executed directly during development. – Diomidis Spinellis Dec 27 '22 at 09:15
  • What if I need to read in a csv file or an h5 file rather than a binary file? My usual approach is to get the path to the file and then use that to open the file and read its contents with an appropriate package. – Jagerber48 Apr 23 '23 at 15:40
  • @Jagerber48 for a CSV you would call a chain like `csv.reader(io.TextIOWrapper(BytesIO(pkgutil.get_data(...))))` see the individual methods for how to fill in the arguments. – bad_coder Apr 23 '23 at 16:06
20

Section "10.8. Reading Datafiles Within a Package" of Python Cookbook, Third Edition by David Beazley and Brian K. Jones gives the answer.

I'll quote it here:

Suppose you have a package with files organized as follows:

mypackage/
    __init__.py
    somedata.dat
    spam.py

Now suppose the file spam.py wants to read the contents of the file somedata.dat. To do it, use the following code:

import pkgutil
data = pkgutil.get_data(__package__, 'somedata.dat')

The resulting variable data will be a byte string containing the raw contents of the file.

The first argument to get_data() is a string containing the package name. You can either supply it directly or use a special variable, such as __package__. The second argument is the relative name of the file within the package. If necessary, you can navigate into different directories using standard Unix filename conventions as long as the final directory is still located within the package.

In this way, the package can be installed as a directory, .zip or .egg.
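Since get_data() returns bytes, a text format such as CSV needs to be decoded before parsing. The following sketch builds a throwaway package (cookbook_pkg, a name invented here) whose somedata.dat holds CSV text, and reads it through an in-memory text wrapper.

```python
import csv
import importlib
import io
import pkgutil
import sys
import tempfile
from pathlib import Path


def read_packaged_csv():
    """Read somedata.dat from a temporary package and parse it as CSV rows."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "cookbook_pkg"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        (pkg_dir / "somedata.dat").write_bytes(b"name,value\nspam,1\neggs,2\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            raw = pkgutil.get_data("cookbook_pkg", "somedata.dat")  # bytes
            # Wrap the bytes so csv.reader sees a decoded text stream.
            text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
            return list(csv.reader(text))
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            sys.path.remove(tmp)
            sys.modules.pop("cookbook_pkg", None)
```

Within spam.py itself, the package name would come from __package__ rather than a hard-coded string, as in the quoted recipe.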

chaokunyang
17

In case you have this structure

lidtk
├── bin
│   └── lidtk
├── lidtk
│   ├── analysis
│   │   ├── char_distribution.py
│   │   └── create_cm.py
│   ├── classifiers
│   │   ├── char_dist_metric_train_test.py
│   │   ├── char_features.py
│   │   ├── cld2
│   │   │   ├── cld2_preds.txt
│   │   │   └── cld2wili.py
│   │   ├── get_cld2.py
│   │   ├── text_cat
│   │   │   ├── __init__.py
│   │   │   ├── README.md   <---------- say you want to get this
│   │   │   └── textcat_ngram.py
│   │   └── tfidf_features.py
│   ├── data
│   │   ├── __init__.py
│   │   ├── create_ml_dataset.py
│   │   ├── download_documents.py
│   │   ├── language_utils.py
│   │   ├── pickle_to_txt.py
│   │   └── wili.py
│   ├── __init__.py
│   ├── get_predictions.py
│   ├── languages.csv
│   └── utils.py
├── README.md
├── setup.cfg
└── setup.py

you need this code:

import pkg_resources

# __name__ in case you're within the package
# - otherwise it would be 'lidtk' in this example as it is the package name
path = 'classifiers/text_cat/README.md'  # always use slash
filepath = pkg_resources.resource_filename(__name__, path)

The odd "always use slash" requirement comes from the setuptools API:

Also notice that if you use paths, you must use a forward slash (/) as the path separator, even if you are on Windows. Setuptools automatically converts slashes to appropriate platform-specific separators at build time
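If you need a real filesystem path but would rather avoid the setuptools dependency, the standard library can do the same on Python 3.9+. This is only a sketch, not part of the original answer: `lidtk_demo` is a hypothetical stand-in package built in a temporary directory so the snippet is runnable by itself.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path

# Throwaway package mimicking part of the tree above (hypothetical name).
root = Path(tempfile.mkdtemp())
pkg = root / "lidtk_demo"
(pkg / "classifiers" / "text_cat").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "classifiers" / "text_cat" / "README.md").write_text(
    "# text_cat\n", encoding="utf-8"
)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# files() returns a Traversable; "/" joins components, no os.path needed.
resource = resources.files("lidtk_demo") / "classifiers" / "text_cat" / "README.md"

# as_file() yields a concrete pathlib.Path. If the package lives in a zip,
# the resource is extracted to a temporary file that is only guaranteed to
# exist inside the "with" block.
with resources.as_file(resource) as filepath:
    path_exists = filepath.is_file()
    readme_text = filepath.read_text(encoding="utf-8")
```

Passing the package name as a literal string also sidesteps the problem that `__name__` is `__main__` when the module is run as an entry point.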

In case you wonder where the documentation is:

wim
  • 338,267
  • 99
  • 616
  • 750
Martin Thoma
  • 124,992
  • 159
  • 614
  • 958
  • 1
    `pkg_resources` has overhead that `pkgutil` overcomes. Also, if provided code is run as entry point, `__name__` will evaluate to `__main__`, not the package name. – adam.hendry Aug 22 '20 at 21:21
  • The upside that I see on his solution is that it is the only one that works with the filename, not with the content directly. Which I can't find another solution to it – titusfx Oct 12 '22 at 11:01
0

The accepted answer should be to use importlib.resources. pkgutil.get_data also requires the argument package be a non-namespace package (see pkgutil docs). Hence, the directory containing the resource must have an __init__.py file, making it have the exact same limitations as importlib.resources. If the overhead issue of pkg_resources is not a concern, this is also an acceptable alternative.

Pre-Python-3.3, all packages were required to have an __init__.py. Post-Python-3.3, a folder doesn't need an __init__.py to be a package. This is called a namespace package. Unfortunately, pkgutil does not work with namespace packages (see pkgutil docs).

For example, with the package structure:

+-- foo/
|   +-- __init__.py
|   +-- bar/
|   |   +-- hi.txt

where hi.txt contains just Hi!, you get the following:

>>> import pkgutil
>>> rsrc = pkgutil.get_data("foo.bar", "hi.txt")
>>> print(rsrc)
None

However, with an __init__.py in bar, you get:

>>> import pkgutil
>>> rsrc = pkgutil.get_data("foo.bar", "hi.txt")
>>> print(rsrc)
b'Hi!'
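As pointed out in the comments on this answer, the first layout (no __init__.py in bar) can still be read by naming the regular package `foo` and putting the subdirectory into the resource path. A minimal sketch that recreates that layout in a temporary directory (the temporary location is just so the snippet is self-contained):

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Recreate the first layout: foo/ is a regular package, bar/ has no __init__.py.
root = Path(tempfile.mkdtemp())
(root / "foo" / "bar").mkdir(parents=True)
(root / "foo" / "__init__.py").write_text("")
(root / "foo" / "bar" / "hi.txt").write_text("Hi!")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Ask the regular package for the data and address bar/ as a plain directory.
rsrc = pkgutil.get_data("foo", "bar/hi.txt")
print(rsrc)  # b'Hi!'
```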
adam.hendry
  • 4,458
  • 5
  • 24
  • 51
  • 2
    This answer is incorrect - the directory containing the resources does not need to be a package. It can be a subdirectory _within_ a package. The limitation of `importlib.resources`, which `pkgutil` does not have, was that the directory containing resources itself needs to have an `__init__.py` too, i.e. it has to be a _subpackage_. That's unrelated to namespace package issues, which concern whether there's an `__init__.py` at the top-level directory rather than in data subdirectories within the package. – wim Aug 26 '20 at 17:28
  • @wim I'm sorry, but I believe you are mistaken. `pre-Python 3.3+`, all packages were required to have an `__init__.py` to be loaded. Post-3.3, packages don't need them. Packages without `__init__.py` are `namespace packages`. Per the `pkgutil` docs, if you try to load a resource from a namespace package, you will get `None`. Please see my updated edited answer. – adam.hendry Aug 27 '20 at 04:11
  • 2
    You were using `pkgutil` incorrectly. Try with `pkgutil.get_data("foo", "bar/hi.txt")` – wim Aug 27 '20 at 16:28
-3

Assuming you are using an egg file that is not extracted:

I "solved" this in a recent project by using a post-install script that extracts my templates from the egg (zip file) to the proper directory in the filesystem. It was the quickest, most reliable solution I found, since working with __path__[0] can go wrong sometimes (I don't recall the name, but I came across at least one library that added something in front of that list!).

Also, egg files are usually extracted on the fly to a temporary location called the "egg cache". You can change that location using an environment variable, either before starting your script or even later, e.g.

os.environ['PYTHON_EGG_CACHE'] = path

However there is pkg_resources that might do the job properly.

Florian
  • 2,562
  • 5
  • 25
  • 35