
I am writing a python package with modules that need to open data files in a ./data/ subdirectory. Right now I have the paths to the files hardcoded into my classes and functions. I would like to write more robust code that can access the subdirectory regardless of where it is installed on the user's system.

I've tried a variety of methods, but so far I have had no luck. It seems that most of the "current directory" commands return the directory of the system's python interpreter, and not the directory of the module.

This seems like it ought to be a trivial, common problem. Yet I can't seem to figure it out. Part of the problem is that my data files are not .py files, so I can't use import functions and the like.

Any suggestions?

Right now my package directory looks like:

/
    __init__.py
    module1.py
    module2.py
    data/
        data.txt

I am trying to access data.txt from module*.py!

codeforester
Jacob Lyles

6 Answers


The standard way to do this is with setuptools packages and pkg_resources.

You can lay out your package as a setuptools package and configure the package setup file to point it to your data resources, as per this link:

http://docs.python.org/distutils/setupscript.html#installing-package-data

You can then re-find and use those files using pkg_resources, as per this link:

http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access

import pkg_resources

DATA_PATH = pkg_resources.resource_filename('<package name>', 'data/')
DB_FILE = pkg_resources.resource_filename('<package name>', 'data/sqlite.db')
ThorSummoner
elliot42
  • I think that this is the preferred way, I'm not entirely sure of the reason but projects show warnings when you refer to the package/module with `__file__`. – lukecampbell Apr 11 '13 at 01:53
  • Won't *pkg_resources* create a run-time dependency on *setuptools*? For example, I redistribute a Debian package so why would I depend on `python-setuptools` just for that? So far `__file__` works fine for me. – mlt Jul 12 '13 at 16:32
  • Why this is better: The ResourceManager class provides uniform access to package resources, whether those resources exist as files and directories or are compressed in an archive of some kind – vrdhn Aug 02 '13 at 12:08
  • Brilliant suggestion, thanks. I implemented a standard file open using `from pkg_resources import resource_filename open(resource_filename('data', 'data.txt'), 'rb')` – eageranalyst Feb 26 '14 at 23:32
  • How will this work for using the package when it isn't installed? Just testing locally I mean – Claudiu Nov 28 '17 at 19:22
  • @VardhanVarma this isn't better. the setuptools devs are just not on board with fixing this bug as they are in denial about it being a bug. – Matt Joyce Jan 30 '18 at 16:08
  • In python 3.7, `importlib.resources` replaces `pkg_resources` for this purpose (because of performance problems). – benjimin Mar 14 '19 at 06:21
  • @benjimin also `importlib_resources` for python < 3.7 – pcko1 Nov 02 '19 at 00:10
  • perfect solution. `pkg_resources` also helps when you have multiple resource files that you need to manage (`resource_listdir`) + some more options – mluerig Jan 28 '21 at 11:57
  • This works in Python 3.10 but fails in 3.9 with a `TypeError`. [This solution below](https://stackoverflow.com/a/26278544/1717828) worked in that case. – user1717828 Feb 14 '22 at 15:35

There is often no point in posting an answer that details code that does not work as is, but I believe this to be an exception. Python 3.7 added importlib.resources, which is supposed to replace pkg_resources. It works for resources whose names contain no slashes, that is, files that sit directly inside a package directory. Given this layout:

foo/
    __init__.py
    module1.py
    module2.py
    data/   
       data.txt
    data2.txt

i.e. you could access data2.txt inside package foo with for example

importlib.resources.open_binary('foo', 'data2.txt')

but it would fail with an exception for

>>> importlib.resources.open_binary('foo', 'data/data.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/importlib/resources.py", line 87, in open_binary
    resource = _normalize_path(resource)
  File "/usr/lib/python3.7/importlib/resources.py", line 61, in _normalize_path
    raise ValueError('{!r} must be only a file name'.format(path))
ValueError: 'data/data.txt' must be only a file name

This cannot be fixed except by placing __init__.py in data and then using it as a package:

importlib.resources.open_binary('foo.data', 'data.txt')

The reason for this behaviour is "it is by design"; but the design might change...

  • Do you have a better link for *"it is by design"* than a youtube video — preferably one with text? – gerrit Dec 03 '19 at 15:28
  • @gerrit the 2nd one does contain text. `"This was a deliberate choice, but I think you have a valid use case. @brettcannon what do you think? And if we allow this, should we make sure it gets into Python 3.7?"` – Antti Haapala -- Слава Україні Dec 03 '19 at 15:36
  • The design has now changed to traversable APIs (avail in stdlib Python 3.9+). Further details in the dupe here -> https://stackoverflow.com/a/58941536/674039 – wim Oct 15 '20 at 15:32
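
For reference, here is a minimal sketch of the traversable API mentioned in the last comment, assuming Python 3.9 or later. The package name `demo_pkg` and the temporary directory are made up for the demonstration; in a real project the package would already be importable and only the last three lines would be needed.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

# Build a throwaway package on disk: demo_pkg/__init__.py and demo_pkg/data/data.txt.
# "demo_pkg" is a hypothetical name used only for this sketch.
root = Path(tempfile.mkdtemp())
data_dir = root / "demo_pkg" / "data"
data_dir.mkdir(parents=True)
(root / "demo_pkg" / "__init__.py").write_text("")
(data_dir / "data.txt").write_text("hello from data.txt\n")

# Make the throwaway package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# files() returns a Traversable; "/" descends into plain subdirectories,
# so data/ does not need its own __init__.py.
resource = files("demo_pkg") / "data" / "data.txt"
content = resource.read_text(encoding="utf-8")
print(content)
```

With this API the subdirectory case that raised `ValueError` above no longer needs the `__init__.py` workaround.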

You can use __file__ to get the path to the package, like this:

import os

# Directory that contains this module, and therefore the package's data/ folder
this_dir = os.path.dirname(os.path.abspath(__file__))
DATA_PATH = os.path.join(this_dir, "data", "data.txt")
with open(DATA_PATH) as f:
    print(f.read())
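
The same idea can be written with pathlib. This is only a sketch: `package_data_path` is a hypothetical helper name, and the temporary directory with a stand-in `module1.py` exists just so the example runs on its own. Inside a real module you would pass `__file__`.

```python
import tempfile
from pathlib import Path


def package_data_path(module_file, *parts):
    """Return the path of a file under the data/ directory next to module_file."""
    return Path(module_file).resolve().parent.joinpath("data", *parts)


# Demo with a stand-in module location; inside a real module, pass __file__.
pkg_dir = Path(tempfile.mkdtemp())
fake_module = pkg_dir / "module1.py"
fake_module.write_text("")
(pkg_dir / "data").mkdir()
(pkg_dir / "data" / "data.txt").write_text("sample contents\n")

data_txt = package_data_path(fake_module, "data.txt")
print(data_txt.read_text())
```

Like any `__file__`-based approach, this assumes the package is unpacked on a real filesystem.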
Armster
RichieHindle

Here is a solution that works today. Definitely use the pkg_resources API rather than reinventing all those wheels.

When a true filesystem filename is needed, use resource_filename; zipped eggs will be extracted to a cache directory:

from pkg_resources import resource_filename, Requirement

path_to_vik_logo = resource_filename(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")

When a stream is enough, resource_stream returns a readable file-like object for the specified resource; it may be an actual file, a StringIO, or some similar object. The stream is in “binary mode”, in the sense that whatever bytes are in the resource will be read as-is.

from pkg_resources import resource_stream, Requirement

vik_logo_as_stream = resource_stream(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")

Package Discovery and Resource Access using pkg_resources
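
For comparison, the same two access styles (a stream and a real filesystem path) can be sketched with the standard library alone, assuming Python 3.9 or later. This is not the pkg_resources API shown above. The package name `zipped_demo`, the file `logo.bin` and the archive are invented for the example, which builds a zipped package so that the extraction step actually happens.

```python
import sys
import tempfile
import zipfile
from importlib.resources import as_file, files
from pathlib import Path

# Build a small zipped package; all names here are hypothetical.
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_demo/__init__.py", "")
    zf.writestr("zipped_demo/data/logo.bin", b"\x89PNG")

# Zip archives on sys.path are importable via zipimport.
sys.path.insert(0, str(archive))

resource = files("zipped_demo") / "data" / "logo.bin"

# Stream-style access: read the bytes straight out of the archive.
with resource.open("rb") as stream:
    header = stream.read()

# Path-style access: as_file() hands back a real filesystem path,
# extracting the resource to a temporary file when it lives in an archive.
with as_file(resource) as real_path:
    existed = Path(real_path).is_file()
    on_disk = Path(real_path).read_bytes()
```

The path from `as_file()` is only guaranteed to exist inside the `with` block, so libraries that insist on a filename should be called there.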

zezollo
Sascha Gottfried

You need a name for your whole module; your given directory tree doesn't list that detail. For me this worked:

import pkg_resources
print(    
    pkg_resources.resource_filename(__name__, 'data/data.txt')
)

Notably, setuptools does not appear to resolve files based on a name match with packed data files, so you're going to have to include the data/ prefix pretty much no matter what. You can use os.path.join('data', 'data.txt') if you need alternate directory separators. Generally, though, I find no compatibility problems with hard-coded Unix-style directory separators.

ThorSummoner
  • https://docs.python.org/3.6/distutils/setupscript.html#writing-the-setup-script > Note that any pathnames (files or directories) supplied in the setup script should be written using the Unix convention, i.e. slash-separated. The Distutils will take care of converting this platform-neutral representation into whatever is appropriate on your current platform before actually using the pathname. This makes your setup script portable across operating systems, which of course is one of the major goals of the Distutils. In this spirit, all pathnames in this document are slash-separated. – Johann Chang Mar 26 '18 at 07:55

I think I hunted down an answer.

I made a module, data_path.py, which I import into my other modules. It contains:

import os
data_path = os.path.join(os.path.dirname(__file__), 'data')

And then I open all my files with

open(os.path.join(data_path,'filename'), <param>)
Jacob Lyles
  • This will fail to work when the resource is in an archive distribution (such as a zipped egg). Prefer something like that: `pkg_resources.resource_string('pkg_name', 'data/file.txt')` – ankostis Jan 01 '14 at 01:04
  • @ankostis setuptools is clever enough to extract the archive if it detects that you used `__file__` somewhere. In my case I use a library which really wants paths and not streams. Of course I could write the files temporarily to disk but being lazy I just use setuptools's feature. – letmaik Apr 24 '14 at 08:35
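
If the archive case matters but a setuptools dependency is unwanted, the standard library's `pkgutil.get_data` reads a resource as bytes through the package's loader. The sketch below builds a zipped package to show this. The names `zipdemo_pkg` and `demo_bundle.zip` are invented for the example.

```python
import pkgutil
import sys
import tempfile
import zipfile
from pathlib import Path

# Build a zipped package containing data/data.txt (hypothetical names throughout).
archive = Path(tempfile.mkdtemp()) / "demo_bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipdemo_pkg/__init__.py", "")
    zf.writestr("zipdemo_pkg/data/data.txt", "hello from the archive\n")

# Put the archive itself on sys.path so the package is imported from the zip.
sys.path.insert(0, str(archive))

# The resource name always uses "/" separators, relative to the package.
raw = pkgutil.get_data("zipdemo_pkg", "data/data.txt")
text = raw.decode("utf-8")
print(text)
```

`get_data` returns the file's contents rather than a path, so it suits the stream use case but not libraries that require a filename.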