There are two different things you could be trying to do here:
- Treat the data files as part of your package, like the Python modules, and access them at runtime as if your package were a normal directory tree even if it isn't.
- Get the data files installed somewhere else at `pip install` time, to a location you can access normally.
Both are explained in the section on data files in the PyPA/setuptools docs. I think you want the first one here, which is covered in the subsection on Accessing Data Files at Runtime:
> Typically, existing programs manipulate a package’s `__file__` attribute in order to find the location of data files. However, this manipulation isn’t compatible with PEP 302-based import hooks, including importing from zip files and Python Eggs. It is strongly recommended that, if you are using data files, you should use the ResourceManager API of `pkg_resources` to access them. The `pkg_resources` module is distributed as part of `setuptools`, so if you’re using `setuptools` to distribute your package, there is no reason not to use its resource management API. See also Accessing Package Resources for a quick example of converting code that uses `__file__` to use `pkg_resources` instead.
Follow that link, and you find what look like some crufty old PEAK docs, but that's only because they really are crufty old PEAK docs. There is a version buried inside the `setuptools` docs that you may find easier to read and navigate once you manage to find it.
As it says, you could try using `get_data` (which will work inside an egg/zip) and then fall back to accessing a file (which will work when running from source), but you're better off using the wrappers in `pkg_resources`. Basically, if your code was doing this:
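```python
import os

# Resolve the data file relative to the directory containing this module
path = os.path.join(os.path.dirname(__file__), 'Wordproject/WordProject/Repository/DataBank/', datathingy)
with open(path) as f:
    for line in f:
        do_stuff(line)
```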
… you'll change it to this:
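```python
import pkg_resources

# Resource names always use '/' separators, regardless of platform
path = 'Wordproject/WordProject/Repository/DataBank/' + datathingy
with pkg_resources.resource_stream(__name__, path) as f:  # binary stream
    for line in f:
        do_stuff(line.decode())
```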
Notice that `resource_stream` files are always opened in binary mode. So if you want to read them as text, you need to wrap an `io.TextIOWrapper` around them, or decode each line.
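Both of the remaining ideas (trying `get_data` and falling back to a plain file, and turning binary data into text with a `TextIOWrapper`) can be combined in a small standard-library-only sketch. It assumes that the `get_data` in question is `pkgutil.get_data` and that the resource is UTF-8 text; `open_resource_text` is a hypothetical helper name, not part of any library:

```python
import importlib
import io
import os
import pkgutil


def open_resource_text(package, resource, encoding='utf-8'):
    """Hypothetical helper: return a text stream for `resource` in `package`.

    `resource` is a '/'-separated path relative to the package directory.
    The UTF-8 default encoding is an assumption about the data files.
    """
    # pkgutil.get_data goes through the package's loader, so it also works
    # for zip/egg imports. It returns None if the loader lacks get_data.
    data = pkgutil.get_data(package, resource)
    if data is None:
        # Fallback for running from source: read the file next to the
        # package's __file__ (raises ImportError if the package is missing).
        module = importlib.import_module(package)
        base = os.path.dirname(module.__file__)
        with open(os.path.join(base, *resource.split('/')), 'rb') as f:
            data = f.read()
    # Wrap the raw bytes so that iterating yields str lines, not bytes.
    return io.TextIOWrapper(io.BytesIO(data), encoding=encoding)
```

With `pkg_resources`, the equivalent text-mode wrapping would be `io.TextIOWrapper(pkg_resources.resource_stream(__name__, path), encoding='utf-8')`.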