I've found many variations on this question, but none seem to address my exact situation, where:

  1. The Python code for the project is contained in a single file
  2. There are required data directories that must be included with the installation

I am trying to create a pip-installable version of the following Python project, which consists of a single Python script, myscript.py, plus two data directories, data and test_data. data contains required data files that are read by myscript.py at runtime:

myscript
|
├── myscript
│   ├── data
│   ├── __init__.py
│   ├── myscript.py
│   └── test_data
├── LICENSE
├── README.md
└── setup.py

I've seen the recommendation to use py_modules in setup.py for similar issues, but that doesn't seem to allow for inclusion of the data directories upon install with pip, and I cannot find documentation that seems to cover this specific case.

I've also seen it recommended to just have users do a git clone of the repo instead of using pip, but it seems like a usability benefit to offer a way to pip-install with all dependencies while correctly adding the script to the PATH in an OS-dependent manner.

glarue

2 Answers

This looks like a pretty standard Python package. To reproduce your situation, I have the following layout:

$ find * -type f
LICENSE
myscript/myscript.py
myscript/data/data_file_1
myscript/test_data/test_data_file_1
myscript/__init__.py
README.md
setup.cfg
setup.py

To include the data directories, you can use the package_data option in your setup.cfg file (which is slowly replacing setup.py as the standard way to build Python packages).

We just need a stub setup.py:

from setuptools import setup

setup()

Everything else is in setup.cfg:

[metadata]
name = myscript
version = 1.0
description = An example for glarue
long_description = file: README.md
long_description_content_type = text/markdown
url = https://github.com/larsks/so-example-glarue
author = Lars Kellogg-Stedman
author_email = lars@oddbit.com

[options]
packages = find:

[options.package_data]
myscript = data/*, test_data/*

[options.entry_points]
console_scripts =
    myscript = myscript.myscript:main

The key part is the options.package_data section, which, for each package, lists the glob patterns that should be included as part of the package.
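
To get a feel for what a pattern such as data/* picks up, the sketch below mimics the matching with Python's own glob module inside a throwaway directory. It is only an approximation of what setuptools does, and the helper name match_package_data, the nested sub directory, and nested_file are made up for illustration. What it suggests is that a single * does not descend into subdirectories, so nested data would need its own pattern.

```python
import glob
import os
import tempfile


def match_package_data(package_dir, patterns):
    """Return files under package_dir matched by the glob patterns.

    Hypothetical helper: an approximation of package_data matching,
    not the code setuptools actually runs.
    """
    matched = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(package_dir, pattern)):
            if os.path.isfile(path):  # directories themselves are skipped
                rel = os.path.relpath(path, package_dir)
                matched.append(rel.replace(os.sep, "/"))
    return sorted(matched)


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the example layout, plus one nested file (an assumption).
    pkg = os.path.join(tmp, "myscript")
    os.makedirs(os.path.join(pkg, "data", "sub"))
    os.makedirs(os.path.join(pkg, "test_data"))
    for rel in (
        "data/data_file_1",
        "data/sub/nested_file",
        "test_data/test_data_file_1",
    ):
        with open(os.path.join(pkg, *rel.split("/")), "w") as fh:
            fh.write("example")

    found = match_package_data(pkg, ["data/*", "test_data/*"])

print(found)
```

Here data/sub/nested_file is left out of the result, because data/* only matches entries directly inside data.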

If we build a source distribution from this:

$ python setup.py sdist
running sdist
...
Creating tar archive
removing 'myscript-1.0' (and everything under it)

We can see that the data files are included in the archive:

$ tar tf dist/myscript-1.0.tar.gz
myscript-1.0/
myscript-1.0/PKG-INFO
myscript-1.0/README.md
myscript-1.0/myscript/
myscript-1.0/myscript/__init__.py
myscript-1.0/myscript/data/
myscript-1.0/myscript/data/data_file_1
myscript-1.0/myscript/myscript.py
myscript-1.0/myscript/test_data/
myscript-1.0/myscript/test_data/test_data_file_1
myscript-1.0/myscript.egg-info/
myscript-1.0/myscript.egg-info/PKG-INFO
myscript-1.0/myscript.egg-info/SOURCES.txt
myscript-1.0/myscript.egg-info/dependency_links.txt
myscript-1.0/myscript.egg-info/entry_points.txt
myscript-1.0/myscript.egg-info/top_level.txt
myscript-1.0/setup.cfg
myscript-1.0/setup.py
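
If you would rather check this from Python than by reading the tar output, something along these lines could work. It is a minimal sketch using the standard tarfile module; the helper names are hypothetical, and the demo builds a small stand-in archive in a temporary directory rather than using the real dist/myscript-1.0.tar.gz.

```python
import io
import os
import tarfile
import tempfile


def list_archive(path):
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()


def missing_from_archive(path, expected):
    """Return the expected member names that are absent from the archive."""
    names = set(list_archive(path))
    return [name for name in expected if name not in names]


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for dist/myscript-1.0.tar.gz, built only for this demo.
    archive = os.path.join(tmp, "myscript-1.0.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        for name in (
            "myscript-1.0/myscript/__init__.py",
            "myscript-1.0/myscript/myscript.py",
            "myscript-1.0/myscript/data/data_file_1",
            "myscript-1.0/myscript/test_data/test_data_file_1",
        ):
            payload = b"example"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

    missing = missing_from_archive(
        archive,
        [
            "myscript-1.0/myscript/data/data_file_1",
            "myscript-1.0/myscript/test_data/test_data_file_1",
        ],
    )

print(missing)  # an empty list means every expected data file is present
```

Pointing missing_from_archive at the archive produced by the sdist build should give the same kind of answer for the real package.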

The same holds true if you build a binary distribution with bdist.

You can find this entire example online at https://github.com/larsks/so-example-glarue.

larsks
  • Thank you for the very thorough answer. The solution for me was slightly more complicated, and actually involved info from a much older post of yours [here](https://stackoverflow.com/a/14159430/3076552). Although following your example did result in the package data being included in the `sdist`, installing locally with `pip` did _not_ correctly install the files. To successfully install the files along with the script, I had to include a `MANIFEST.in` file (as per the answer from gelonida) and a corresponding `include_package_data = True` in `setup.cfg`. – glarue Jun 02 '20 at 15:22

I think you're missing the MANIFEST.in file.

Please look at the following mini-project:

Directory structure

./MANIFEST.in
./mini/data/f1.txt
./mini/__init__.py
./mini/mini.py
./mini/test_data/f1.txt
./README.md
./setup.py

setup.py

from setuptools import setup

setup(name="mini",
      version="0.0.1",
      description="one file pip installable mod",
      long_description="long description",
      long_description_content_type="text/x-rst",
      classifiers=[
            "Development Status :: 3 - Alpha",
      ],
      keywords="sample",
      url="https://github.com/demo",
      author="me",
      author_email="me@my.email.com",
      license="MIT",
      packages=["mini"],
      scripts=[],
      entry_points={
          "console_scripts": [
            "mini = mini.mini:main",
            ]
      },
      install_requires=[],
      extras_require={},
      python_requires=">=3.4",
      setup_requires=[],
      tests_require=[],
      zip_safe=False,
      include_package_data=True,
      )

MANIFEST.in

include README.md
recursive-include mini/data/ *
recursive-include mini/test_data/ *

mini/mini.py

#!/usr/bin/env python

import os

MYPATH = os.path.realpath(os.path.dirname(__file__))

DATA_PATH = os.path.join(MYPATH, "data")
TEST_DATA_PATH = os.path.join(MYPATH, "test_data")

def main():
    print("I am the miniscript")
    with open(os.path.join(DATA_PATH, "f1.txt")) as fin:
        print("DATA FILE", fin.read())
    with open(os.path.join(TEST_DATA_PATH, "f1.txt")) as fin:
        print("TEST DATA FILE", fin.read())

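if __name__ == "__main__":
    # Run only when executed directly; the console_scripts entry point
    # imports this module and calls main() itself.
    main()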
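
As an alternative to building paths from __file__, the data could probably also be read through importlib.resources. The sketch below is not part of the mini-project above: it assumes Python 3.9 or newer (for importlib.resources.files), and the package name mini_demo and the helper read_package_data are made up so the example can run on its own against a temporary package.

```python
import importlib
import os
import sys
import tempfile
from importlib.resources import files  # assumes Python 3.9+


def read_package_data(package, *parts):
    """Read a text file shipped inside an importable package.

    Hypothetical helper; 'parts' are the path components below the package.
    """
    resource = files(package)
    for part in parts:
        resource = resource / part
    return resource.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    # Build a throwaway package that mirrors the mini/data/f1.txt layout.
    pkg_dir = os.path.join(tmp, "mini_demo")
    os.makedirs(os.path.join(pkg_dir, "data"))
    with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8"):
        pass
    with open(os.path.join(pkg_dir, "data", "f1.txt"), "w", encoding="utf-8") as fh:
        fh.write("hello from data")

    sys.path.insert(0, tmp)  # make the throwaway package importable
    importlib.invalidate_caches()
    try:
        content = read_package_data("mini_demo", "data", "f1.txt")
    finally:
        sys.path.remove(tmp)

print("DATA FILE", content)
```

With an installed package, the call would simply name the real package, for example read_package_data("mini", "data", "f1.txt"), provided the data files were actually installed alongside it.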

Try out with:

python -m setup sdist
tar tvfz dist/mini-0.0.1.tar.gz
gelonida