data_files differences between pip and setuptools

Question

I have a Python application that ships with a setup.py script and can be installed with pip or with setuptools directly. However, the two methods behave differently in some annoying ways, and I want to know the correct way to distribute data files.

import glob
import setuptools

long_description = ''
setuptools.setup(
  name='creator-build',
  version='0.0.3-dev',
  description='Meta Build System for Ninja',
  long_description=long_description,
  author='Niklas Rosenstein',
  author_email='rosensteinniklas@gmail.com',
  url='https://github.com/creator-build/creator',
  py_modules=['creator'],
  packages=setuptools.find_packages('.'),
  package_dir={'': '.'},
  data_files=[
    ('creator', glob.glob('creator/builtins/*.crunit')),
  ],
  scripts=['scripts/creator'],
  classifiers=[
    "Development Status :: 5 - Production/Stable",
    "Programming Language :: Python",
    "Intended Audience :: Developers",
    "Topic :: Utilities",
    "Topic :: Software Development :: Libraries",
    "Topic :: Software Development :: Libraries :: Python Modules",
    ],
  license="MIT",
)

With pip, the files specified in data_files end up in sys.prefix + '/creator'.
With setuptools (that is, running setup.py directly), the files end up in lib/python3.4/site-packages/creator_build-0.0.3.dev0-py3.4.egg/creator.

Ideally, I would like the files to always end up in the same location, independent of the installation method. I would also prefer the files to be put into the module directory (the way setuptools does it), but that could lead to problems if the package is installed as a zipped Python Egg.

How can I make sure the data_files end up in the same location with both installation methods? Also, how would I know if my module was installed as a zipped Python Egg and how can I load the data files then?

Torxed · Answer 1 · 2020-07-08T11:15:30.020

I've been asking around, and the general consensus, including the official docs, is:

Warning: data_files is deprecated. It does not work with wheels, so it should be avoided.

Instead, everyone appears to be pointing towards include_package_data.
The drawback is that it cannot include anything outside of your package source root. So if the data files live outside the package directory, they won't be included. package_data has the same limitation.
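For the layout in the question, where the .crunit files already sit under creator/builtins, switching from data_files to package_data might look roughly like the sketch below. It assumes that creator is a regular package (a directory with an __init__.py), which the question does not confirm, and it only builds the keyword arguments rather than calling setup().

```python
# Sketch only: keyword arguments that a setup.py could pass to setuptools.setup().
# Assumption: creator/ is a package (it has an __init__.py) and the data files
# live in creator/builtins/, as the glob in the question suggests.
setup_kwargs = dict(
    name="creator-build",
    version="0.0.3-dev",
    packages=["creator"],
    # Patterns are resolved relative to the package directory, not to setup.py.
    package_data={"creator": ["builtins/*.crunit"]},
)

# In a real setup.py this would be followed by:
#     import setuptools
#     setuptools.setup(**setup_kwargs)
```

With this, the data files should be installed inside the creator package itself rather than under sys.prefix, so they sit in the same place relative to the package whichever install method is used.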

If your data files are outside of your source files (for instance, I'm trying to include examples/*.py for a lot of reasons we don't need to discuss), the only workaround I've found is to hot-swap them in: copy them into the package, run the setup, and then remove them.

import setuptools, shutil

with open("README.md", "r") as fh:
    long_description = fh.read()

# Temporarily copy the examples into the package so package_data can see them.
shutil.copytree('examples', 'archinstall/examples')

try:
    setuptools.setup(
        name="archinstall",
        version="2.0.3rc4",
        author="Anton Hvornum",
        author_email="anton@hvornum.se",
        description="Arch Linux installer - guided, templates etc.",
        long_description=long_description,
        long_description_content_type="text/markdown",
        url="https://github.com/Torxed/archinstall",
        packages=setuptools.find_packages(),
        classifiers=[
            "Programming Language :: Python :: 3.8",
            "License :: OSI Approved :: GNU General Public License v3 (GPLv3)",
            "Operating System :: POSIX :: Linux",
        ],
        python_requires='>=3.8',
        # Patterns are resolved relative to the archinstall package directory.
        package_data={'archinstall': ['examples/*.py']},
    )
finally:
    # Remove the temporary copy even if setup() fails, so the next run starts clean.
    shutil.rmtree('archinstall/examples')

This is ugly at best, but it works.
For reference, my folder structure in the git repo is:

.
├── archinstall
│   ├── __init__.py
│   ├── lib
│   │   ├── disk.py
│   │   └── exceptions.py
│   └── __main__.py
├── docs
│   ├── logo.png
├── examples
│   ├── guided.py
│   └── minimal.py
├── LICENSE
├── profiles
│   ├── applications
│   │   ├── awesome.json
│   │   ├── gnome.json
│   │   ├── kde.json
│   │   └── postgresql.json
│   ├── desktop.py
│   ├── router.json
│   ├── webserver.json
│   └── workstation.json
├── README.md
└── setup.py

This is the only way I can see to include my profiles as well as my examples without moving them out of the root of the repository (which I'd prefer not to do, as I want users to find them easily when browsing the repo on GitHub).

One final note: if you don't mind polluting the src directory (in my case that's just archinstall), you could symlink in whatever you need to include instead of copying it.

cd archinstall
ln -s ../examples ./examples
ln -s ../profiles ./profiles

That way, whether you install via setup.py or pip, they end up inside the <package dir>, at its root.
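On the second half of the question (loading the files when the package might be zipped): once the data lives inside the package, it can be read through importlib.resources instead of building filesystem paths by hand, which should also cover zipped installs. The following is a minimal sketch, assuming Python 3.9 or newer; the helper names read_package_text and list_package_files are made up for illustration.

```python
from importlib import resources  # resources.files() requires Python 3.9+


def read_package_text(package, *parts, encoding="utf-8"):
    """Read a text data file that was installed inside `package`.

    `parts` are the path components below the package directory,
    for example ("builtins", "something.crunit").
    """
    node = resources.files(package)
    for part in parts:
        node = node.joinpath(part)
    return node.read_text(encoding=encoding)


def list_package_files(package, subdir, suffix=""):
    """Return the sorted names of the files in `subdir` of `package` ending in `suffix`."""
    folder = resources.files(package).joinpath(subdir)
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.is_file() and entry.name.endswith(suffix)
    )
```

For the question's project, a call such as list_package_files("creator", "builtins", suffix=".crunit") would then list the bundled units, provided they were installed as package data.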
