101

How do I control which files are included in a wheel? It appears MANIFEST.in isn't used by python setup.py bdist_wheel.

UPDATE:

I was wrong about the difference between installing from a source tarball vs a wheel. The source distribution includes files specified in MANIFEST.in, but the installed package only has python files. Steps are needed to identify additional files that should be installed, whether the install is via source distribution, egg, or wheel. Namely, package_data is needed for additional package files, and data_files for files outside your package like command line scripts or system config files.

Original Question

I have a project where I've been using python setup.py sdist to build my package, MANIFEST.in to control the files included and excluded, and pyroma and check-manifest to confirm my settings.

I recently converted it to dual Python 2 / 3 code, and added a setup.cfg with

[bdist_wheel]
universal = 1

I can build a wheel with python setup.py bdist_wheel, and it appears to be a universal wheel as desired. However, it doesn't include all of the files specified in MANIFEST.in.

What gets installed?

I dug deeper, and now know more about packaging and wheel. Here's what I learned:

I upload two package files to the multigtfs project on PyPI:

  • multigtfs-0.4.2.tar.gz - the source tarball, which includes all the files in MANIFEST.in.
  • multigtfs-0.4.2-py2.py3-none-any.whl - the binary distribution in question.

I created two new virtual environments, both with Python 2.7.5, and installed one package in each (for example, pip install multigtfs-0.4.2.tar.gz). The two environments are almost identical. They have different .pyc files, which are the "compiled" Python files, and there are log files that record the different paths on disk. The install from the source tarball includes a folder multigtfs-0.4.2-py27.egg-info, detailing the installation, and the wheel install has a multigtfs-0.4.2.dist-info folder, with the details of that process. However, from the point of view of code using the multigtfs project, there is no difference between the two installation methods.
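For anyone who wants to repeat this comparison, here is a minimal sketch of how two install trees could be diffed. The helper names `list_files` and `compare_installs` are hypothetical (they are not part of pip or setuptools), and the sketch assumes you point it at the two site-packages/multigtfs directories yourself.

```python
import os


def list_files(root, ignore_suffixes=(".pyc",)):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compiled files differ between installs, so skip them.
            if name.endswith(ignore_suffixes):
                continue
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return found


def compare_installs(sdist_root, wheel_root):
    """Return (files only in the sdist install, files only in the wheel install)."""
    sdist_files = list_files(sdist_root)
    wheel_files = list_files(wheel_root)
    return sorted(sdist_files - wheel_files), sorted(wheel_files - sdist_files)
```

Two empty lists would mean the installs contain the same set of non-.pyc files, which matches what I saw by hand.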

Explicitly, neither has the .zip files used by my test, so the test suite will fail:

$ django-admin startproject demo
$ cd demo
$ pip install psycopg2  # DB driver for PostGIS project
$ createdb demo         # Create PostgreSQL database
$ psql -d demo -c "CREATE EXTENSION postgis" # Make it a PostGIS database 
$ vi demo/settings.py   # Add multigtfs to INSTALLED_APPS,
                        # Update DATABASE to set ENGINE to django.contrib.gis.db.backends.postgis
                        # Update DATABASE to set NAME to test
$ ./manage.py test multigtfs.tests  # Run the tests
...
IOError: [Errno 2] No such file or directory: u'/Users/john/.virtualenvs/test/lib/python2.7/site-packages/multigtfs/tests/fixtures/test3.zip'

Specifying additional files

Using the suggestions from the answers, I added some additional directives to setup.py:

from __future__ import unicode_literals
# setup.py now requires some funky binary strings
...
setup(
    name='multigtfs',
    packages=find_packages(),
    package_data={b'multigtfs': ['tests/fixtures/*.zip']},
    include_package_data=True,
    ...
)

This installs the zip files (as well as the README) to the folder, and tests now run correctly. Thanks for the suggestions!

Martijn Pieters
jwhitlock
  • Which files are missing exactly? – rje Sep 18 '14 at 15:19
  • Any non-Python files, such as documentation or test fixtures. My application include some .zip files used in tests, which some may argue aren't needed in a binary distribution. Others may have non-Python files that are required at runtime. – jwhitlock Sep 18 '14 at 17:21
  • I find hard to understand your question. How is the setup.py supposed to include files in a wheel (The tag description is empty, so I don't know what do you refer to)? – llrs Sep 19 '14 at 09:31
  • wheel is a built-package format for Python, and is often preferred over the .egg format if both are available. To build a wheel, you run `python setup.py bdist_wheel`. See the docs at http://wheel.readthedocs.org (which don't answer my question), and http://pythonwheels.com. – jwhitlock Sep 21 '14 at 03:37
  • 3
    The magic combination is to specify the files using `MANIFEST.in`, then add `include_package_data=True` to setup.py. – rspeed Aug 02 '17 at 05:07

8 Answers

55

Have you tried using package_data in your setup.py? MANIFEST.in seems targeted at Python versions <= 2.6; I'm not sure whether higher versions even look at it.

After exploring https://github.com/pypa/sampleproject, their MANIFEST.in says:

# If using Python 2.6 or less, then have to include package data, even though
# it's already declared in setup.py
include sample/*.dat

which seems to imply this method is outdated. Meanwhile, in setup.py they declare:

setup(
    name='sample',
    ...
    # If there are data files included in your packages that need to be
    # installed, specify them here.  If using Python 2.6 or less, then these
    # have to be included in MANIFEST.in as well.
    include_package_data=True,
    package_data={
        'sample': ['package_data.dat'],
    },
    ...
)

(I'm not sure why they chose a wildcard in MANIFEST.in and a filename in setup.py. They refer to the same file)

This, along with being simpler, again seems to imply that the package_data route is superior to the MANIFEST.in method. Well, unless you have to support 2.6, that is, in which case my prayers go out to you.

rjurney
vgel
  • Great answer, that's indeed the culprit. Here is also a [nice article about this issue](http://blog.ionelmc.ro/2015/02/24/the-problem-with-packaging-in-python/). – gaborous Dec 07 '15 at 12:22
  • 1
    This answer as it was no longer works in Python 3.8. The answer below works. I edited it to add `include_package_data=True` so it will actually work with late model Python. – rjurney May 26 '21 at 15:44
  • @rjurney Thanks for the edit! Does it still work with 3.0? – vgel May 26 '21 at 16:56
  • Thank you! This answer is so useful. I was so confused by the fact that `MANIFEST.in` wasn't doing anything. This solved my problems immediately. – GreyHope Jan 19 '22 at 16:22
49

Before you make any changes in MANIFEST.in or setup.py you must remove old output directories. Setuptools is caching some of the data and this can lead to unexpected results.

rm -rf build *.egg-info

If you don't do this, expect nothing to work correctly.
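If you are not on a Unix shell, the same cleanup can be sketched in Python. The function name `clean_build_artifacts` is made up for this example, and unlike the rm command above it only removes directories, not stray files.

```python
import glob
import os
import shutil


def clean_build_artifacts(project_root="."):
    """Delete build/ and any *.egg-info directories directly under project_root."""
    targets = [os.path.join(project_root, "build")]
    targets += glob.glob(os.path.join(project_root, "*.egg-info"))
    removed = []
    for path in targets:
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(os.path.basename(path))
    # Return the names that were removed so the caller can log them.
    return sorted(removed)
```

Run it from the directory that holds setup.py before each rebuild.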

Now that that is out of the way:

  1. If you are building a source distribution (sdist) then you can use any method below.

  2. If you are building a wheel (bdist_wheel), then include_package_data and MANIFEST.in are ignored and you must use package_data and data_files.

INCLUDE_PACKAGE_DATA

This is a good option, but bdist_wheel does not honor it.

setup(
    ...
    include_package_data=True
)

# MANIFEST.in
include package/data.json

DATA_FILES for non-package data

This is the most flexible option because you can add any file from your repo to an sdist or bdist_wheel.

setup(
    ....
    data_files=[
        ('output_dir',['conf/data.json']),
    ]
    # For sdist, output_dir is ignored!
    #
    # For bdist_wheel, data.json is taken from the conf dir in the root of
    # your repo and stored at `output_dir/` inside of the wheel.
)

PACKAGE_DATA for non-python files inside of the package

Similar to above, but for a bdist_wheel it lets you put your data files inside of the package. It is identical for sdist, but it has more limitations than data_files because files can only be sourced from your package subdirectory.

setup(
    ...
    package_data={'package': ['data.json']},
    # data.json must be inside of your actual package
)
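Because package_data patterns are resolved relative to the package directory, a pattern that matches nothing is easy to miss. Here is a rough sketch for checking patterns before building. `match_package_data` is a hypothetical helper that approximates the lookup with plain glob; it is not how setuptools itself is implemented.

```python
import glob
import os


def match_package_data(package_dir, patterns):
    """Map each pattern to the files it matches, relative to package_dir."""
    matches = {}
    for pattern in patterns:
        hits = glob.glob(os.path.join(package_dir, pattern))
        matches[pattern] = sorted(
            os.path.relpath(hit, package_dir).replace(os.sep, "/")
            for hit in hits
            if os.path.isfile(hit)
        )
    return matches
```

Any pattern that maps to an empty list is worth a second look, for example a misspelled subdirectory name.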
cmcginty
  • 1
    Can you add an example of using a `glob` pattern? I'm guessing that for `data_files` this tuple would work: `('output_dir': ['conf/*.json'])` – piRSquared Jun 15 '18 at 19:25
  • 1
    @piRSquared globbing is not supported directly but you can use examples from other answers here: `glob('conf/*.json')` – cmcginty Jun 15 '18 at 19:36
  • 1
    data_files format is incorrect, should be `,` not `:` as in: `data_files=[('my_data', ['data/data_file'])],` [Reference doc](https://packaging.python.org/guides/distributing-packages-using-setuptools/#data-files) I would have edited but edits need to be 6 characters... – Andrew Fraser Mar 06 '19 at 10:16
  • @AndrewFraser .. fixed now. – cmcginty Mar 07 '19 at 19:13
  • "INCLUDE_PACKAGE_DATA is a good option, but bdist_wheel does not honor it" ? `include pkg/test/*.py` in `MANIFEST.in` works fine (in setuptools 45.2.0). – denis Feb 21 '20 at 14:56
  • 2
    Comment on "INCLUDE_PACKAGE_DATA is a good option, but bdist_wheel does not honor it": This is NOT entirely true and should be clarified in the answer: If the data files are contained in one of the package subdirectories and `include_package_data=True` and they are listed in `Manifest.in` then they will be included. – Stefan D. Sep 07 '20 at 14:43
  • @denis, that's because `*.py` files are probably not considered "package data". I say "probably" because who knows what's really going on? :-) – llude Sep 21 '20 at 22:38
  • I would add to @StefanD.'s comment that for me, `include_package_data=True` is necessary with `bdist_wheel` (not sure why, but if I omit it or set it to False, my data is not packaged in the wheel). – cjauvin Feb 01 '21 at 17:33
  • This is the working answer, the answer with more votes is not the right one in Python 3.8 for sure. – rjurney May 26 '21 at 15:44
28

You can use package_data and data_files in setup.py to specify additional files, but they are ridiculously hard to get right (and buggy).

An alternative is to use MANIFEST.in and add include_package_data=True in setup() of your setup.py as indicated here.

With this directive, the MANIFEST.in will be used to specify the files to include not only in the source tarball/zip, but also in the wheel and win32 installer. This also works with any Python version (I tested on a project from py2.6 to py3.6).

UPDATE 2020: it seems the MANIFEST.in is not honored anymore by the wheel in Python 3, although it still is in the tar.gz, even if you set include_package_data=True.

Here is how to fix that: you need to specify both include_package_data and packages.

If your Python module is inside a folder "pymod", here's the appropriate setup:

setup( ...
    include_package_data = True,
    packages = ['pymod'],
)

If your python scripts are at the root, use:

setup( ...
    include_package_data = True,
    packages = ['.'],
)

Then you can open your .whl file with zip archive software such as 7-zip to check that all the files you want are indeed inside.
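Since a .whl file is a zip archive, the same check can be scripted with the standard library. This is only a sketch; `wheel_contents` and `missing_from_wheel` are names I made up for the example.

```python
import zipfile


def wheel_contents(wheel_path):
    """Return the sorted list of file names stored in the wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(wheel.namelist())


def missing_from_wheel(wheel_path, expected):
    """Return the expected archive paths that are not present in the wheel."""
    present = set(wheel_contents(wheel_path))
    return [name for name in expected if name not in present]
```

Calling `missing_from_wheel` on a wheel under dist/ with a list such as `["pymod/data.json"]` returns an empty list when everything you expect is packaged.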

gaborous
  • 4
    This should be the current accepted answer! Using `package_data=...` as in the other answer is fraught with peril (read the links, and the links behind the links) – matt wilkie Oct 06 '17 at 19:12
  • With NumPy's take on `setup` in `numpy.distutils.core`, I cannot get wheels to work with `include_package_data=True`. It only listens to `package_data`. – llude Sep 22 '20 at 22:36
  • 1
    (1) Wheels *do* (despite the docs) respect the combination of MANIFEST.in plus include_package_data = True, _however_ (2) this only applies to 'package data' aka things that sit in the directories of the packages rather than e.g the project root – Brad Solomon Oct 28 '20 at 00:12
  • This worked for me with `python 3.6.9` and `pip 21.01`. I was using `packages=find_packages()` in `setup()` and changed it to `packages=find_packages() + ['.']`, as well as `include_package_data=True`. – Paul P Mar 01 '21 at 16:16
  • @BradSolomon : Can you explain the second point better, like how to include files in the root directory if I am using the `src/layout`? Also, I've noticed projects such as **jupyter-notebook**, **black editor** not using the `MANIFEST.in` structure for including data files and using a complicated `package_data` structure [find_package_data function in setupbase which is imported in setup.py](https://github.com/jupyter/notebook/blob/master/setupbase.py), why are they doing so? whereas a project like **django** is using regular `MANIFEST.in`. Also, is it better to switch to a tool like `flit`? – aspiring1 Apr 17 '21 at 06:26
12

You can specify extra files to install using the data_files directive. Is that what you're looking for? Here's a small example:

from setuptools import setup
from glob import glob

setup(
    name='extra',
    version='0.0.1',
    py_modules=['extra'],
    data_files=[
        ('images', glob('assets/*.png')),
    ],
)
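Note that glob('assets/*.png') only matches one directory level. If the assets are nested, a small helper can build one data_files tuple per directory. This is a sketch under that assumption; `collect_data_files` is a hypothetical name, and it expects a source directory given relative to setup.py.

```python
import os


def collect_data_files(source_dir, target_prefix):
    """Build a data_files list with one (target_dir, [files]) tuple per directory."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(source_dir):
        dirnames.sort()  # walk subdirectories in a stable order
        if not filenames:
            continue
        rel = os.path.relpath(dirpath, source_dir)
        target = target_prefix if rel == "." else os.path.join(target_prefix, rel)
        files = sorted(os.path.join(dirpath, name) for name in filenames)
        entries.append(
            (target.replace(os.sep, "/"), [f.replace(os.sep, "/") for f in files])
        )
    return entries
```

The result can be passed straight to setup() as data_files=collect_data_files('assets', 'images').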
Miki Tebeka
  • 6
    That looks very promising, but after 2 hours I was unable to get data_files or package_files working. Do you know of any projects using these features that I could look to for working code? – jwhitlock Sep 21 '14 at 18:55
  • +1 Using python 3.8.5. This is the only answer so far that actually worked for me. – arielf Apr 29 '21 at 01:22
  • I can see the file being effectively put in the .whl, but I don't manage to access it from the code. Any suggestion please? – ciurlaro Mar 24 '22 at 16:04
5

include_package_data is the way to go, and it works for sdist and wheels.

However you have to do it right, and it took me months to figure this out, so here is what I learned.

The trick is essentially given in the name of the option include_PACKAGE_data: the data files need to be in a package subfolder.

If and only if

  • include_package_data is True
  • the data files are listed in MANIFEST.in (*see also my note at the end about setuptools_scm)
  • and the data files are under a package directory

then the data files will be included.

Working Example:

Given the project has the following structure and files:

|- MANIFEST.in
|- setup.cfg
|- setup.py
|
\---foo
    |- __init__.py
    |
    \---data
         - example.png

And the following configuration:

MANIFEST.in:

recursive-include foo/data *

setup.py

import setuptools

setuptools.setup()

setup.cfg

[metadata]
name = wheel-data-files-example
url = www.example.com
maintainer = None
maintainer_email = none@example.com

[options]
packages =
    foo
include_package_data = True

Both the sdist packages and the wheels you build will then contain the example.png data file.

(of course, instead of setup.cfg the config can also be directly specified in setup.py. But this is not relevant for the example.)

Update: For src layout projects

This should also work for projects that use a src layout, looking like this:

|- MANIFEST.in
|- setup.cfg
|- setup.py
|
\---src
    |
    \---foo
        |- __init__.py
        |
        \---data
             - example.png

To make it work, tell setuptools about the src directory using package_dir:

setup.cfg

[metadata]
name = wheel-data-files-example
url = www.example.com
maintainer = None
maintainer_email = none@example.com

[options]
packages =
    foo
include_package_data = True
package_dir =
    =src

And in the manifest adjust the path:

MANIFEST.in:

recursive-include src/foo/data *

Note: No MANIFEST.in necessary if you use setuptools_scm

If you happen to use setuptools and add the setuptools_scm plugin (on PyPI), then you don't need to manage a MANIFEST.in file. Instead, setuptools_scm will make sure that all files tracked by git are added to the package.

So in this case the rule for whether a file is added to the sdist/wheel is: if and only if

  • include_package_data is True
  • the file is tracked by git (or another setuptools_scm supported tool)
  • and the data files are under a package directory

then the data files will be included.

Stefan D.
  • I tried this approach but had no luck getting `data` into the wheel. I suspect it's my `src`-based layout and the extra layer of indirection it introduces. I've now moved my `data` directory inside the package directory as suggested, but got rid of `MANIFEST.in` and `include_package_data`, using `package_data={'package': ['data/specific_file']}` instead. All is golden for both `sdist` and `bdist_wheel`. Good luck using globs with all those implicit subdirectories involved though... :-) – llude Sep 21 '20 at 23:02
  • I updated my answer and it should also work for projects with src layout now. – Stefan D. Sep 22 '20 at 07:32
  • Fantastic! I've copied your example package verbatim to my environment and it works beautifully. That allowed me to figure out why my actual package behaves differently... I forgot that I'm using `numpy.distutils.core.setup` to build a Fortran extension along the way, but then I randomly had to import `setuptools.setup` as well in order to get the `bdist_wheel` command to work at all. It builds the wheel, but insists on using `package_data`. This cargo-culty hacky-sacky route to get to a working Python package makes me want to pull out the few hairs I have left :-D – llude Sep 22 '20 at 16:26
  • I think I messed up my `Manifest.in` file, but I figured out that adding `[options.package_data]` `* =` `*.png` works as well and you don't pollute your repository with *yet another file*. – Boris Verkhovskiy Feb 26 '21 at 20:37
0

I had a config/ directory with JSON files in it, which I needed to add to the wheel package. So I added these lines to MANIFEST.in:

recursive-include config/ *.json

And the following directive to setup.py:

setup(
 ...
 include_package_data=True,
)

And nothing worked, until I created an empty file called __init__.py inside the config/ directory.

(Python 3.6.7, wheel 3.6.7, setuptools 39.0.1)
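A plausible explanation, in line with the other answers here, is that include_package_data only picks up files that live inside a package, and a directory without __init__.py is normally not discovered as one. Here is a rough sketch for spotting such directories. `data_dirs_outside_packages` is a hypothetical helper, and the ".json" default is just an assumption that fits my case.

```python
import os


def data_dirs_outside_packages(project_root, suffix=".json"):
    """List directories that hold matching data files but have no __init__.py."""
    result = []
    for dirpath, dirnames, filenames in os.walk(project_root):
        # Skip hidden directories such as .git and keep the order stable.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        has_data = any(name.endswith(suffix) for name in filenames)
        if has_data and "__init__.py" not in filenames:
            rel = os.path.relpath(dirpath, project_root)
            result.append(rel.replace(os.sep, "/"))
    return result
```

An empty result means every directory with matching data files is also a regular package.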

Michael Spector
0

To add files directly into the top level of a wheel (and not under a folder inside the wheel) simply use Poetry. Create a pyproject.toml with:

poetry init

Port your dependencies with:

cat requirements.txt | xargs poetry add

Add a line like this to your pyproject.toml:

include = ["Yourfile"]

Then run:

poetry build

Note: IntelliJ products make it easy and fast to browse your wheels with this plugin.

0

For higher versions of Python, I referred to the official documentation, which resolved the issue: https://setuptools.pypa.io/en/latest/userguide/datafiles.html

project_root_directory
├── setup.py        # and/or setup.cfg, pyproject.toml
└── src
    └── mypkg
        ├── __init__.py
        ├── data1.rst
        ├── data2.rst
        ├── data1.txt
        └── data2.txt

Include files using package_data instead of MANIFEST.in

from setuptools import setup, find_packages
setup(
    # ...,
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    package_data={"mypkg": ["*.txt", "*.rst"]}
)
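Once the files are packaged, they still have to be read through the package rather than through a path relative to the working directory. Here is a minimal sketch using the standard library's pkgutil.get_data. `load_package_text` is a hypothetical wrapper, and "mypkg" / "data1.txt" in the usage note are just the names from the layout above.

```python
import pkgutil


def load_package_text(package, resource, encoding="utf-8"):
    """Read a data file that was installed alongside the given package."""
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the package cannot be located
        # or its loader cannot serve data files.
        raise FileNotFoundError(f"{resource!r} not available in {package!r}")
    return data.decode(encoding)
```

With the layout above installed, load_package_text("mypkg", "data1.txt") should return the contents of data1.txt.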
Prateek Mehta