191

When using setuptools, I cannot get the installer to pull in any package_data files. Everything I've read says that the following is the correct way to do it. Can someone please advise?

from setuptools import setup, find_packages

setup(
   name='myapp',
   packages=find_packages(),
   package_data={
      'myapp': ['data/*.txt'],
   },
   include_package_data=True,
   zip_safe=False,
   install_requires=['distribute'],
)

where myapp/data/ is the location of the data files.

wim
cmcginty
  • 3
    I'm having the same problem... Manually specifying `data_files` solved the problem. But this is error-prone and does not "feel right" to me. Can someone verify that it is really *necessary* to duplicate the configuration in both `package_data` and `data_files`? – exhuma Nov 07 '11 at 12:42
  • 2
    https://github.com/wimglenn/resources-example Shows a modern setuptools project structure, which can correctly package data files into wheels and sdists using `pyproject.toml`. No `setup.py` file required. – wim Apr 10 '20 at 19:01
  • 2
    For the love of it, I can't get any of the answers below to work, and the comments above would require a complete rewrite of many of my projects. – Wolfgang Fahl Oct 26 '21 at 12:00

14 Answers

376

I realize that this is an old question, but for people finding their way here via Google: package_data is a low-down, dirty lie. It is only used when building binary packages (python setup.py bdist ...) but not when building source packages (python setup.py sdist ...). This is, of course, ridiculous -- one would expect that building a source distribution would result in a collection of files that could be sent to someone else to build the binary distribution.

In any case, using MANIFEST.in will work both for binary and for source distributions.
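Whichever mechanism you end up relying on, it helps to check both archives rather than trust either claim. The following is a minimal sketch, using only the standard library, that lists the contents of an sdist and a wheel and reports files present in the sdist but absent from the wheel. The function names and the default `.txt` suffix are illustrative assumptions, and it assumes a flat layout (no `src/` directory), so that paths inside the sdist line up with paths inside the wheel.

```python
import tarfile
import zipfile
from pathlib import PurePosixPath


def sdist_members(path):
    """Return file paths inside an sdist, without the leading '<name>-<version>/'."""
    with tarfile.open(path) as tar:
        names = [m.name for m in tar.getmembers() if m.isfile()]
    return sorted("/".join(PurePosixPath(n).parts[1:]) for n in names)


def wheel_members(path):
    """Return file paths inside a wheel (a wheel is a zip archive)."""
    with zipfile.ZipFile(path) as whl:
        return sorted(whl.namelist())


def missing_from_wheel(sdist_path, wheel_path, suffixes=(".txt",)):
    """List sdist files with the given suffixes that the wheel does not contain.

    Top-level files such as requirements.txt will also be reported, since
    they are normally not part of a wheel; filter them out as needed.
    """
    in_wheel = set(wheel_members(wheel_path))
    return [
        name
        for name in sdist_members(sdist_path)
        if name.endswith(tuple(suffixes)) and name not in in_wheel
    ]
```

For the project in the question, calling `missing_from_wheel` on the two files under `dist/` would show whether `myapp/data/*.txt` made it into the wheel.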

mhucka
larsks
  • 126
    I have been researching this issue for the past hour and have been trying many approaches. As you say, `package_data` works for `bdist` and not `sdist`. **However**, `MANIFEST.in` works for `sdist`, but *not* for `bdist`! Therefore, the best I have been able to come up with is to include both `package_data` and `MANIFEST.in` in order to accommodate both `bdist` and `sdist`. – Wesley Baugh Mar 05 '13 at 00:41
  • 8
    I found another answer that supports @WesleyBaugh. In http://stackoverflow.com/a/2969087/261718: use `MANIFEST.in` for files you won't install, like documentation, and `package_data` for files you use that aren't Python code (like an image or template). – Drake Guan Nov 26 '13 at 02:33
  • 3
    Ran into this today. I get that they might not want to change the behaviour, but it should *at least* be mentioned in the docs. – ffledgling Nov 23 '15 at 10:00
  • 15
    I am using sdist, and had to include both `MANIFEST.in` *and* `package_data`. It seems that `MANIFEST.in` controls what is included in the distribution, and package_data controls what subsequently gets copied into the site_packages dir during installation. Confusingly, paths in `MANIFEST.in` are relative to the location of setup.py, and `package_data` is relative to the individual packages (e.g. modules) root. – Edward Newell Jul 06 '16 at 22:59
  • 10
    "Changed in version 2.7: All the files that match package_data will be added to the MANIFEST file if no template is provided. See Specifying the files to distribute." [from distutils](https://docs.python.org/2/distutils/setupscript.html#installing-package-data). So, you'll only see the behaviour of files in `package_data` being automatically included in the ZIP *if you have no existing MANIFEST.in file*, and only if you're using 2.7+. – Johnus Oct 19 '16 at 23:17
  • 4
    It is safe to use `package_data` with `setuptools`: http://setuptools.readthedocs.io/en/latest/setuptools.html#including-data-files. Files are effectively included with both binary and source distributions, and can be accessed conveniently using the ResourceManager API described on the same page. See also https://stackoverflow.com/a/14211600/4716370. – Antoine Cotten May 31 '17 at 18:07
  • 79
    Seriously, I feel like this ticket is a group therapy session for folks using setuptools and discovering just what a horrid place they have found themselves in life. – Matt Joyce Jan 30 '18 at 16:22
  • 1
    Thanks for this. For anyone else arriving here tearing their hair out about missing package data files: if you're distributing via a remote git repo, _make sure your data file is included in git_. Mine was ignored so it wasn't in the remote repo; had nothing to do with setuptools or manifests. – Beej Apr 02 '18 at 20:45
  • 2
    Using Python 3.6, `package_data` still doesn't work for me if I just run `python setup.py install`. Adding a `MANIFEST.in` solved the problem for me... – stefanbschneider Jun 29 '18 at 18:58
  • 3
    I found that sometimes changes to `MANIFEST.in` don't take effect unless I delete the egg-info folder. These are SO DAMN confusing! – Jason May 07 '19 at 10:42
  • I am using a `MANIFEST.in` file and the `include_package_data=True` option to build a `sdist`. The extra files only appear in my destination environment when I run `python setup.py install_lib` as mentioned [here](https://pythonhosted.org/an_example_pypi_project/setuptools.html#using-setup-py) – jeschwar Jun 10 '19 at 19:53
  • Python 3.6 behavior I've seen: the `MANIFEST.in` specifies what files to include in the source distribution (sdist). `setup.py install` automatically adds the sdist python files to site-packages. But, the `include_package_data` flag controls whether non-python files distributed in the sdist AND existing in a package (directory with `__init__.py`) install to the site-packages location. So in order for non-python files to install with your code they need to a) be in the sdist (controlled by `MANIFEST.in`) and b) exist inside an installable python package. Otherwise you need to use `data_files` – sahibeast Jun 22 '20 at 15:39
  • There is also an argument that it is better to use `setuptools_scm` rather than a MANIFEST.in file - https://www.remarkablyrestrained.com/python-setuptools-manifest-in/ – wesinat0r Sep 27 '20 at 18:39
  • Python 3.8, setuptools 49.2.0, using `[options.package_data]` only, for bdist. What I was missing was that the `path/to/data/*` needed to be relative to the root of the package: not `src/package/path/to/data/*` and not `package/path/to/data/*`. Files would be added automatically with `pip install .`, but if I wanted to remove them (by commenting out the `[options.package_data]` entry, not with an explicit deny), I would need to delete the `build` and `package.egg-info` directories. The `package.egg-info/SOURCES.txt` shows the files when they are properly added. No `MANIFEST.in` created and no `__init__.py`, although it works with both. – CarterKF Nov 24 '22 at 19:08
  • This old but popular answer should be updated, because _"In any case, using MANIFEST.in will work both for binary and for source distributions."_ is **not** true. As @Wesley Baugh states in his popular comment, both ways are needed to get both sdist and bdist. – Janos Dec 02 '22 at 23:19
42

I just had this same issue. The solution was simply to remove include_package_data=True.

After reading here, I realized that include_package_data aims to include files from version control, as opposed to merely "include package data" as the name implies. From the docs:

The data files [of include_package_data] must be under CVS or Subversion control

...

If you want finer-grained control over what files are included (for example, if you have documentation files in your package directories and want to exclude them from installation), then you can also use the package_data keyword.

Taking that argument out fixed it, which is coincidentally why it also worked when you switched to distutils, since it doesn't take that argument.
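As a sketch of what that looks like for the setup in the question, here are the same arguments with `include_package_data` left out. The helper name `build_setup_kwargs` is only for illustration, and `packages` is written out explicitly (instead of `find_packages()`) so the snippet does not need setuptools to be importable.

```python
def build_setup_kwargs():
    """Keyword arguments for setuptools.setup(), without include_package_data."""
    return {
        "name": "myapp",
        "packages": ["myapp"],
        # Patterns are relative to the package directory (myapp/), not to setup.py.
        "package_data": {"myapp": ["data/*.txt"]},
        "zip_safe": False,
    }


# In a real setup.py you would finish with:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
```

Whether this is enough depends on your setuptools version and on whether a MANIFEST.in is present, as the comments on this thread show, so inspect the built archives afterwards.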

Joe
  • 3
    My experience differs, I had the same problem without including the `include_package_data=True` entry. Only solution for me is to add an entry in Manifest as suggested above. Mind you I was using setuptools, maybe your version works with 'distribute'? – TimStaley Apr 17 '13 at 17:32
  • 5
    Actual reason why removing `include_package_data` solves problem is further in the [original text](https://pythonhosted.org/setuptools/setuptools.html#including-data-files) – *If using the setuptools-specific `include_package_data` argument, files specified by `package_data` will not be automatically added to the manifest unless they are listed in the `MANIFEST.in` file.* – Piotr Dobrogost Apr 04 '16 at 12:54
  • 1
    What is the use case of having `package_data` set to a non-empty list and specifying `include_package_data=False`? And why would you need to specify files twice in `MANIFEST.in` and `package_data`? – Herbert Apr 26 '18 at 08:19
23

Following @Joe's recommendation to remove the include_package_data=True line also worked for me.

To elaborate a bit more, I have no MANIFEST.in file. I use Git and not CVS.

Repository takes this kind of shape:

/myrepo
    - .git/
    - setup.py
    - myproject
        - __init__.py
        - some_mod
            - __init__.py
            - animals.py
            - rocks.py
        - config
            - __init__.py
            - settings.py
            - other_settings.special
            - cool.huh
            - other_settings.xml
        - words
            - __init__.py
            - word_set.txt

setup.py:

from setuptools import setup, find_packages
import os.path

setup (
    name='myproject',
    version = "4.19",
    packages = find_packages(),  
    # package_dir={'mypkg': 'src/mypkg'},  # didnt use this.
    package_data = {
        # If any package contains *.txt or *.rst files, include them:
        '': ['*.txt', '*.xml', '*.special', '*.huh'],
    },

#
    # Oddly enough, include_package_data=True prevented package_data from working.
    # include_package_data=True, # Commented out.
    data_files=[
#               ('bitmaps', ['bm/b1.gif', 'bm/b2.gif']),
        ('/opt/local/myproject/etc', ['myproject/config/settings.py', 'myproject/config/other_settings.special']),
        ('/opt/local/myproject/etc', [os.path.join('myproject/config', 'cool.huh')]),
#
        ('/opt/local/myproject/etc', [os.path.join('myproject/config', 'other_settings.xml')]),
        ('/opt/local/myproject/data', [os.path.join('myproject/words', 'word_set.txt')]),
    ],

    install_requires=[ 'jsonschema',
        'logging', ],

     entry_points = {
        'console_scripts': [
            # Blah...
        ], },
)

I run python setup.py sdist for a source distrib (haven't tried binary).

Then, inside a brand new virtual environment, with the resulting myproject-4.19.tar.gz file, I use

(venv) pip install ~/myproject-4.19.tar.gz
...

Besides everything getting installed to my virtual environment's site-packages, those special data files get installed to /opt/local/myproject/data and /opt/local/myproject/etc.

HeyWatchThis
18

include_package_data=True worked for me.

If you use git, remember to include setuptools-git in install_requires. Far less boring than maintaining a MANIFEST.in or listing every path in package_data (in my case it's a Django app with all kinds of statics).

( pasted the comment I made, as k3-rnc mentioned it's actually helpful as is )

Community
vincent
9

Using setup.cfg (setuptools ≥ 30.3.0)

Starting with setuptools 30.3.0 (released 2016-12-08), you can keep your setup.py very small and move the configuration to a setup.cfg file. With this approach, you could put your package data in an [options.package_data] section:

[options.package_data]
* = *.txt, *.rst
hello = *.msg

In this case, your setup.py can be as short as:

from setuptools import setup
setup()

For more information, see configuring setup using setup.cfg files.
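To make the shape of that section concrete, here is a small sketch that reads it with the standard library's `configparser` and prints the package-to-patterns mapping it describes. This only parses the text; it is not setuptools' own loader, and the function name `read_package_data` is an assumption for illustration.

```python
import configparser

SETUP_CFG = """\
[options.package_data]
* = *.txt, *.rst
hello = *.msg
"""


def read_package_data(cfg_text):
    """Parse [options.package_data] into {package: [glob patterns]}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep package names case-sensitive
    parser.read_string(cfg_text)
    section = parser["options.package_data"]
    return {
        package: [p.strip() for p in patterns.split(",") if p.strip()]
        for package, patterns in section.items()
    }


print(read_package_data(SETUP_CFG))
```

The `*` key applies its patterns to every package, while `hello` applies only to the package of that name.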

There is some talk of deprecating setup.cfg in favour of pyproject.toml as proposed in PEP 518, but this is still provisional as of 2020-02-21.

gerrit
  • This answer neglects to mention MANIFEST file so I think it won't actually work with sdists. Only with wheels. You should mention that. – wim Apr 09 '20 at 22:16
  • 1
    @wim I don't have enough understanding of MANIFEST, sdist, and wheels to answer that. This worked for me using `pip install`. – gerrit Apr 10 '20 at 08:58
  • That is because `pip install`, for modern enough versions of pip, will first build a wheel and then install that. Still, for many users this approach will silently fail to include package data. See the accepted answer and the comments under it for details about that. Using a `setup.cfg` is really just a different way of writing what the OP was already doing in `setup.py` in the question (by passing the `package_data` keyword argument in the call to `setup`), so I don't think this is particularly helpful as an answer *for this question*. It's not addressing the underlying problem at all. – wim Apr 10 '20 at 18:49
  • @wim For what it's worth, I'm really happy to finally find an example with `setup.cfg`, for once. Somehow, the official packaging tutorials advise using `setup.cfg` instead of `setup.py`, but then most of the answers apply to `setup.py`. And those answers are already all over the place and mostly half-broken and unclear, so I don't need to add another unknown by trying to translate the options to `setup.cfg`. Sadly, this specific answer doesn't seem to work for my project. But it hopefully could help others. – Eric Duminil Mar 06 '23 at 07:23
6

I had the same problem for a couple of days but even this thread wasn't able to help me as everything was confusing. So I did my research and found the following solution:

Basically in this case, you should do:

from setuptools import setup

setup(
   name='myapp',
   packages=['myapp'],
   package_dir={'myapp':'myapp'}, # the one line where all the magic happens
   package_data={
      'myapp': ['data/*.txt'],
   },
)

The full explanation is in another Stack Overflow answer here

Fabio Veronese
moctarjallo
6

Update: This answer is old and the information is no longer valid. All setup.py configs should use import setuptools. I've added a more complete answer at https://stackoverflow.com/a/49501350/64313


I solved this by switching to distutils. Looks like distribute is deprecated and/or broken.

from distutils.core import setup

setup(
   name='myapp',
   packages=['myapp'],
   package_data={
      'myapp': ['data/*.txt'],
   },
)
cmcginty
  • 2
    distribute isn't deprecated, it is _replacing_ distutils. I don't know why you were having the problem, but that's not the reason. – agf Sep 22 '11 at 23:23
  • 1
    That was the response I got from IRC, so who do I believe? If you have a working example using distribute I would appreciate it. – cmcginty Sep 23 '11 at 09:51
  • 6
    clarification: distribute is meant to replace setuptools, both are built on top of distutils. distutils itself will eventually be replaced by a new package, called "distutils2" in python2 and "packaging" in python3 – Kevin Horn Jun 14 '12 at 15:28
  • 1
    Switching to distutils resolved my issue where `include_package_data=True` was not being honored. So with that setting you only need MANIFEST.in - no need to duplicate your file list in the `package_data` setting. – Daniel Sokolowski Aug 21 '12 at 22:28
5

I found this post while stuck on the same problem.

My experience contradicts the experiences in the other answers. include_package_data=True does include the data in the bdist! The explanation in the setuptools documentation lacks context and troubleshooting tips, but include_package_data works as advertised.

My setup:

  • Windows / Cygwin
  • git version 2.21.0
  • Python 3.8.1 Windows distribution
  • setuptools v47.3.1
  • check-manifest v0.42

Here is my how-to guide.

How-to include package data

Here is the file structure for a project I published on PyPI. (It installs the application in __main__.py).

├── LICENSE.md
├── MANIFEST.in
├── my_package
│   ├── __init__.py
│   ├── __main__.py
│   └── _my_data          <---- folder with data
│       ├── consola.ttf   <---- data file
│       └── icon.png      <---- data file
├── README.md
└── setup.py

Starting point

Here is a generic starting point for the setuptools.setup() in setup.py.

setuptools.setup(
    ...
    packages=setuptools.find_packages(),
    ...
)

setuptools.find_packages() includes all of my packages in the distribution. My only package is my_package.

The sub-folder with my data, _my_data, is not considered a package by Python because it does not contain an __init__.py, and so find_packages() does not find it.
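The rule described here can be sketched with the standard library alone. This is a simplified stand-in for `find_packages()`, not its actual implementation: it treats a directory as a package only if it contains an `__init__.py`, and it does not descend into directories that are not packages. The function name `find_regular_packages` is made up for this example.

```python
import os


def find_regular_packages(root):
    """Return dotted names of directories under root that contain __init__.py."""
    packages = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue  # the project root itself is not a package
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package: skip it and everything below it
            continue
        relative = os.path.relpath(dirpath, root)
        packages.append(relative.replace(os.sep, "."))
    return sorted(packages)
```

Run against the tree above, this reports only `my_package`; `_my_data` is skipped because it has no `__init__.py`.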

An often-cited but incorrect solution is to put an empty __init__.py file in the _my_data folder.

This does make it a package, so it does include the folder _my_data in the distribution. But the data files inside _my_data are not included.

So making _my_data into a package does not help.

The solution is:

  • the sdist already contains the data files
  • add include_package_data=True to include the data files in the bdist as well

Experiment (how to test the solution)

There are three steps to make this a repeatable experiment:

$ rm -fr build/ dist/ my_package.egg-info/
$ check-manifest
$ python setup.py sdist bdist_wheel

I will break these down step-by-step:

  1. Clean out the old build:
$ rm -fr build/ dist/ my_package.egg-info/
  2. Run check-manifest to be sure MANIFEST.in matches the Git index of files under version control:
$ check-manifest

If MANIFEST.in does not exist yet, create it from the Git index of files under version control:

$ check-manifest --create

Here is the MANIFEST.in that is created:

include *.md
recursive-include my_package *.png
recursive-include my_package *.ttf

There is no reason to manually edit this file.

As long as everything that should be under version control is under version control (i.e., is part of the Git index), check-manifest --create does the right thing.

Note: files are not part of the Git index if they are either:

  • ignored in a .gitignore
  • excluded in a .git/info/exclude
  • or simply new files that have not been added to the index yet

And if any files are under version control that should not be under version control, check-manifest issues a warning and specifies which files it recommends removing from the Git index.

  3. Build:
$ python setup.py sdist bdist_wheel

Now inspect the sdist (source distribution) and bdist_wheel (built distribution) to see if they include the data files.

Look at the contents of the sdist (only the relevant lines are shown below):

$ tar --list -f dist/my_package-0.0.1a6.tar.gz
my_package-0.0.1a6/
...
my_package-0.0.1a6/my_package/__init__.py
my_package-0.0.1a6/my_package/__main__.py
my_package-0.0.1a6/my_package/_my_data/
my_package-0.0.1a6/my_package/_my_data/consola.ttf <-- yay!
my_package-0.0.1a6/my_package/_my_data/icon.png    <-- yay!
...

So the sdist already includes the data files because they are listed in MANIFEST.in. There is nothing extra to do to include the data files in the sdist.
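To see why the three MANIFEST.in lines shown earlier pick up the data files, here is a rough sketch of how `include` and `recursive-include` select paths. It is a simplified approximation that covers only those two commands and is not the real distutils/setuptools matcher; the function name `select_files` is an assumption.

```python
import fnmatch
from pathlib import PurePosixPath


def select_files(manifest_lines, files):
    """Return the files chosen by 'include' and 'recursive-include' lines."""
    selected = set()
    for line in manifest_lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        command, args = parts[0], parts[1:]
        for name in files:
            path = PurePosixPath(name)
            if command == "include":
                # Each pattern is matched segment by segment, so '*.md'
                # only matches files in the project root.
                for pattern in args:
                    pat_parts = PurePosixPath(pattern).parts
                    if len(pat_parts) == len(path.parts) and all(
                        fnmatch.fnmatchcase(seg, pat)
                        for seg, pat in zip(path.parts, pat_parts)
                    ):
                        selected.add(name)
            elif command == "recursive-include" and args:
                # First argument is a directory; the rest are file-name
                # patterns matched at any depth below it.
                dir_parts = PurePosixPath(args[0]).parts
                below = path.parts[: len(dir_parts)] == dir_parts
                if below and len(path.parts) > len(dir_parts) and any(
                    fnmatch.fnmatchcase(path.name, pat) for pat in args[1:]
                ):
                    selected.add(name)
    return sorted(selected)
```

Given the project tree from this answer, the sketch selects the two `.md` files plus `consola.ttf` and `icon.png`, which agrees with the tar listing above.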

Look at the contents of the bdist (it is a .zip file, parsed with zipfile.ZipFile):

$ python check-whl.py
my_package/__init__.py
my_package/__main__.py
my_package-0.0.1a6.dist-info/LICENSE.md
my_package-0.0.1a6.dist-info/METADATA
my_package-0.0.1a6.dist-info/WHEEL
my_package-0.0.1a6.dist-info/entry_points.txt
my_package-0.0.1a6.dist-info/top_level.txt
my_package-0.0.1a6.dist-info/RECORD

Note: you need to create your own check-whl.py script to produce the above output. It is just three lines:

from zipfile import ZipFile
path = "dist/my_package-0.0.1a6-py3-none-any.whl" # <-- CHANGE
print('\n'.join(ZipFile(path).namelist()))

As expected, the bdist is missing the data files.

The _my_data folder is completely missing.

What if I create a _my_data/__init__.py? I repeat the experiment and I find the data files are still not there! The _my_data/ folder is included but it does not contain the data files!

Solution

Contrary to the experience of others, this does work:

setuptools.setup(
    ...
    packages=setuptools.find_packages(),
    include_package_data=True, # <-- adds data files to bdist
    ...
)

With the fix in place, redo the experiment:

$ rm -fr build/ dist/ my_package.egg-info/
$ check-manifest
$ python.exe setup.py sdist bdist_wheel

Make sure the sdist still has the data files:

$ tar --list -f dist/my_package-0.0.1a6.tar.gz
my_package-0.0.1a6/
...
my_package-0.0.1a6/my_package/__init__.py
my_package-0.0.1a6/my_package/__main__.py
my_package-0.0.1a6/my_package/_my_data/
my_package-0.0.1a6/my_package/_my_data/consola.ttf <-- yay!
my_package-0.0.1a6/my_package/_my_data/icon.png    <-- yay!
...

Look at the contents of the bdist:

$ python check-whl.py
my_package/__init__.py
my_package/__main__.py
my_package/_my_data/consola.ttf        <--- yay!
my_package/_my_data/icon.png           <--- yay!
my_package-0.0.1a6.dist-info/LICENSE.md
my_package-0.0.1a6.dist-info/METADATA
my_package-0.0.1a6.dist-info/WHEEL
my_package-0.0.1a6.dist-info/entry_points.txt
my_package-0.0.1a6.dist-info/top_level.txt
my_package-0.0.1a6.dist-info/RECORD

How not to test if data files are included

I recommend troubleshooting/testing using the approach outlined above to inspect the sdist and bdist.

pip install in editable mode is not a valid test

Note: pip install -e . does not show if data files are included in the bdist.

The symbolic link causes the installation to behave as if the data files are included (because they already exist locally on the developer's computer).

After pip install my_package, the data files are in the virtual environment's lib/site-packages/my_package/ folder, using the exact same file structure shown above in the list of the whl contents.
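If you do want to check a regular (non-editable) installation, one option is to ask the installed distribution's metadata which files it recorded. This is a minimal sketch assuming Python 3.8+ (for `importlib.metadata`); the function name `installed_data_files` is hypothetical, and it simply filters out Python sources and the metadata directories.

```python
from importlib.metadata import PackageNotFoundError, files


def installed_data_files(dist_name):
    """List recorded non-Python files of an installed distribution.

    Raises PackageNotFoundError if the distribution is not installed.
    """
    recorded = files(dist_name) or []  # files() may return None
    result = []
    for path in recorded:
        in_metadata_dir = any(
            part.endswith((".dist-info", ".egg-info")) for part in path.parts
        )
        if path.suffix in {".py", ".pyc"} or in_metadata_dir:
            continue
        result.append(str(path))
    return sorted(result)
```

For the project in this answer, `installed_data_files("my_package")` should list the two files under `my_package/_my_data/` if they were installed.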

Publishing to TestPyPI is a slow way to test

Publishing to TestPyPI and then installing and looking in lib/site-packages/my_package is a valid test, but it is too time-consuming.

Mike Gazes
4

Ancient question and yet... package management of python really leaves a lot to be desired. So I had the use case of installing using pip locally to a specified directory and was surprised both package_data and data_files paths did not work out. I was not keen on adding yet another file to the repo so I ended up leveraging data_files and setup.py option --install-data; something like this

pip install . --install-option="--install-data=$PWD/package" -t package  
Mat Baker
4

Like others in this thread, I'm more than a little surprised at the combination of longevity and still a lack of clarity, BUT the best answer for me was using check-manifest as recommended in the answer from @mike-gazes

So, using just a setup.cfg and no setup.py and additional text and python files required in the package, what worked for me was keeping this in setup.cfg:

[options]
packages = find:
include_package_data = true

and updating the MANIFEST.in based on the check-manifest output:

include *.in
include *.txt
include *.yml
include LICENSE
include tox.ini
recursive-include mypkg *.py
recursive-include mypkg *.txt
3

Just remove the line:

include_package_data=True,

from your setup script, and it will work fine. (Tested just now with latest setuptools.)

Ian
  • It's crazy but it works both with `sdist` and `bdist_wheel`, have you checked why? – Szabolcs May 27 '20 at 07:51
  • 1
    I can indeed confirm that `sdist` ignores `package_data` when this is set. – Sander Steffann Jun 09 '20 at 14:13
  • At this point it's been months, but I seem to recall digging around in the code, getting lost twice, taking an EXTREMELY fine-toothed comb to the documentation, and gaining satisfaction. Apparently various sample scripts contain this flag and it causes no end of headaches. – Ian Jul 04 '20 at 09:04
3

Moving the folder containing the package data into the module folder solved the problem for me.

See this question: MANIFEST.in ignored on "python setup.py install" - no data files installed?

Community
exhuma
2

For a directory structure like:

foo/
├── foo
│   ├── __init__.py
│   ├── a.py
│   └── data.txt
└── setup.py

and setup.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from setuptools import setup


NAME = 'foo'
DESCRIPTION = 'Test library to check how setuptools works'
URL = 'https://none.com'
EMAIL = 'gzorp@bzorp.com'
AUTHOR = 'KT'
REQUIRES_PYTHON = '>=3.6.0'

setup(
    name=NAME,
    version='0.0.0',
    description=DESCRIPTION,
    author=AUTHOR,
    author_email=EMAIL,
    python_requires=REQUIRES_PYTHON,
    url=URL,
    license='MIT',
    classifiers=[
        'Programming Language :: Python',
        'Programming Language :: Python :: 3',
        'Programming Language :: Python :: 3.6',
    ],
    packages=['foo'],
    package_data={'foo': ['data.txt']},
    include_package_data=True,
    install_requires=[],
    extras_require={},
    cmdclass={},
)

python setup.py bdist_wheel works.

ksha
2

Starting with Setuptools 62.3.0, you can now use recursive wildcards ("**") to include a (sub)directory recursively. This way you can include whole folders with all their folders and files in it.

For example, when using a pyproject.toml file, this is how you include two folders recursively:

[tool.setuptools.package-data]
"ema_workbench.examples.data" = ["**"]
"ema_workbench.examples.models" = ["**"]

But you can also only include certain file-types, in a folder and all subfolders. If you want to include all markdown (.md) files for example:

[tool.setuptools.package-data]
"ema_workbench.examples.data" = ["**/*.md"]

It should also work when using setup.py or setup.cfg.

See https://github.com/pypa/setuptools/pull/3309 for the details.
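Before building, you can preview what such a pattern would match with the standard library's recursive glob. This is only a sketch under the assumption that setuptools' matching behaves like `glob.glob(..., recursive=True)` relative to the package directory; the function name `preview_package_data` is made up, and hidden files may be treated differently.

```python
import glob
import os


def preview_package_data(package_dir, pattern):
    """Return files under package_dir matching pattern, as relative POSIX paths."""
    full_pattern = os.path.join(glob.escape(package_dir), pattern)
    matches = glob.glob(full_pattern, recursive=True)
    return sorted(
        os.path.relpath(match, package_dir).replace(os.sep, "/")
        for match in matches
        if os.path.isfile(match)  # '**' also matches directories; keep files only
    )
```

For example, `preview_package_data("ema_workbench/examples/data", "**/*.md")` would list the markdown files in that folder and all of its subfolders.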