
I have the following python project structure:

.
├ setup.py
├ doc
|   ├ file.css
|   ├ file.html
|   └ file.js
└ src
    ├ matlabsources
    |             └ <several folders architecture with .m and .slx files>
    └ mypythonpackage
        ├ __init__.py
        └ <several sub packages architecture with python files>

I want to add all the files in the doc folder to my whl distribution file.

setuptools.setup(
    name='myproject',
    author='me',
    packages=setuptools.find_packages(where='src', include=['packages*']),
    package_dir={'': 'src'},
    data_files ={'documentation': find_data_files('doc'), 'matlab': find_data_files('src/matlabsources')},
    include_package_data=True,
    install_requires=make_deps(REQS_FILENAME),
    python_requires='>= 2.7',  # Only compatible with Python 2.7.* and 3+
    use_scm_version={'version_scheme': simple_version},  # setuptools_scm: the blessed package to manage your versions by scm tags
    setup_requires=make_deps(SETUP_FILENAME),
    cmdclass=dict(bdist_egg=custom_bdist_egg, build=custom_build, activateIniGeneration=activateIniGeneration)
)


def find_data_files(directory):
    """
    Using glob patterns in ``package_data`` that matches a directory can
    result in setuptools trying to install that directory as a file and
    the installation to fail.

    This function walks over the contents of *directory* and returns a list
    of only filenames found.
    """

    strip = os.path.dirname(os.path.abspath(__file__))

    result = []
    for root, dirs, files in os.walk(directory):
        for filename in files:
            filename = os.path.join(root, filename)
            result.append(os.path.relpath(filename, strip))

    print("\n".join(result))
    return result

I get the following error:

error: can't copy 'documentation': doesn't exist or not a regular file

As I understand it, 'documentation' is the target directory, relative to sys.prefix, so it is normal that it does not exist.

I am building with the following command:

python setup.py bdist_wheel --universal

I also get this warning:

warning: install_data: setup script did not provide a directory for 'documentation' -- installing right in 'build\bdist.win32\wheel\myproject-1.7.z_gfdc81e60.d20201112.data\data'

This makes me think my setup.py needs further configuration for this to work.

Where am I going wrong?

Milan
  • Why? Why do you want to add the doc to the _wheel_ file? Do you also want the doc files to be installed? Where do you want the doc files to be installed? Who is gonna access these files? How are the files gonna be accessed? -- I am asking, because it is an unusual question, and I am not sure what the end goal is. Knowing what the purpose is, will help give you a more useful answer. – sinoroc Nov 13 '20 at 09:00
  • Yes, I want the doc files to be installed. For this they have to be added to the wheel file, right? Or is there another way? I simplified the example, but there are also MATLAB files to be installed, and those files are accessed from the Python packages. The files should be placed at the same level as the source packages. – Milan Nov 13 '20 at 14:31
  • I suggest you look into [_`package_data`_](https://setuptools.readthedocs.io/en/latest/userguide/datafiles.html) then. I summed it up [here](https://sinoroc.gitlab.io/kb/python/package_data.html). There are other duplicate questions on _StackOverflow_. If you show your directory structure, I can probably write down a real answer here. – sinoroc Nov 13 '20 at 16:09
  • Does this answer your question? [Is it possible to exclude data file sources and intermediary files from bdist?](https://stackoverflow.com/questions/54867650/is-it-possible-to-exclude-data-file-sources-and-intermediary-files-from-bdist) – sinoroc Nov 13 '20 at 16:14
  • Just to make sure, you want to keep this directory structure, but want the `doc` and `matlabsources` to be installed as subpackages of `mypythonpackage`. Is that right? So in the `site-packages` directory, you want to have them as `[...]/site-packages/mypythonpackage/doc` and `[...]/site-packages/mypythonpackage/matlabsources`? If not, then please edit the question to show the expected directory structure once installed. – sinoroc Nov 16 '20 at 16:01
  • So yes I want to keep this directory structure. For the `doc` I do not really care where they end up but it would be better under `site-packages`, however for the `matlabsources` yes, they need to be under the `site-packages` directory. – Milan Nov 16 '20 at 16:17
  • OK. Do you have `__init__.py` files in `matlabsources`? At least 1 at the root? If not would it be OK to add 1? – sinoroc Nov 16 '20 at 16:29
  • Sorry for the delay, I do not but I would not mind to add one – Milan Nov 16 '20 at 17:18

1 Answer


Assuming a project directory structure such as:

myproject
├── doc
│   ├── alpha
│   │   ├── file.css
│   │   ├── file.html
│   │   └── file.js
│   ├── file.css
│   ├── file.html
│   └── file.js
├── MANIFEST.in
├── setup.cfg
├── setup.py
└── src
    ├── matlabsources
    │   ├── bravo
    │   │   ├── file.m
    │   │   └── file.slx
    │   ├── file.m
    │   └── file.slx
    └── mypythonpackage
        ├── charlie
        │   └── __init__.py
        └── __init__.py

With MANIFEST.in you can specify additional files to be added to the _source distribution_ (sdist):

recursive-include doc *.css
recursive-include doc *.html
recursive-include doc *.js

recursive-include src/matlabsources *.m
recursive-include src/matlabsources *.slx

The setuptools script setup.py should look like this:

#!/usr/bin/env python3

import setuptools

def _find_packages():
    packages = setuptools.find_packages(where='src')
    packages.append('mypythonpackage.doc')
    packages.append('matlabsources')
    return packages

def _main():
    setuptools.setup(
        # see 'setup.cfg'
        #
        packages=_find_packages(),
        include_package_data=True,
        package_dir={
            'mypythonpackage': 'src/mypythonpackage',
            'mypythonpackage.doc': 'doc',
            'matlabsources': 'src/matlabsources',
        },
    )


if __name__ == '__main__':
    _main()

This results in the sdist:

$ python3 -m tarfile -l dist/myproject-0.0.0.dev0.tar.gz 
myproject-0.0.0.dev0/ 
myproject-0.0.0.dev0/MANIFEST.in 
myproject-0.0.0.dev0/PKG-INFO 
myproject-0.0.0.dev0/doc/ 
myproject-0.0.0.dev0/doc/alpha/ 
myproject-0.0.0.dev0/doc/alpha/file.css 
myproject-0.0.0.dev0/doc/alpha/file.html 
myproject-0.0.0.dev0/doc/alpha/file.js 
myproject-0.0.0.dev0/doc/file.css 
myproject-0.0.0.dev0/doc/file.html 
myproject-0.0.0.dev0/doc/file.js 
myproject-0.0.0.dev0/myproject.egg-info/ 
myproject-0.0.0.dev0/myproject.egg-info/PKG-INFO 
myproject-0.0.0.dev0/myproject.egg-info/SOURCES.txt 
myproject-0.0.0.dev0/myproject.egg-info/dependency_links.txt 
myproject-0.0.0.dev0/myproject.egg-info/requires.txt 
myproject-0.0.0.dev0/myproject.egg-info/top_level.txt 
myproject-0.0.0.dev0/pyproject.toml 
myproject-0.0.0.dev0/setup.cfg 
myproject-0.0.0.dev0/setup.py 
myproject-0.0.0.dev0/src/ 
myproject-0.0.0.dev0/src/matlabsources/ 
myproject-0.0.0.dev0/src/matlabsources/bravo/ 
myproject-0.0.0.dev0/src/matlabsources/bravo/file.m 
myproject-0.0.0.dev0/src/matlabsources/bravo/file.slx 
myproject-0.0.0.dev0/src/matlabsources/file.m 
myproject-0.0.0.dev0/src/matlabsources/file.slx 
myproject-0.0.0.dev0/src/mypythonpackage/ 
myproject-0.0.0.dev0/src/mypythonpackage/__init__.py 
myproject-0.0.0.dev0/src/mypythonpackage/charlie/ 
myproject-0.0.0.dev0/src/mypythonpackage/charlie/__init__.py 

and in the wheel:

$ python3 -m zipfile -l dist/myproject-0.0.0.dev0-py3-none-any.whl 
File Name                                             Modified             Size
matlabsources/file.m                           2020-11-16 16:41:06            0
matlabsources/file.slx                         2020-11-16 16:41:06            0
matlabsources/bravo/file.m                     2020-11-16 16:41:18            0
matlabsources/bravo/file.slx                   2020-11-16 16:41:18            0
mypythonpackage/__init__.py                    2020-11-16 16:45:02           88
mypythonpackage/charlie/__init__.py            2020-11-16 16:55:22            0
mypythonpackage/doc/file.css                   2020-11-16 16:30:34            0
mypythonpackage/doc/file.html                  2020-11-16 16:30:34            0
mypythonpackage/doc/file.js                    2020-11-16 16:30:34            0
mypythonpackage/doc/alpha/file.css             2020-11-16 16:33:00            0
mypythonpackage/doc/alpha/file.html            2020-11-16 16:33:00            0
mypythonpackage/doc/alpha/file.js              2020-11-16 16:33:00            0
myproject-0.0.0.dev0.dist-info/METADATA        2020-11-16 17:03:32         1311
myproject-0.0.0.dev0.dist-info/WHEEL           2020-11-16 17:03:32           92
myproject-0.0.0.dev0.dist-info/top_level.txt   2020-11-16 17:03:32           30
myproject-0.0.0.dev0.dist-info/RECORD          2020-11-16 17:03:32         1499
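
Once the wheel is installed, the MATLAB files sit inside the importable `matlabsources` package, so Python code can read them through the import system instead of hard-coding paths. Here is a minimal sketch, assuming `matlabsources` contains an `__init__.py` (as discussed in the comments on the question). `read_package_file` is a hypothetical helper name, not part of setuptools:

```python
import pkgutil


def read_package_file(package, resource):
    """Return the bytes of a data file shipped inside an installed package.

    `resource` is a '/'-separated path relative to the package directory,
    for example 'bravo/file.m'.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # pkgutil.get_data returns None when the package cannot be located
        # or when its loader cannot read data files.
        raise IOError("cannot load %r from package %r" % (resource, package))
    return data
```

With the layout above, a call would look like `read_package_file('matlabsources', 'bravo/file.m')`. The same call should work for the documentation files via `read_package_file('mypythonpackage', 'doc/file.html')`.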
sinoroc
  • This is what I have, except the doc folder is not located under mypackage, but under src. I found a working solution: instead of providing a dictionary for `data_files`, I provide a list of tuples, and it works. I do not understand why, and it's not what is in the doc. – Milan Nov 16 '20 at 15:10
  • Actually I poorly read the doc as it is specified as a list of tuples contrary to package_data which is a dictionary. And this is why consistency is a thing – Milan Nov 16 '20 at 15:20
  • At the `packages=_find_packages()` for `packages.append('matlabsources')` line I receive error: package directory `'matlabsources'` does not exist, I added a `__init__.py` file but it does not change anything. – Milan Nov 16 '20 at 18:17
  • I managed to make the setup.py work the way you did, but no matter what I do, the matlab folder does not end up in the whl archive. – Milan Nov 16 '20 at 18:46
  • from reproducing your use case, it works, so I will investigate what is wrong in my own project – Milan Nov 16 '20 at 19:16
  • Is my project directory structure the same as yours, or not? – sinoroc Nov 16 '20 at 19:49
  • Good! You will notice though that the convention dictates that each project has 1 and only 1 top level importable package with the same name (canonicalized) of the project (the distributable package). But this project has 2 top level packages (`mypythonpackage`, and `matlabsources`) instead of 1, and none of the top level packages have the same name as the project (`myproject`). -- This is not a problem if this project stays private, but I would recommend avoiding this if the project is distributed publicly, on _PyPI_ for example. – sinoroc Nov 17 '20 at 08:17
  • Is this an issue if the matlabsources package contains only non python files ? – Milan Nov 17 '20 at 16:23
  • Yes, it could be a problem. It does not matter if there are Python files or not. The way it is laid out here, it is a _top level package_. So if another project happens to also have a top level importable package named `matlabsources` there will be an issue. Might be very unlikely to happen, but still could theoretically happen. -- That is why the convention makes sense: once you "_own_ " the project called `Something` on _PyPI_, then it is assumed that you also own the _top level package_ named `something`, and that no other project should have a `something` top level package. – sinoroc Nov 17 '20 at 16:34
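
For reference, the `data_files` fix mentioned in the comments (a list of tuples rather than a dictionary) could look roughly like this. It is only a sketch: `collect_data_files` is a hypothetical helper, and the target names `documentation` and `matlab` are taken from the question. Each tuple pairs an install directory with the files to copy into it. Files in one tuple all land directly in that directory, so every subdirectory gets its own tuple to preserve the tree:

```python
import os


def collect_data_files(source_dir, target_dir):
    """Build a data_files list: [(install_dir, [source_file, ...]), ...].

    One tuple is produced per directory under `source_dir` that contains
    files, so the directory tree is mirrored under `target_dir`.
    """
    entries = []
    for root, _dirs, files in os.walk(source_dir):
        if not files:
            continue  # nothing to install from this directory
        rel = os.path.relpath(root, source_dir)
        if rel == os.curdir:
            install_dir = target_dir
        else:
            install_dir = os.path.join(target_dir, rel)
        sources = sorted(os.path.join(root, name) for name in files)
        entries.append((install_dir, sources))
    return sorted(entries)
```

In setup.py this would be used as `data_files=collect_data_files('doc', 'documentation') + collect_data_files('src/matlabsources', 'matlab')`, with the source directories given relative to the directory containing setup.py.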