56

I have a big Python 3.7+ project and I am currently in the process of splitting it into multiple packages that can be installed separately. My initial thought was to have a single Git repository with multiple packages, each with its own setup.py. However, while doing some research on Google, I found people suggesting one repository per package (e.g., Python - setuptools - working on two dependent packages (in a single repo?)). However, nobody provides a good explanation as to why they prefer such a structure.

So, my questions are the following:

  • What are the implications of having multiple packages (each with its own setup.py) on the same GitHub repo?
  • Am I going to face issues with such a setup?
  • Are the common Python tools (documentation generators, PyPI packaging, etc.) compatible with such a setup?
  • Is there a good reason to prefer one setup over the other?
  • Please keep in mind that this is not an opinion-based question. I want to know if there are any technical issues or problems with any of the two approaches.

Also, I am aware (and please correct me if I am wrong) that setuptools now allows installing dependencies from GitHub repos, even if the setup.py is not at the root of the repository.

sinoroc
AstrOne
  • 2
    Advantages of separate packages: Some GitHub tools, like the wiki or the issues, will be able to be separated as well, and thus the information they handle would be more manageable. Also, if a user needs only one of the packages, he or she does not need to download the other ones. – Jalo Jan 19 '19 at 11:16
  • 3
    @AstrOne really interested in what you come up with here. I am working on a project where we've had two separate, private packages with their own repos, but where one of the packages depends on the other. This has quickly made testing a bit of a nightmare. I figure we can either (a) roll out some good CI/DevOps infrastructure or (b) put the packages in the same repo and consolidate the testing base. I'm partial to (b), for now, given that it seems like the quickest path and we're still in the early days, but very keen to hear what the best practices are. – aaron Jun 12 '19 at 18:47
  • 1
    Hello! I was just thinking that if interdependency of packages makes it beneficial to keep them in the same repository so much that users prefer to do so, then that is probably an issue with the ecosystem. My consideration is that I would expect packages from different authors being typically interdependent. And hence they can hardly ever be put into the same repo (not, without a high degree of collaboration). So if you experienced problems that still persist, they may best be brought up for a wide audience/PEP defining people? – brezniczky Aug 20 '19 at 15:28

4 Answers

11

One aspect is covered here https://pip.readthedocs.io/en/stable/reference/pip_install/#vcs-support (Updated link: https://pip.pypa.io/en/stable/topics/vcs-support/)

In particular, if setup.py is not in the root directory you have to specify the subdirectory where to find setup.py in the pip install command.

So if your repository layout is:

  • pkg_dir/
    • setup.py # setup.py for package pkg
    • some_module.py
  • other_dir/
    • some_file
    • some_other_file

You’ll need to use pip install -e "vcs+protocol://repo_url/#egg=pkg&subdirectory=pkg_dir" (quote the URL so that the shell does not interpret the & character).
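As a rough illustration of the URL format above, here is a minimal sketch that assembles such a requirement string. The helper name `pip_vcs_requirement`, the repository URL, and the optional `ref` parameter are assumptions made for this example; only the `#egg=...&subdirectory=...` fragment comes from the pip documentation linked above.

```python
import sys


def pip_vcs_requirement(repo_url, egg, subdirectory=None, vcs="git", ref=None):
    """Build a pip VCS requirement string (hypothetical helper).

    repo_url     -- URL of the repository, without the "vcs+" prefix
    egg          -- project name, as declared in that package's setup.py
    subdirectory -- directory inside the repo that contains setup.py
    ref          -- optional branch, tag, or commit to install from
    """
    url = f"{vcs}+{repo_url}"
    if ref:
        url += f"@{ref}"
    fragment = f"egg={egg}"
    if subdirectory:
        fragment += f"&subdirectory={subdirectory}"
    return f"{url}#{fragment}"


# Placeholder repository URL, matching the layout shown above.
requirement = pip_vcs_requirement(
    "https://github.com/example/monorepo.git",
    egg="pkg",
    subdirectory="pkg_dir",
)

# Passing the arguments as a list (e.g. to subprocess.run) avoids shell
# quoting problems with the "&" character. The command is only printed here.
command = [sys.executable, "-m", "pip", "install", "-e", requirement]
print(" ".join(command))
```

With several packages in one repository, each package would get its own requirement string that differs only in `egg` and `subdirectory`.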

mrajase
Teitur
4

I am researching the same issue myself. The PyPA documentation recommends the layout described in the 'native' subdirectory of: https://github.com/pypa/sample-namespace-packages
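To make the idea of a 'native' (PEP 420) namespace layout more concrete, here is a small self-contained sketch. It builds two distribution directories in a temporary "repository", both contributing to the same namespace, and imports from each. The names `pkg_a`, `pkg_b`, `mynamespace`, `subpackage_a` and `subpackage_b` are hypothetical, and adding the directories to `sys.path` only stands in for installing the two packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(repo_root, dist_dir, subpackage):
    """Create repo_root/dist_dir/mynamespace/subpackage/__init__.py.

    mynamespace/ deliberately has no __init__.py: that is what makes it
    a native namespace package that several distributions can share.
    """
    pkg_dir = repo_root / dist_dir / "mynamespace" / subpackage
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(f"NAME = {subpackage!r}\n")
    return repo_root / dist_dir


def demo():
    """Import two subpackages that live in different distribution directories."""
    with tempfile.TemporaryDirectory() as tmp:
        repo_root = Path(tmp)
        roots = [
            make_distribution(repo_root, "pkg_a", "subpackage_a"),
            make_distribution(repo_root, "pkg_b", "subpackage_b"),
        ]
        # Stand-in for "pip install ./pkg_a ./pkg_b".
        sys.path[:0] = [str(root) for root in roots]
        try:
            importlib.invalidate_caches()
            a = importlib.import_module("mynamespace.subpackage_a")
            b = importlib.import_module("mynamespace.subpackage_b")
            return [a.NAME, b.NAME]
        finally:
            for root in roots:
                sys.path.remove(str(root))


print(demo())
```

In a real repository each of `pkg_a/` and `pkg_b/` would also carry its own setup.py next to the `mynamespace/` directory.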

I find the single-package structure described at https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure very useful; see the discussion there about testing the 'installed' version. I think this can be extended to multiple packages. Will post as I learn more.

boriska
4

"Best" approach? That's a matter of opinion, which is not the domain of SO. But here are a couple of justifications for creating separate packages:

  1. The package is functionally independent of the other packages in your project.
    That is, it doesn't import from them and performs a function that could be useful to other developers. Extra points if the function this package performs is similar to packages already on PyPI. Extra points if the package has a stable API and clear documentation. Penalty points if the package is a thin grab bag of unrelated functions that you factored out of multiple packages for ease of maintenance, but the functions don't have a unifying principle.
  2. The package is optional with respect to your main project, so there'd be cases where users could reasonably choose to skip installing it.
    Perhaps one package is a "client" and the other is the "server". Or perhaps the package provides OS-specific capabilities. Note that a package like this is not functionally independent of the main project and so does not qualify under the previous bullet point, but this would still be a good reason to separate it.

I agree with @boriska's point that the "single package" project structure is a maintenance convenience well worth striving for. But not (and this is just my opinion, I'm going to get downvoted for expressing it) at the expense of cluttering up the public package index with a large number of small packages that are never installed separately.

BobHy
  • 1
    +1 for the "never installed separately" - that's a really great point and a good way to reason about collapsing several tiny packages into one small package – Joon Apr 13 '21 at 09:43
1

The major problem I've faced when splitting two interdependent packages into two repos came from CI and testing, specifically branch protections.

Say you have package A and package B and you make some (breaking) changes in both. The automated tests for package A fail because they use the main branch of B (which is no longer compatible with the new version of A), so you can't merge A. And the same problem occurs the other way around, so you can't merge B either.

tldr:
After breaking changes, automated tests on merge will fail because they use the main branch of the other repo, making it impossible to merge.
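The deadlock can be sketched with a toy model. This is only an illustration under stated assumptions: API "generations" are plain integers, tests pass only when both packages are on the same generation, and the function names are made up for the example. It does not model any particular CI system.

```python
def ci_passes(candidate_api, other_package_api):
    """Toy rule: tests pass only if both packages use the same API generation."""
    return candidate_api == other_package_api


def separate_repos(main, feature):
    """Each repo tests its feature branch against the other repo's main branch."""
    return {
        "A": ci_passes(feature["A"], main["B"]),
        "B": ci_passes(feature["B"], main["A"]),
    }


def single_repo(feature):
    """One branch changes both packages, so they are tested together."""
    return ci_passes(feature["A"], feature["B"])


main = {"A": 1, "B": 1}      # current state of both main branches
feature = {"A": 2, "B": 2}   # coordinated breaking change in both packages

print(separate_repos(main, feature))  # {'A': False, 'B': False}
print(single_repo(feature))           # True
```

With branch protection requiring green tests, neither feature branch in the two-repo case can be merged first, which is the situation described above.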

Joep