I want to structure a Python repo that contains multiple Spark applications, each of which is standalone. I also want some common packages that all of the applications can use.
I need to be able to build each package separately into a wheel file, both the common packages and the standalone Spark applications.
I also want each of these packages to have its own test files.
Is the following structure a good practice?
root
├── common_package_a
│   ├── package_a_tests
│   ├── requirements.txt
│   ├── venv
│   └── setup.py
├── common_package_b
│   ├── package_b_tests
│   ├── requirements.txt
│   ├── venv
│   └── setup.py
│   .
│   .
│   .
├── spark_application_a
│   ├── spark_application_a_tests
│   ├── requirements.txt
│   ├── venv
│   └── setup.py
└── spark_application_b
    ├── spark_application_b_tests
    ├── requirements.txt
    ├── venv
    └── setup.py
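To make the goal concrete, here's a minimal sketch of the setup.py I have in mind for each package (the package name, version, and the requirements.txt parsing are just placeholders for illustration):

```python
# common_package_a/setup.py -- a minimal sketch; name and version are placeholders
from setuptools import setup, find_packages

# Read pinned dependencies from the package's own requirements.txt
with open("requirements.txt") as f:
    requirements = f.read().splitlines()

setup(
    name="common_package_a",
    version="0.1.0",
    # Exclude the test package from the built wheel
    packages=find_packages(exclude=["package_a_tests", "package_a_tests.*"]),
    install_requires=requirements,
)
```

The idea is that I'd then run `python -m build --wheel` (after `pip install build`) from inside each package directory to get a per-package wheel in its `dist/` folder.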
I can't find a recommended structure for this goal; every example of how to build a Python project that I've seen has a single setup.py in the root dir and a single venv for the entire project.
I've looked at some questions similar to mine:
- https://discuss.python.org/t/how-to-best-structure-a-large-project-into-multiple-installable-packages/5404/2
- How do you organise a python project that contains multiple packages so that each file in a package can still be run individually?
Thanks!