I have two ML projects on Azure Databricks that are almost identical except that they serve different clients. Essentially, I want some management system that lets me share and reuse the same code across projects (e.g. Python files that hold helper functions for feature engineering, Databricks notebooks that perform similar initial data preprocessing, some configuration files, etc.). At the same time, if an update is made to the shared code, it needs to be synced to all the projects that use it.
I know that with Git we can use submodules for this: the common code lives in Repo C, which is added as a submodule to Repo A and Repo B. The problem is that Azure Databricks doesn't support submodules. It also limits a working branch to 200 MB, so I can't use a monorepo (i.e. keep all the code in one repository) either. I was thinking of building a package for the shared Python files, but I also have a few core versions of notebooks that I want to share, which I don't think can be built into a package.
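For reference, the packaging route I was considering would look roughly like this (the package name `shared_ml_utils` and all module/function names below are just placeholders):

```python
# setup.py in the shared-code repo (what I called Repo C)
from setuptools import setup, find_packages

setup(
    name="shared_ml_utils",    # placeholder name for the shared package
    version="0.1.0",
    packages=find_packages(),  # e.g. shared_ml_utils/feature_engineering.py
)
```

Each project notebook could then install it straight from the Git repo and import the helpers:

```python
# In a Databricks notebook cell:
# %pip install git+https://<host>/<org>/shared-ml-utils.git

from shared_ml_utils import feature_engineering   # placeholder module

df = feature_engineering.add_basic_features(df)   # placeholder function
```

That would cover the plain Python files, but I don't see how it handles the shared notebooks or the configuration files.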
Is there any other way to do this on Databricks so I can reuse the code rather than just copying and pasting?