
The problem

I host a Git repository for the Matter Modeling Stack Exchange, called Modeling Matters. For example, to supplement the answer to question #9743 on the site, a folder called /9743 was made to include all input and output files described in the answer.

I host another repository called QC Bugs which contains a folder for every bug reported for each quantum chemistry software.

If a question asked at the Matter Modeling Stack Exchange (MMSE) concerns a bug reported in QC Bugs, and there's a folder containing all files associated with the bug report, then that folder belongs in both repositories.

The solution to similar questions

Questions about how to deal with this type of situation have been asked over and over again, starting within a year of the launch of Stack Overflow, and it seems that the answers or comments on every one of them suggest using git-submodule, git-subtree (an improvement over git-submodule), git-subrepo (an improvement over both git-submodule and git-subtree), or symbolic links.

The reasons why those solutions don't work for me

Making the shared folder a git-submodule is doable, but if I need 1000 shared folders like this, maintaining 1000 separate Git repositories becomes unwieldy. Even turning a single shared folder into its own Git repository seems like overkill when, in theory, a system could be in place such that we only have two Git repositories (rather than three), with the shared folder labeled as "synced" with the corresponding folder in the other repository, so that if someone pushes a commit that changes a file in either folder, both folders get updated. This would be particularly easy in a case like mine, where I maintain both repositories: the "pull request" that appears for both repositories simultaneously would only have to be accepted once, and would then be merged into both repositories, provided that no merge conflicts block it. For those reasons, I don't want to create even one new Git repository for the common folders. You might already be thinking about symbolic links, which I discuss further below.

git-subtree is supposed to be an improvement over git-submodule that fixes a lot of the latter's limitations, but after reading two very thorough and excellent articles about it, it seems that this "overkill" of making each shared folder a Git repository, and the need for a third Git repository, would remain unchanged.

git-subrepo, which has been described as an "improved version of submodule and subtree", is documented in its README and seems to work the same way (requiring the "shared" folder to be a separate Git repository).

I didn't see git-cherry-pick mentioned in any of the above-mentioned SO threads. It could certainly be used for one repository to "pick" commits that change a folder in another repository, but it doesn't seem to accomplish the "synchronizing" that I'm trying to achieve.

Another inelegant solution

One solution I considered was to put the two Git repositories inside a bigger Git repository and to use a symbolic link to connect the two. Symlinks were in fact suggested in one of the above questions (Share code between rails projects), in two of its answers (including the accepted one), but I agree with the third (and highest-scoring) answer, which says "-1 for symlinking. It'll do the job but not very elegantly." Another question (Is it possible to share a single file in the root folder of multiple git repositories?) had symlinks suggested in comments and NuGet suggested in an (unaccepted) answer, but symlinks were not the OP's preferred solution in that case.

  • One advantage of symlinks is that they save storage space. The two repositories don't need to duplicate the storage requirements of each of the 1000 shared folders.
  • One major disadvantage is the added complexity. "Modeling Matters" would no longer be a self-contained repository that stands on its own legs. Presently, when a user needs to add a code block (such as the output file from a calculation) that exceeds Stack Exchange's limit, I can tell them "Please push the file to a folder called '10213' in this repository because that's the number in this question's URL", and the landing page for that repository is so simple that people who have basic-level knowledge of Git can complete the task immediately (although users without Git experience have struggled to catch on). If I were to combine the contents of the "QC Bugs" repository in there, then those users with no Git experience would likely go from "struggling" to "not even trying", because the landing page would have even more going on.

An example of a desirable solution

Repo 1 and Repo 2 can have some folders that are "shared" without those folders being Git repositories themselves (as they would have to be with git-submodule, git-subtree, and git-subrepo). If someone pushes a commit to Repo 1 that changes a file in one of these "shared" folders, that commit will sync with the corresponding folder in Repo 2, a bit like a symbolic link does, except that nothing would be "broken" for users who only have Repo 2 on their computer (and not Repo 1). The desired behavior would be like this:

# In Repo 1
# Note: "git sync" is the hypothetical command I'm wishing for; it does not exist in Git.

git sync name_of_shared_folder Repo2
cp /home/file.txt name_of_shared_folder
git add name_of_shared_folder/file.txt
git commit -a
git push

This requires permission to push to both Repo 1 and Repo 2, and the usual requirements (e.g. no conflicts). After pushing, Repo 2 now has file.txt in name_of_shared_folder. Someone who has Repo 2 on their computer but not Repo 1, will get file.txt in that folder when they run git pull, which is something that would not work with a symlink!
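
To make the intended effect concrete, here is a rough sketch of what such a sync would amount to if done by hand. It is only an approximation under several assumptions: both repositories are cloned on the same machine, the function name `sync_shared_folder` and its arguments are hypothetical, and the folder is mirrored by plain copying, so the commit history of Repo 1 is not carried over to Repo 2.

```shell
#!/bin/sh
# Hypothetical helper (not a Git feature): mirror one shared folder from
# one local clone into another and commit the result there.
# Usage: sync_shared_folder <source_clone> <destination_clone> <folder>
sync_shared_folder() {
    src_repo=$1
    dst_repo=$2
    folder=$3

    # Refuse to run if the source folder is missing, so the destination
    # copy is never deleted by mistake.
    [ -d "$src_repo/$folder" ] || return 1

    # Replace the destination copy entirely so that deletions propagate too.
    rm -rf "$dst_repo/$folder"
    mkdir -p "$dst_repo/$folder"
    cp -R "$src_repo/$folder/." "$dst_repo/$folder/"

    # Stage only the shared folder, and commit only if something changed.
    git -C "$dst_repo" add -A -- "$folder"
    if ! git -C "$dst_repo" diff --cached --quiet -- "$folder"; then
        git -C "$dst_repo" commit -q \
            -m "Sync $folder from $(basename "$src_repo")" -- "$folder"
    fi
}
```

After committing and pushing in Repo 1, running `sync_shared_folder Repo1 Repo2 name_of_shared_folder` followed by `git -C Repo2 push` would give users of Repo 2 the new file on their next git pull. This only works in one direction per call and has to be run by someone with push access to both repositories, so it falls short of the automatic two-way behavior described above.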

The question

Is this "desirable" solution possible? If not, is there a solution that doesn't require git-submodule, git-subtree, or git-subrepo, and doesn't have the above-mentioned drawbacks of using symbolic links?

Nike
  • did you check [worktree](https://git-scm.com/docs/git-worktree)? Don't know if that would fit for your criteria, but this could work. – Nordine Lotfi Jan 21 '23 at 09:28
  • This sounds like user interface design more than specifically a Git problem. Git deals with files, and a file can't be in multiple repositories. Either use a monorepo, or implement a mechanism for linking between repositories somehow. For example, hosting the user-visible artifacts on a web site would let you comfortably create a referencing mechanism to implement the linking with regular, universally familiar hyperlinks. Behind the scenes, this could be a simple HTTP redirect (whether implemented as a concrete `.htaccess` file or some other mechanism which is suitable for your use case). – tripleee Jan 21 '23 at 09:38
  • (I'm not saying go create a web site; go create a convention for how to link these things together and the rest should be obvious to pretty much anyone who understands Git. If you are targeting users who don't, maybe some of those who do will volunteer to help build a site.) – tripleee Jan 21 '23 at 09:40
  • Nope, git does not work this way. – Sam Varshavchik Jan 21 '23 at 10:06
  • Do you need to share files or to track changes to them ? – Ôrel Jan 22 '23 at 16:26
  • @Ôrel Ideally the shared folder could otherwise behave like all the other folders (i.e. changes can be tracked). As for the comments by Nordine and tripleee, I've been talking to them in the [Git chat room](https://chat.stackoverflow.com/transcript/message/55866964#55866964). – Nike Jan 22 '23 at 17:21
