Submodules can be added into a repo as independent repos, so that each submodule has a .git-directory of its own:

[submodule-test] $ mkdir super-repo
[submodule-test] $ cd super-repo/
[super-repo] $ mkdir sub-repo
[super-repo] $ cd sub-repo/
[sub-repo] $ git init
Initialized empty Git repository in /home/foobar/tmp/submodule-test/super-repo/sub-repo/.git/
[sub-repo] $ touch foo.txt
[sub-repo] $ git add foo.txt
[sub-repo] $ git commit -m "* initial commit"
[main (root-commit) 60c3e14] * initial commit
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 foo.txt
[sub-repo] $ cd ..
[super-repo] $ git init
Initialized empty Git repository in /home/foobar/tmp/submodule-test/super-repo/.git/
[super-repo] $ git submodule add ./sub-repo sub-repo
Adding existing repo at 'sub-repo' to the index
[super-repo] $ git commit -m "* initial commit"
[main (root-commit) 826f205] * initial commit
 2 files changed, 4 insertions(+)
 create mode 100644 .gitmodules
 create mode 160000 sub-repo
[super-repo] $ file sub-repo/.git
sub-repo/.git: directory

However, if I now clone the super repo, the submodules are no longer independent repos: each submodule's .git is a file that points to a directory inside the super repo's .git. The flags for git clone below were chosen to

  • allow cloning within a file system
  • reflect my real-world situation, where I have lots of remote submodules within remote submodules. (It is not feasible to go through these one-by-one, so I need something automated. The submodules also have differently named default branches.)

[super-repo] $ cd ..
[submodule-test] $ mkdir clone-of-super
[submodule-test] $ cd clone-of-super/
[clone-of-super] $ git -c protocol.file.allow=always clone --recurse-submodules --remote-submodules ../super-repo
Cloning into 'super-repo'...
done.
Submodule 'sub-repo' (/home/foobar/tmp/submodule-test/clone-of-super/../super-repo/sub-repo) registered for path 'sub-repo'
Cloning into '/home/foobar/tmp/submodule-test/clone-of-super/super-repo/sub-repo'...
done.
Submodule path 'sub-repo': checked out '60c3e1457a183d14a6af4ba9472e99d99e12de94'
[clone-of-super] $ cd super-repo/
[super-repo] $ file sub-repo/.git
sub-repo/.git: ASCII text
[super-repo] $ cat sub-repo/.git
gitdir: ../.git/modules/sub-repo

So, the question is: How can a repo be cloned so that submodules are cloned as independent repos, with .git-directories of their own?

jhu
  • Why would you like them to be "independent repos"? – ElpieKay Aug 14 '23 at 01:56
  • Because that is the way they were set up initially, and I would like all clones to be identical. For simplicity let us say there is one master repo with lots of submodules. Some of these modules are public, some private repos. Their remotes are on different servers. The master repo is stored on a remote, with multiple clones. On the system where the master repo was originally created, the submodules were added as "independent repos", as in the example above. *On each clone, the modules are manipulated in different ways, and I would benefit from having the clones being identical.* – jhu Aug 17 '23 at 14:31
  • Still not sure what's troubling you. The clones of the main repo are identical if you use the same command and options and check out the commits that record the same submodule commits. Whether `sub-repo/.git` is a file or a directory does not make a difference to most git commands. – ElpieKay Aug 18 '23 at 01:47
  • True, but it *does* make a difference for other programs that may rely on a similar structure. For example, I had a script that found all `.git`-directories for a certain kind of processing. In addition, in this case the real "meat" is in the submodules, and the supermodules are more in the role of collection points for submodules. I would feel all nice and comfy if the stuff that matters were in the form of true (independent) git repos. – jhu Aug 19 '23 at 13:23
  • That's what I suspected, but you didn't mention the true trouble in the question. To get the path of `.git` in a script, it's better to use `git rev-parse --absolute-git-dir`. Git handles the different forms of `.git` and returns the real path. For example, `git -C /home/foobar/tmp/submodule-test/super-repo/sub-repo rev-parse --absolute-git-dir` returns the absolute path that `sub-repo/.git` points to, no matter whether it's a directory or a file. – ElpieKay Aug 20 '23 at 05:50

1 Answer

It does make a difference for other programs that may rely on similar structure. For example, I had a script that found all .git-directories for certain kind of processing.

That is why I mentioned before in "Is there a way to get the git root directory in one command?" the commands git rev-parse --git-dir or, for submodules, git rev-parse --show-superproject-working-tree.
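
For the script mentioned in the comments, a minimal sketch of that idea could look like the following. It walks a directory tree, finds every .git entry whether it is a file or a directory, and lets Git resolve the real git directory. The function name list_git_dirs is a hypothetical helper, not a Git command, and --absolute-git-dir assumes Git 2.13 or later.

```shell
# list_git_dirs is a hypothetical helper name, not part of Git.
# It prints the real git directory of every work tree below a root,
# regardless of whether .git is a directory or a "gitdir:" pointer file.
list_git_dirs() {
    root="${1:-.}"
    # -prune keeps find from descending into the .git directories themselves.
    find "$root" -name .git -prune -print | while IFS= read -r dotgit; do
        # Let Git resolve the pointer instead of parsing the .git file by hand.
        git -C "$(dirname "$dotgit")" rev-parse --absolute-git-dir
    done
}
```

Calling list_git_dirs on the root of a cloned super repo should print one line for the super repo and one per checked-out submodule, the latter pointing into .git/modules when the submodule uses a .git file.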


But if you insist on the automated conversion part, you can try a script like:

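# Submodule paths containing whitespace are not handled by this loop
for submodule_path in $(git config --file .gitmodules --get-regexp '\.path$' | awk '{ print $2 }'); do
    # Navigate to the submodule directory
    pushd "$submodule_path" > /dev/null || continue

    # Check if .git is a file (if not, it is probably already a directory)
    if [[ -f .git ]]; then
        # The "gitdir:" path in the .git file is relative to the submodule directory
        git_dir=$(sed -n 's/^gitdir: //p' .git)

        # Copy the real git directory next to the .git file first
        cp -R "$git_dir" .git.tmp
        # The copied config still has core.worktree set for the old location; drop it
        git config --file .git.tmp/config --unset core.worktree

        # Replace the .git file with the copied directory
        rm .git
        mv .git.tmp .git
    fi

    # Go back to the super repository
    popd > /dev/null
done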

You would execute it in the root folder of the parent repository.


The OP adds:

I wonder what guarantees that the super repo will be left in a consistent state after this.

In particular, what is the possible effect of doing the cp -R from .git/modules of the super repo into the .git-directory of the subrepo?
How does the super repo then know which is correct contents?
Should the .git-directory be mv'd instead of cp'd? Could it be mv'd?

To me, these questions illustrate the risks of going bash instead of git itself. With the bash way, I may break the super repo.

True: the operation could potentially have side effects that might cause inconsistencies in the parent repository's state. The concern mainly comes from manipulating the .git directory directly, which is generally not recommended unless you are certain of what you are doing. Git relies on the .git directory and .gitmodules file for submodule information, so altering them could cause issues.

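  • Regarding the effect of cp -R: That action duplicates the submodule data, so you will end up with two copies of the same Git objects (commits, trees, blobs). That should not be problematic as long as you are only reading and duplicating data, but it does consume extra disk space. Note that the copied config file still carries the core.worktree setting that was written for the old location under .git/modules, so that setting has to be removed from the copy.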

  • "How does the super repo know which is the correct content?": The super repository records each submodule's commit hash as a gitlink entry (mode 160000) in its index and commits; .gitmodules only stores the submodule's path and URL. The duplication does not change those records, so it should not affect the super repository's ability to manage the submodules.

  • "Should it be mv instead of cp? Could it be mv?": Moving the .git directory with mv instead of cp would disconnect the submodule from the parent repository entirely. That means that the submodule would no longer be a submodule but a standalone repository. You would have to add it back as a submodule if you want to maintain the submodule relationship.
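
One way to check for consistency after either kind of manipulation is to compare the commit that the super repo records for each submodule with the commit that is actually checked out there. The following is only a sketch under stated assumptions: check_gitlinks is a hypothetical helper name, it must be run from the root of the super repo, it does not recurse into nested submodules, and it does not handle paths that Git would quote (for example paths with unusual characters).

```shell
# check_gitlinks is a hypothetical helper name, not part of Git.
# Run it from the root of the super repository.
check_gitlinks() {
    git ls-files --stage | (
        status=0
        while read -r mode sha stage path; do
            # Mode 160000 marks a gitlink: the index entry pinning a submodule commit.
            [ "$mode" = 160000 ] || continue
            head=""
            # Only ask the submodule if it has its own .git file or directory;
            # otherwise Git would walk up and answer for the super repo.
            if [ -e "$path/.git" ]; then
                head=$(git -C "$path" rev-parse HEAD 2>/dev/null)
            fi
            if [ "$head" = "$sha" ]; then
                printf 'ok %s\n' "$path"
            else
                printf 'differs %s (recorded %s, checked out %s)\n' "$path" "$sha" "${head:-none}"
                status=1
            fi
        done
        exit "$status"
    )
}
```

The function returns a non-zero status if any submodule is missing or sits on a commit other than the recorded one, so it can be used as a guard in a larger script.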


A safer method would be to clone the submodules manually as independent repositories and replace the existing submodules with these new clones:

# Clone the super repo
git clone <super_repo_url> super-repo

# Go into the super repo directory
cd super-repo

# Register the submodules in .git/config without cloning them
git submodule init

# For each submodule, clone it directly into its path in the work tree
git config --file .gitmodules --get-regexp '\.path$' | while read -r key submodule_path; do
    # Derive the submodule name from the key "submodule.<name>.path"
    name=${key#submodule.}
    name=${name%.path}

    # Read the URL that "git submodule init" resolved into the super repo's config
    submodule_url=$(git config --get "submodule.$name.url")

    # Delete the (still empty) submodule directory and clone anew
    rm -rf "$submodule_path"
    git clone "$submodule_url" "$submodule_path" < /dev/null
done

# Stage and commit; this records whichever commits the fresh clones checked out
git add .
git commit -m "Replace submodules with independent repos"

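That method uses Git's built-in functionality to clone the submodules as independent repositories, reducing the risk of corrupting the super repository's state. Note that each clone checks out the submodule's default branch rather than the commit recorded in the super repo, and that the loop does not descend into submodules of submodules.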

VonC
  • So there is no porcelain or plumbing way to do this, and you have to resort to bash. Shame. Now, I wonder what guarantees that the super repo will be left in a consistent state after this. In particular, what is the possible effect of doing the `cp -R` from `.git/modules` of the super repo into the `.git`-directory of the subrepo? How does the super repo then know which is correct contents? Should the `.git`-directory be `mv`'d instead of `cp`'d? Could it be `mv`'d? To me, these questions illustrate the risks of going bash instead of git itself. With the bash way, I may break the super repo. – jhu Sep 02 '23 at 08:21
  • @jhu I have edited the answer to address your comment, and to suggest an alternative approach going more with Git itself. – VonC Sep 02 '23 at 10:08