6

This is a narrower case of the earlier detach-subdirectory question. Although many questions have been asked about splitting and merging git repositories, I couldn't find one that covers splitting when submodules are present.

So in the following scenario:

.git/
.gitmodules
folder/
    data/
    content/
        other_data/
        submoduleA/
        submoduleB/

I would like to get two repositories with the following structure:

.git/
data/

and

.git/
.gitmodules
content/
    other_data/
    submoduleA/
    submoduleB/

The first case is not a problem and can be solved easily with the method described in detach-subdirectory.

The second is harder. Because .gitmodules records the full paths folder/content/submoduleA and folder/content/submoduleB, part of the history becomes inconsistent once filter-branch is used: .gitmodules still refers to a directory structure that no longer exists.
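
To make the mismatch concrete, here is a minimal sketch using a hypothetical .gitmodules for the layout above. The URLs are placeholders and are not taken from my repository. It only shows the text rewrite the recorded paths would need; it does not touch any history.

```shell
#!/bin/sh
# Hypothetical .gitmodules as it might look in the original repository.
# The example.com URLs are placeholders.
workdir=$(mktemp -d)
cat > "$workdir/gitmodules" <<'EOF'
[submodule "folder/content/submoduleA"]
	path = folder/content/submoduleA
	url = https://example.com/submoduleA.git
[submodule "folder/content/submoduleB"]
	path = folder/content/submoduleB
	url = https://example.com/submoduleB.git
EOF

# Drop the leading "folder/" so the entries match the layout after the split.
rewritten=$(sed 's|folder/content/|content/|g' "$workdir/gitmodules")
printf '%s\n' "$rewritten"
rm -rf "$workdir"
```

Without a rewrite like this in every affected commit, the path entries keep pointing at folder/content/..., which no longer exists in the split repository.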

So I would like to know if there is a way to do this without causing inconsistent history.

unode

3 Answers

6

I had the exact same problem as Unode and managed to resolve it with the following procedure:

git clone git@github.com:kdeldycke/kev-code.git
cd kev-code
# Move .gitmodules into the subdirectory so it survives the subdirectory filter.
git filter-branch --tree-filter "test -f ./.gitmodules && mv ./.gitmodules ./cool-cavemen/gitmodules || echo 'No .gitmodules file found'" -- --all
# Make cool-cavemen the new root (init is presumably a tag on the initial commit).
git filter-branch --force --prune-empty --subdirectory-filter cool-cavemen --tag-name-filter cat -- --all init..HEAD
# Restore the original file name at the new root.
git filter-branch --force --tree-filter "test -f ./gitmodules && mv ./gitmodules ./.gitmodules || echo 'No gitmodules file found'" -- --all
# Drop the old prefix from the paths recorded in .gitmodules.
git filter-branch --force --tree-filter "test -f ./.gitmodules && sed -i 's/cool-cavemen\///g' ./.gitmodules || echo 'No .gitmodules file found'" -- --all
git remote rm origin
# Remove the backup refs and expire the reflog so the old objects can be pruned.
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --aggressive --prune=now
git remote add origin git@github.com:kdeldycke/cool-cavemen.git
git push -u origin master --force --tags

As you can see, the trick is to temporarily rename the .gitmodules file so that it survives the subdirectory filter, then use sed to rewrite its content. The full details and context of this procedure are on my blog.
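
A quick sanity check after such a rewrite is to confirm that every path recorded in .gitmodules exists in the checked-out tree. The sketch below is only an illustration: the function name missing_submodule_paths and the demo directory are hypothetical, and it assumes each submodule shows up as a directory in the working tree, which is how git checks out an uninitialised submodule.

```shell
#!/bin/sh
# Print every "path = ..." entry of <root>/.gitmodules that has no matching
# directory under <root>. Prints nothing when all entries line up.
missing_submodule_paths() {
    root=$1
    sed -n 's/^[[:space:]]*path[[:space:]]*=[[:space:]]*//p' "$root/.gitmodules" |
    while IFS= read -r p; do
        [ -d "$root/$p" ] || printf '%s\n' "$p"
    done
}

# Demo on a throwaway directory tree with made-up names: one entry was
# rewritten correctly, the other still carries the old prefix.
demo=$(mktemp -d)
mkdir -p "$demo/submoduleA"
printf '[submodule "a"]\n\tpath = submoduleA\n[submodule "b"]\n\tpath = cool-cavemen/submoduleB\n' > "$demo/.gitmodules"
missing=$(missing_submodule_paths "$demo")
printf '%s\n' "$missing"
rm -rf "$demo"
```

Running the function on a few older commits as well as on the branch tip should reveal any revision where the sed step left a stale prefix behind.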

  • You may want to include `--tag-name-filter cat` option in all filter-branch commands to preserve tags after filtering. – kolyuchiy May 01 '12 at 15:52
  • Thanks for documenting this. Two things I had to tweak to get this to work. First, I think you assume an initial commit tagged as `init` for the `init..HEAD` range. Second, I had to add `-e` to the sed command, i.e.: `sed -i -e 's/cool-cavemen\///g' ./.gitmodules` – Von May 08 '13 at 01:49
2

I suspect (not tested) that a second git filter-branch pass could modify the .gitmodules content for each commit of the new repo.

In fact, a git submodule split command was under discussion in early 2009.

Proposed usage:

git submodule split [--url submodule_repo_url] submodule_dir \
    [alternate_dir...]

Replace submodule_dir with a newly-created submodule, keeping all the history of submodule_dir.
This command also rewrites each commit in the current repository's history to include the correct revision of submodule_dir and the appropriate .gitmodules entries.

However, I don't see it in the latest what's cooking.
The script in the proposed patch can give you an idea of the kind of tree rewriting necessary to update the .gitmodules file though.

VonC
  • Using a second git filter-branch command I was able to rewrite .gitmodules with a sed command, however the actual submodule folders are still left untouched (both with index-filter and tree-filter). Only subdirectory-filter was able to change them but then .gitmodules is removed. The git submodule split command seems to do exactly what I intend, but reading the thread I got the impression that it has a few problems so I don't feel comfortable using it. – unode Aug 05 '10 at 16:37
  • @Unode: I understand. I don't think this particular patch is in active development right now. – VonC Aug 05 '10 at 16:58
0

To elaborate on Kevin's answer: assuming no submodules ever existed outside cool/cavemen, the folder being detached, this can be done much faster and in one step with an index-filter. (If submodules did exist elsewhere, .gitmodules needs more elaborate editing to remove those extra sections.)

$ git filter-branch --subdirectory-filter cool/cavemen --index-filter $'
hash=$(git rev-parse --verify $GIT_COMMIT:.gitmodules 2>/dev/null) &&
 git update-index --add --cacheinfo 100644 $(git cat-file -p $hash |
 sed \'s/cool\\/cavemen\\///g\' | git hash-object -w --stdin) .gitmodules ||
true' --tag-name-filter cat --prune-empty -- --all
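
The core of that index-filter can be tried in isolation. The following is a minimal sketch, assuming git is installed; the temporary repository, the submodule name sub, and the example.com URL are all made up for the demonstration. It stores a .gitmodules blob, rewrites it through sed, and stages the result straight into the index without touching the working tree.

```shell
#!/bin/sh
# Throwaway repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"

# Store a hypothetical .gitmodules as a blob, as it would exist in an old commit.
old_blob=$(printf '[submodule "cool/cavemen/sub"]\n\tpath = cool/cavemen/sub\n\turl = https://example.com/sub.git\n' |
    git -C "$repo" hash-object -w --stdin)

# Rewrite the blob's content and store the result as a new blob.
new_blob=$(git -C "$repo" cat-file -p "$old_blob" |
    sed 's/cool\/cavemen\///g' |
    git -C "$repo" hash-object -w --stdin)

# Stage the new blob as .gitmodules directly in the index.
git -C "$repo" update-index --add --cacheinfo 100644 "$new_blob" .gitmodules

# Read the staged file back from the index.
staged=$(git -C "$repo" cat-file -p :.gitmodules)
printf '%s\n' "$staged"
rm -rf "$repo"
```

Because everything happens in the object database and the index, no checkout is needed per commit, which is why an index-filter tends to be much faster than a tree-filter.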

As an added benefit, if cool/cavemen did not exist in every revision or branch, only the revisions and branches that did contain cool/cavemen will be processed.

If this is the case, you may want to run the following to remove the references that were left unchanged:

$ git for-each-ref --format='%(refname)' | 
 grep -vF "$(git for-each-ref --format='%(refname)' refs/original |
 sed 's/refs\/original\///g')" | xargs -n 1 git update-ref -d
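
The selection logic of that pipeline can be tried on plain text before running it against a real repository. This is a minimal sketch with made-up ref names standing in for the output of git for-each-ref; it deletes nothing.

```shell
#!/bin/sh
# Made-up ref names standing in for `git for-each-ref --format='%(refname)'`.
all_refs='refs/heads/master
refs/heads/unrelated
refs/original/refs/heads/master
refs/tags/v1.0'

# Refs that filter-branch rewrote have a backup under refs/original/.
rewritten_refs=$(printf '%s\n' "$all_refs" | grep '^refs/original/' |
    sed 's/refs\/original\///g')

# Any ref whose name does not contain a rewritten ref name is a deletion candidate.
to_delete=$(printf '%s\n' "$all_refs" | grep -vF "$rewritten_refs")
printf '%s\n' "$to_delete"
```

Note that grep -F matches substrings, so the backups under refs/original/ are kept, and so would a branch such as refs/heads/master-old if refs/heads/master had been rewritten. It is worth reviewing the list before piping it into git update-ref -d.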
Levi Haskell