
I have several (~20) Git repositories that are non-overlapping in their files. I want to combine their master branches into a single (new) repository.

After some reading I came up with the following process.

  1. Create destination repository (git init) and change into it
  2. git remote add <name> <url>
  3. git fetch <name>
  4. git merge <name>/master --allow-unrelated-histories -m "Imported"
  5. git remote rm <name>
  6. Repeat steps 2-5 until all repositories are merged
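For reference, steps 2-5 can be wrapped in a small shell function so that step 6 becomes one call per repository. This is a minimal sketch, not a tested migration tool: `import_repo` is a hypothetical helper name, each source repository is assumed to have a `master` branch, and `-q`/`--no-edit` are added only so the merge stays quiet and never stops to open an editor.

```shell
#!/bin/sh
# Sketch of steps 2-5 as a reusable function.
# import_repo is a hypothetical helper name; it assumes each source
# repository has a branch called master.
set -eu

# import_repo NAME URL: fetch one source repository, merge its master
# branch into the current branch, then drop the temporary remote.
import_repo() {
    name=$1
    url=$2
    git remote add "$name" "$url"                          # step 2
    git fetch -q "$name"                                   # step 3
    git merge -q --no-edit --allow-unrelated-histories \
        -m "Imported $name" "$name/master"                 # step 4
    git remote rm "$name"                                  # step 5
}

# Example use (hypothetical names and URLs), covering steps 1 and 6:
#   git init combined && cd combined
#   import_repo repo-a https://example.com/repo-a.git
#   import_repo repo-b https://example.com/repo-b.git
```

If a merge conflicts, `git merge` exits with a non-zero status and `set -e` stops the script at that repository, which is where conflicts like the ones below would surface.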

The first repositories merged nicely and their history was intact, but then I ran into merge conflicts.

For example, for different files with the same name in different directories (and I have not renamed anything):

CONFLICT (rename/rename): Rename "Splittermond_CharGen_JFX/.project"->"BootloaderPlugin/.project" in branch "HEAD" rename "Splittermond_CharGen_JFX/.project"->"Splittermond_Zhoujiang/.project" in "splimo-common/master"

For example, for files that I moved at some point in the project's history (where the version left in the tree is at the current location):

CONFLICT (rename/delete): Splittermond_BuU/src/org/prelle/rpgframework/splittermond/buu/BestienUndUngeheuerPlugin.java deleted in HEAD and renamed to Splittermond_BuU/src/main/java/org/prelle/rpgframework/splittermond/buu/BestienUndUngeheuerPlugin.java in splimo-common/master. Version splimo-common/master of Splittermond_BuU/src/main/java/org/prelle/rpgframework/splittermond/buu/BestienUndUngeheuerPlugin.java left in tree.

I assume that Git's ability to track files across renames may be the problem, but I am fairly new to this and don't know how to work around it.

Any help or hint is appreciated.

[Update] It looks like I have at least two repositories that, although they no longer overlap, did overlap at some point. I have a Git repo A that complains that it has deleted files which are now in repo B, and a repo B that once contained files which are now in repo A. Is there a way to merge both while keeping the history of all files that have not been deleted?

taranion

3 Answers


It could be that rename detection produces false positives. In that case you could try to avoid it by adding -Xno-renames to the merge command:

```
git merge <name>/master --allow-unrelated-histories -m "Imported" -Xno-renames
```

It should do no harm, as you are merging unrelated histories and do not expect any renames.
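A sketch of step 4 with this option, assuming the remote has already been added and fetched. The helper name `merge_no_renames` is hypothetical; on any remaining conflict it lists the unmerged paths (`git diff --name-only --diff-filter=U`) and runs `git merge --abort`, so the branch is left as it was before the attempt.

```shell
#!/bin/sh
# Sketch: step 4 with rename detection switched off via -Xno-renames.
# merge_no_renames is a hypothetical helper; it assumes the branch to
# merge (e.g. name/master) has already been fetched.
set -eu

# merge_no_renames REF: merge REF without rename detection. If the merge
# still conflicts, list the unmerged paths, abort, and return 1 so the
# current branch is left exactly as it was.
merge_no_renames() {
    ref=$1
    if git merge -q --no-edit --allow-unrelated-histories -Xno-renames \
            -m "Imported $ref" "$ref"; then
        return 0
    fi
    echo "merge of $ref still conflicts in:" >&2
    git diff --name-only --diff-filter=U >&2
    git merge --abort
    return 1
}
```

Note that -Xno-renames only turns off rename detection: the same path added with different content on both sides (an add/add conflict) still stops the merge.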

max630
  • I will try and report back tomorrow. – taranion Oct 25 '17 at 21:47
  • That was helpful, but did not solve it completely. The (rename/rename) conflicts are gone using this option. The (rename/delete) conflicts remain. Still: thank you for the hint. – taranion Oct 26 '17 at 18:28

It's not at all clear to me what is happening, and I would need access to the repositories in question and your commands to reproduce this. However, there are two key points to keep in mind here when considering how to make this all work out:

  • In Git, history is commits (or, more clearly, "the commits are the history"). If you want to retain history, this means you want to retain existing commits.
  • Merging (the verb form, to merge) means, in Git, to find a common base commit between two (presumably long) chains of commits, so as to compare "what we did on our branch" to "what they did on their branch" since that common point.

A normal merge has two "sides". I call them L for left, local, or --ours, and R for right, remote, or --theiRs. It also has this merge base commit, the common point that we and they started from before we started doing our own things. Git combines "what we did" with "what they did" by running:

```
git diff --find-renames B L   # base to left/local: what we did
git diff --find-renames B R   # base to right: what they did
```
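To see these two diffs for a concrete pair of branches, a minimal sketch could compute B with `git merge-base` and run both commands with `--name-status`. `merge_sides` is a hypothetical helper; when there are several merge bases, `git merge-base` prints only one of them, whereas the merge itself combines them, so treat the output as an approximation.

```shell
#!/bin/sh
# Sketch: print the two rename-detecting diffs a merge compares.
# merge_sides is a hypothetical helper name. With several merge bases,
# `git merge-base` reports only one, so this is an approximation.
set -eu

# merge_sides THEIRS: diff from the merge base B to our side L (HEAD) and
# from B to their side R (THEIRS), with rename detection turned on.
merge_sides() {
    theirs=$1
    if ! base=$(git merge-base HEAD "$theirs"); then
        echo "HEAD and $theirs have no common commit" >&2
        return 1
    fi
    echo "== B -> L (what we did) =="
    git diff --find-renames --name-status "$base" HEAD
    echo "== B -> R (what they did) =="
    git diff --find-renames --name-status "$base" "$theirs"
}
```

A line starting with `R` on one side and `D` for the same old path on the other side is exactly the kind of pair that produces a rename/delete conflict.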

Merge conflicts occur if, for instance, both "we" and "they" modified the same lines of the same files, or we added a file path/to/new.txt and they added the same path/to/new.txt but it has different content, or we removed path/to/old.txt and they modified path/to/old.txt.

When you use --allow-unrelated-histories you are telling Git that if there is no common commit—which would often be true here—Git should pretend there is a common base consisting of a commit that has no files at all. That is, for B in the two git diff commands, Git should substitute in the empty tree, so that every file is new.
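The empty tree can also be named explicitly: `git hash-object -t tree /dev/null` prints its ID (computed rather than hard-coded, so it matches the repository's hash algorithm). A sketch using the hypothetical helper `changes_from_empty_tree`, which diffs that tree against a branch; every path comes out with status `A` (added), which is what "every file is new" means in practice.

```shell
#!/bin/sh
# Sketch: diff the empty tree (the stand-in for B when there is no merge
# base) against a branch. changes_from_empty_tree is a hypothetical helper.
set -eu

# changes_from_empty_tree REF: with no files in the base, every path in
# REF is reported with status A (added); no renames can be found.
changes_from_empty_tree() {
    empty=$(git hash-object -t tree /dev/null)
    git diff --find-renames --name-status "$empty" "$1"
}
```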

Now, you said:

... [some fairly large number of] Git repositories that are non-overlapping in their files

If this is the case, then there cannot be a path/to/new.txt in both L and R. If there is a new file on both sides, the files are overlapping.

Moreover, you cannot get a rename/rename or rename/delete conflict if the histories are truly unrelated, as there will be no merge base and Git will be using an empty tree for B every time. The fact that you are getting such a conflict indicates that the histories are related, so that Git finds a common merge base, and the git diff from that common merge base is finding rename operations on one side and either a different rename, or a delete, on the other side.
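One way to check this before merging anything is to fetch all the source repositories as remotes and ask `git merge-base` about each pair: it exits with status 0 only when it finds a common commit. A minimal sketch, assuming every remote has a `master` branch; `related_pairs` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: after adding and fetching every source repository as a remote,
# report which pairs of master branches share history. related_pairs is
# a hypothetical helper; it assumes each remote has a master branch.
set -eu

# related_pairs: print "X Y" for every pair of remote master branches for
# which `git merge-base` finds a common commit (exit status 0).
related_pairs() {
    # ref names contain no spaces, so word splitting is safe here
    set -- $(git for-each-ref --format='%(refname:short)' 'refs/remotes/*/master')
    while [ $# -gt 1 ]; do
        first=$1
        shift
        for other in "$@"; do
            if git merge-base "$first" "$other" >/dev/null; then
                echo "$first $other"
            fi
        done
    done
}
```

Any pair this prints is a pair that will be merged against a real base rather than the empty tree.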

Because this is the case, the obvious answer for how to merge all these unrelated histories more easily cannot be used: there are some files that are overlapping and this method won't work so easily. But if they were truly all non-overlapping, the way to merge them would be to fetch all the commits from all the repositories, then build one master "octopus merge" commit (here I use merge as an adjective or noun, not as a verb) whose tree is generated by using git read-tree -m on all the appropriate branch tips to build up a merged index, and whose commit is generated by running git write-tree and then git commit-tree (with appropriate flags).

I'm hesitant to supply the recipe for this, though, because if it were going to work, you really would need unrelated inputs, and the failures you are seeing tell me that you do not have unrelated inputs.
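Purely as an illustration of that octopus construction, and only for inputs that really are unrelated and non-overlapping, a sketch might look like this. It swaps `git read-tree -m` for a simpler route: each branch's `git ls-tree -r` output is fed into a temporary index via `git update-index --index-info`, followed by `git write-tree` and `git commit-tree`. `octopus_import` is a hypothetical helper, and it only refuses on identical paths; file/directory clashes are not checked.

```shell
#!/bin/sh
# Sketch of an octopus merge for genuinely unrelated, non-overlapping
# branches. Uses ls-tree | update-index --index-info into a temporary
# index instead of read-tree -m. octopus_import is hypothetical.
set -eu

# octopus_import REF...: print the ID of a new commit whose tree is the
# union of all REF trees and whose parents are all REFs. Returns 1 if any
# path appears in more than one REF.
octopus_import() {
    dups=$(for ref in "$@"; do git ls-tree -r --name-only "$ref"; done | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "overlapping paths, refusing to merge:" >&2
        echo "$dups" >&2
        return 1
    fi
    tmpdir=$(mktemp -d)
    parents=""
    for ref in "$@"; do
        # add every entry of this tree to the temporary index
        git ls-tree -r "$ref" | GIT_INDEX_FILE="$tmpdir/index" git update-index --index-info
        parents="$parents -p $(git rev-parse "$ref^{commit}")"
    done
    tree=$(GIT_INDEX_FILE="$tmpdir/index" git write-tree)
    rm -rf "$tmpdir"
    # $parents is deliberately unquoted so it splits into "-p ID" pairs
    git commit-tree $parents -m "Octopus import of $*" "$tree"
}
```

The printed commit is not attached to any branch; something like `git branch combined <commit>` would be needed to keep it.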

torek
  • The repos originated from a single Subversion repository, which I converted into several Git repositories by first building a single Git repository and then moving subdirectories into different repositories. So for a very brief time they shared a repository, but they still should not be overlapping. Thanks for your explanation. I will explore this a little further and report back. – taranion Oct 25 '17 at 21:47
  • I think we are close to the problem here. It looks like I have at least two repositories that, although they no longer overlap, once did. I have a Git repo A that complains that it has deleted files which are now in repo B, and a repo B that once contained files which are now in repo A. Is there a way to merge both while keeping the history of all files that have not been deleted? – taranion Oct 26 '17 at 19:59
  • I'd really have to see the repositories (or good-enough facsimiles) to understand the issue and design the answer. Note that Git finds merge bases by graph traversal, so that what matters are commit hashes. – torek Oct 26 '17 at 20:47
  • It might help to clean all files that have since been deleted out of the history of the repositories I import. Can this be done? – taranion Oct 26 '17 at 21:03
  • You can split out by directories (see `git subtree`), or remove particular files (with a tree or index filter and `filter-branch`) along with the `--prune-empty` flag to make Git omit commits that have no change from their parent. It's ... cumbersome, to say the least, and may not be the right way to do this, although much depends on what you plan to do in the future too. – torek Oct 26 '17 at 21:18
  • After some digging I came up with the following solution which I apply after each merge of a repository: `git ls-files > /tmp/keep-these.txt` `git filter-branch --force --index-filter "git rm --ignore-unmatch --cached -qr . ; cat /tmp/keep-these.txt | xargs git reset -q \$GIT_COMMIT --" --prune-empty --tag-name-filter cat -- --all` – taranion Oct 26 '17 at 21:25
  • That's definitely workable as an index filter. (If you have spaces in file names you might want `-z`, and `xargs -0`, as needed.) Note that if there are renames in the past, though, this will drop the earlier-named versions of files (and then prune any "emptied" commits). – torek Oct 26 '17 at 21:59
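torek's suggestion of removing particular files with an index filter plus `--prune-empty` might look like the following sketch. `drop_paths` is a hypothetical helper; like the commands above it rewrites every ref (`-- --all`), so it is safest on a scratch clone, and it assumes paths contain no single quotes. `FILTER_BRANCH_SQUELCH_WARNING=1` only suppresses the warning (and pause) that recent Git versions print before running `filter-branch`.

```shell
#!/bin/sh
# Sketch: remove particular paths from all history with an index filter
# and --prune-empty. drop_paths is a hypothetical helper; it rewrites
# every ref, and paths must not contain single quotes.
set -eu

# drop_paths PATH...: remove PATH (file or directory) from every commit
# and drop commits that become empty as a result.
drop_paths() {
    quoted=""
    for p in "$@"; do
        quoted="$quoted '$p'"   # quote each path for the filter's shell
    done
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force --index-filter \
        "git rm -r -q --cached --ignore-unmatch --$quoted" \
        --prune-empty --tag-name-filter cat -- --all
}
```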

Thanks to torek's help I came up with the following solution:

  1. Initialize new repository with git init
  2. git remote add <name> <url>
  3. git fetch <name>
  4. git merge <name>/master --allow-unrelated-histories -m "Reimported"
  5. git remote rm <name>
  6. git ls-files > /tmp/keep-these.txt
  7. git filter-branch --force --index-filter "git rm --ignore-unmatch --cached -qr . ; cat /tmp/keep-these.txt | xargs git reset -q \$GIT_COMMIT --" --prune-empty --tag-name-filter cat -- --all
  8. Repeat steps 2-7 for each repository

The added steps 6-7 were taken from new-repo-with-copied-history-of-only-current-tracked-files
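Steps 2-7 can be combined into one shell function. This is a sketch under the same assumptions as the steps themselves: `import_and_prune` is a hypothetical name, each source repository is assumed to have a `master` branch, and it uses the NUL-delimited variant (`git ls-files -z`, `xargs -0`) from torek's comment so that file names with spaces survive. It rewrites all refs, so run it in a fresh, throwaway combined repository.

```shell
#!/bin/sh
# Sketch of steps 2-7 as one function, using git ls-files -z / xargs -0.
# import_and_prune is a hypothetical helper; it assumes each source
# repository has a master branch. It rewrites every ref.
set -eu

# import_and_prune NAME URL: merge NAME's master, then rewrite the whole
# history so that it only contains files that are tracked right now.
import_and_prune() {
    name=$1
    url=$2
    git remote add "$name" "$url"                                  # step 2
    git fetch -q "$name"                                           # step 3
    git merge -q --no-edit --allow-unrelated-histories \
        -m "Reimported $name" "$name/master"                       # step 4
    git remote rm "$name"                                          # step 5
    keep=$(mktemp)
    git ls-files -z > "$keep"                                      # step 6
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
        --index-filter "git rm --ignore-unmatch --cached -qr . ; xargs -0 git reset -q \$GIT_COMMIT -- < '$keep'" \
        --prune-empty --tag-name-filter cat -- --all               # step 7
    rm -f "$keep"
}

# Steps 1 and 8, with hypothetical names and URLs:
#   git init combined && cd combined
#   import_and_prune repo-a https://example.com/repo-a.git
#   import_and_prune repo-b https://example.com/repo-b.git
```

As torek points out above, files that were renamed lose the history recorded under their earlier names, because only the current paths are kept.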

I hope that helps.

taranion