Merge GIT Object history when files are already split

Question

We used to have one big Git project where all data was stored. Some time ago we decided to split it up, so now we have 10 projects instead of 1. Unfortunately, we failed to migrate the objects together with their history. This means the history for objects in the new projects started from scratch, while the old history is still in the former "big" project.

I just tried to back up and restore the history and remaining files with git bundle, which worked pretty well. However, this stores the history as a whole. Is it somehow possible to back up, restore, or merge the history only for the relevant objects / items in my project?

Have you tried rebasing your branches on top of the old history? — Jens, Sep 19 '17 at 15:20
https://stackoverflow.com/questions/359424/detach-move-subdirectory-into-separate-git-repository — Josh Lee, Sep 19 '17 at 15:20

score 0 · Answer 1 · answered Sep 19 '17 at 19:24

What you want to do is possible, but it may be somewhat involved. In the following, I will refer to the historical repo as Repo0 and to the 10 new repositories as Repo1, Repo2, and so on.

Creating new histories

Within git, a commit is more or less atomic; so you can't really say "create a bundle, but only include these paths" or similar. Instead you have to create a "new history" for each partial project. You can do this using filter-branch.

Of course it's simplest if there's just one branch to worry about, but let's suppose Repo0 might have a set of branches relevant to any given project. Let's say master and dev contain relevant history for Repo1.

So in Repo0 you start by creating new branches

git branch Repo1/master master
git branch Repo1/dev dev

Now you filter the new branches, converting the "big history" into a history relevant to Repo1 alone. If Repo1 corresponds to a subdirectory in Repo0, this is easy. So in the best case Repo0 looks like

Project1-Files/
  some.file
  test/
    test.file
Project2-files/
  another.file
  ...

and Repo1 would end up with

some.file
test/
  test.file

If things are that simple, then you just use a subdirectory-filter.

git filter-branch --subdirectory-filter Project1-Files --prune-empty -- Repo1/master Repo1/dev
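
As a rough illustration, the following sketch builds a throwaway stand-in for Repo0 with the layout shown above and extracts Project1-Files into Repo1/master. The temporary directory, the demo identity and the commit messages are assumptions made only for this example; FILTER_BRANCH_SQUELCH_WARNING merely silences the warning that newer Git versions print before filter-branch starts.

```shell
#!/bin/sh
# Sketch only: a disposable repository, not your real Repo0.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo0=$(mktemp -d)
cd "$repo0"
git init -q
git symbolic-ref HEAD refs/heads/master   # do not depend on the local default branch name

mkdir -p Project1-Files/test Project2-files
echo one > Project1-Files/some.file
echo test > Project1-Files/test/test.file
echo two > Project2-files/another.file
git add .
git commit -q -m "Initial import of both projects"

echo more >> Project2-files/another.file
git commit -q -am "Change that only touches Project2"

# Filter a copy of the branch so that master keeps the full history.
git branch Repo1/master master
git filter-branch --subdirectory-filter Project1-Files --prune-empty -- Repo1/master >/dev/null

git ls-tree -r --name-only Repo1/master   # files now sit at the repository root
git log --oneline Repo1/master            # the Project2-only commit is no longer listed
```

In this sketch master still holds both commits afterwards, so it can serve as the starting point for the next project's branch.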

If more rearranging is needed, you might have to use a tree-filter.

git filter-branch --tree-filter my-filter.sh --prune-empty -- Repo1/master Repo1/dev

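where my-filter.sh is a script that transforms a worktree from Repo0 into the correct structure for Repo1. Because the filter runs inside a temporary checkout rather than in your working directory, the script has to be on your PATH or be given by its absolute path. This is much more resource-intensive than the subdirectory-filter approach, since every commit is checked out before the script runs.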
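
What such a script could look like depends entirely on your layout. The sketch below assumes, purely for illustration, that Repo1 should keep the contents of Project1-Files under a new directory called src; the src name, the temporary paths and the demo identity are all made up for this example.

```shell
#!/bin/sh
# Sketch only: a disposable repository and a hypothetical my-filter.sh.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
repo0="$work/repo0"
filter="$work/my-filter.sh"

# The filter runs once per commit, inside a temporary checkout of that commit.
cat > "$filter" <<'EOF'
#!/bin/sh
rm -rf Project2-files
if [ -d Project1-Files ]; then
    mv Project1-Files src
fi
EOF

git init -q "$repo0"
cd "$repo0"
git symbolic-ref HEAD refs/heads/master
mkdir -p Project1-Files/test Project2-files
echo one > Project1-Files/some.file
echo test > Project1-Files/test/test.file
echo two > Project2-files/another.file
git add .
git commit -q -m "Initial import of both projects"
echo more >> Project2-files/another.file
git commit -q -am "Change that only touches Project2"

git branch Repo1/master master
# The absolute path matters: the filter is not run from this directory.
git filter-branch --tree-filter "sh '$filter'" --prune-empty -- Repo1/master >/dev/null

git ls-tree -r --name-only Repo1/master
```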

A compromise is to use an index-filter, and if you really want to get complicated you might be able to make it do exactly what you want faster than a tree-filter can. The syntax is the same as for tree-filter, but the filter script has to operate directly on the index instead of the work tree. So the "simple" compromise would be to remove all irrelevant files but leave the relevant files where they are in the directory structure. Your history might then show some spurious file moves at the point where the histories are spliced together.
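
A minimal sketch of that "simple" compromise, again in a disposable repository with an assumed demo identity: the index-filter drops Project2-files from every commit with git rm --cached and leaves Project1-Files where it is.

```shell
#!/bin/sh
# Sketch only: a disposable repository, not your real Repo0.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo0=$(mktemp -d)
cd "$repo0"
git init -q
git symbolic-ref HEAD refs/heads/master
mkdir -p Project1-Files/test Project2-files
echo one > Project1-Files/some.file
echo test > Project1-Files/test/test.file
echo two > Project2-files/another.file
git add .
git commit -q -m "Initial import of both projects"
echo more >> Project2-files/another.file
git commit -q -am "Change that only touches Project2"

git branch Repo1/master master
# Only the index is edited, so no files are checked out for each commit.
git filter-branch --index-filter \
    'git rm -r -q --cached --ignore-unmatch Project2-files' \
    --prune-empty -- Repo1/master >/dev/null

git ls-tree -r --name-only Repo1/master   # paths still start with Project1-Files/
```

With ten projects you would list every irrelevant path in the git rm call, which is where this approach starts to get tedious.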

Anyway, once this runs successfully, Repo1/master and Repo1/dev will contain a new history suitable for Repo1. (master and dev will still be the "big project" history, and you'll go back to that as the starting point for building each other repo's new history.)

Next you transfer the new history to Repo1. You can do this with bundles (containing Repo1/master and Repo1/dev), or you can directly add Repo0 as a remote in Repo1 and just fetch the refs that way.
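
The bundle route might look roughly like the sketch below. Both repositories are disposable stand-ins, and the refs/remotes/repo0/ namespace used for the fetched history is an arbitrary choice for this example, not something Git requires.

```shell
#!/bin/sh
# Sketch only: repo0 stands in for the filtered Repo0, repo1 for the new project.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
repo0="$work/repo0"
repo1="$work/repo1"
bundle="$work/repo1-history.bundle"

# Stand-in for Repo0 after filtering: Repo1/master already holds the new history.
git init -q "$repo0"
cd "$repo0"
git symbolic-ref HEAD refs/heads/master
echo v1 > some.file
git add .
git commit -q -m "Old history for Project1"
git branch Repo1/master master

# Pack only the filtered branch into a bundle file.
git bundle create "$bundle" Repo1/master

# Stand-in for the new project, whose history started from scratch.
git init -q "$repo1"
cd "$repo1"
git symbolic-ref HEAD refs/heads/master
echo v2 > some.file
git add .
git commit -q -m "Fresh start after the split"

# Fetch the old history next to the new one without touching local branches.
git fetch -q "$bundle" 'refs/heads/Repo1/*:refs/remotes/repo0/*'
git log --oneline refs/remotes/repo0/master
```

Replacing "$bundle" with the path or URL of Repo0 in the fetch command should give the same result without the intermediate file.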

Finally you would graft the "recent history" that's already in Repo1 onto the "old history" that you migrated. There are two general approaches to this.

One way is to physically rewrite the history, which would again use filter-branch. There are a few variations on this, but basically look at how filter-branch's --parent-filter works. This creates the most seamless history going forward, but it changes the identity of each commit in Repo1; so my advice is to do a "hard cut-over" where everyone pushes their Repo1 changes and throws away their clones, you perform the conversion on the Repo1 origin, and then everyone re-clones.
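
One possible variation, sketched in a disposable repository: the parent filter gives the root commit of the new history the tip of the old history as its parent and leaves every other commit's parents alone. The branch name old-history and the commit contents are assumptions for this example.

```shell
#!/bin/sh
# Sketch only: old-history stands in for the migrated history, master for the fresh start.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

repo1=$(mktemp -d)
cd "$repo1"
git init -q
git symbolic-ref HEAD refs/heads/old-history
echo v1 > some.file
git add .
git commit -q -m "Old: first version"
echo v2 > some.file
git commit -q -am "Old: second version"

# Unrelated "recent history", as created when the project was split.
git checkout -q --orphan master
echo v3 > some.file
git add some.file
git commit -q -m "New: fresh start after the split"
echo v4 > some.file
git commit -q -am "New: later work"

root=$(git rev-list --max-parents=0 master)   # root commit of the recent history
old_tip=$(git rev-parse old-history)

# For the root commit, print the new parent; for all others, pass parents through.
git filter-branch --parent-filter \
    "test \$GIT_COMMIT = $root && echo '-p $old_tip' || cat" -- master >/dev/null

git log --oneline master   # four commits, all new ones with new IDs
```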

If you can't coordinate hard cut-overs, or you can't afford to lose the old commit IDs for some other reason, then you could instead consider using git replace to paper over the break in history. Please see the git replace docs as there are some quirks and limitations.
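
A sketch of the git replace variant, using the same kind of disposable repository: git replace --graft records a replacement for the root commit of the recent history that has the old tip as its parent, while the branch itself keeps its commit IDs. The branch name old-history is again an assumption for this example.

```shell
#!/bin/sh
# Sketch only: old-history stands in for the migrated history, master for the fresh start.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo1=$(mktemp -d)
cd "$repo1"
git init -q
git symbolic-ref HEAD refs/heads/old-history
echo v1 > some.file
git add .
git commit -q -m "Old: first version"
echo v2 > some.file
git commit -q -am "Old: second version"

git checkout -q --orphan master
echo v3 > some.file
git add some.file
git commit -q -m "New: fresh start after the split"
echo v4 > some.file
git commit -q -am "New: later work"

tip_before=$(git rev-parse master)
root=$(git rev-list --max-parents=0 master)

# Record a replacement for the root commit whose parent is the old tip.
git replace --graft "$root" old-history

git rev-list --count master                        # walks through the graft
git --no-replace-objects rev-list --count master   # the real, unchanged history
```

One quirk worth knowing: the replacement is stored under refs/replace/, which is not transferred by a default clone, fetch or push, so other clones only see the joined history if they fetch those refs explicitly.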

Or of course you can just leave the "old history" behind a second set of refs in each new repo, without joining it to the recent history at all.
