2

I have the great job of cleaning up a fu**ed up git repository. In the past, someone merged the whole Linux kernel source into the repository (with all of its 650k commits). I know the commit ID of the merge and of its parent. Of course, more changes have landed on the master branch since the merge, so at the moment the history looks similar to this:

-x-x-x-x-LinuxMerge-x-x-x-x-x-x-x-x-x-today

What I want is to revert the LinuxMerge commit, including its history. Is this possible, and if so, how?

alabamajack

3 Answers

2

I think this question causes some confusion because you phrase it as wanting to "revert" the mistake, and "revert" means something specific in git. What you mean clearly isn't what git means by that word: in git's sense, a revert adds a new commit that undoes earlier changes, so a phrase like "revert history" isn't a thing.

Because you want to undo the change entirely, a history rewrite is the first step. AnoE's answer shows one way to do this, assuming that there is just one ref from which the bad merge is reachable, and that there are no merge commits "between" that ref and the bad merge.

In the event there are multiple refs, you'd need to do something more. For example if you had

x -- x -- x --- ML -- x -- A -- x <--(master)
               /            \
(linux history)              o <--(branch_one)

completing the rebase would give you

            x' -- A' -- x' <--(master)
           /
x -- x -- x --- ML -- x -- A -- o <--(branch_one)
               /
(linux history)

You'd then need to transplant the o commit, with something like

git rebase --onto A' A branch_one

(replacing A and A' with either the commit ID or some other expression that names the appropriate commit).
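The two-step rewrite above can be tried on a throwaway repository first. This is a minimal sketch, not a recipe for the real repo: the file names (own.txt, side.txt), the linux-history branch, and the commit messages are made up for the demo, and it assumes there are no merges between ML and the branch tips.

```shell
# Throwaway demo: master and branch_one both sit on top of a bad merge (ML).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
demo=$(mktemp -d) && cd "$demo"
git init -q . && git symbolic-ref HEAD refs/heads/master

echo 1 > own.txt && git add own.txt && git commit -q -m "x1"

# Stand-in for the kernel: an unrelated history on an orphan branch.
git checkout -q --orphan linux-history && git rm -q -rf .
mkdir linux && echo k > linux/kernel.c && git add linux && git commit -q -m "linux"

git checkout -q master
git merge -q --no-edit --allow-unrelated-histories linux-history   # ML
ML=$(git rev-parse HEAD)

echo 2 >> own.txt && git commit -q -am "x2"
echo 3 >> own.txt && git commit -q -am "A"
A=$(git rev-parse HEAD)
git branch branch_one                      # branch_one forks at A
echo 4 >> own.txt && git commit -q -am "x3"
git checkout -q branch_one
echo o > side.txt && git add side.txt && git commit -q -m "o"

# Step 1: rewrite master so that it no longer contains ML.
git rebase -q --onto "$ML^1" "$ML" master
A_new=$(git rev-parse master~1)            # A' is the parent of the new master tip

# Step 2: transplant o from the old A onto A'.
git rebase -q --onto "$A_new" "$A" branch_one

git log --oneline --graph master branch_one
```

In the real repository A' has to be looked up by hand (for example with git log on the rewritten master); master~1 only works here because the demo has exactly one commit after A.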

If there are merges that need to be rewritten, then you have a bigger issue. The rebase command will try to write a linear history by default. You can tell it to keep the merge topology with the --preserve-merges option (deprecated in later Git releases in favor of --rebase-merges), but it may not work properly. If a merge commit had conflicts, you'll have to re-resolve it. Worse, if a merge commit doesn't have conflicts, but was not originally completed using the default merge result, then rebase will not recreate the merge (or any children of it) correctly.

So the only safe way to rebase, then, is in segments, manually reproducing merges as you encounter them.

Another option might be to use git filter-branch instead of rebase, but this is tricky, too. It's only workable if you can script the removal of everything the merge introduced. For example, if the linux history lives in different paths than your own work, so that you could clean up any given commit by removing (rm) certain paths, then you could use filter-branch.

(Since this is an option that may or may not be viable for you, for now I won't spell out the detailed steps. The filter-branch documentation can fill in the blanks. Basically you'd use a parent-filter to bypass the merge commit (by re-parenting the following commit onto the first parent commit), plus an index-filter or tree-filter to remove the linux files from the subsequent commits.)
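As a rough sketch of that outline, the following throwaway demo assumes the merged content lives entirely under a linux/ directory (a made-up layout; own.txt and the commit messages are hypothetical too). Only the commits after the merge are rewritten: the parent filter re-parents the first of them onto the merge's first parent, and the index filter strips linux/ from each one.

```shell
# Throwaway demo of filter-branch bypassing a bad merge.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export FILTER_BRANCH_SQUELCH_WARNING=1   # only silences filter-branch's startup warning
demo=$(mktemp -d) && cd "$demo"
git init -q . && git symbolic-ref HEAD refs/heads/master

echo 1 > own.txt && git add own.txt && git commit -q -m "x1"

# Stand-in for the kernel: unrelated history that only touches linux/.
git checkout -q --orphan linux-history && git rm -q -rf .
mkdir linux && echo k > linux/kernel.c && git add linux && git commit -q -m "linux"

git checkout -q master
git merge -q --no-edit --allow-unrelated-histories linux-history
merge=$(git rev-parse HEAD)
prev=$(git rev-parse "$merge^1")          # first parent: our own history

echo 2 >> own.txt && git commit -q -am "x2"
echo 3 >> own.txt && git commit -q -am "x3"

# Rewrite only the commits after the merge:
#  - parent filter: wherever the merge appears as a parent, use $prev instead
#  - index filter: drop the linux/ tree from every rewritten commit
git filter-branch \
  --parent-filter "sed 's/$merge/$prev/'" \
  --index-filter 'git rm -r -q --cached --ignore-unmatch linux' \
  -- "$merge..master"

git log --oneline --graph master
```

filter-branch keeps a backup of the old branch tip under refs/original/, so the old history (and the linux-history branch in this demo) is still reachable until those refs are deleted as well.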

One way or another, once you have the history cleaned up you would still have all that history in your repo's database. At a minimum you need to make sure nothing references that history. Then it would eventually get cleared out by gc (or you could force that to happen sooner).

Mostly that means you have to find any refs that can reach the linux history. Since the rewrite moved "your" refs, this would likely comprise any refs (branches or tags) pulled in with the linux history itself. So you'd just want to delete those.
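One possible way to find such refs is to ask git which refs contain a commit that is known to belong to the linux history. The sketch below uses the root commit of the merged side for that, and assumes that side has a single root; the branch and tag names (linux-history, v1) are invented for the demo. Review the list before deleting anything in a real repository.

```shell
# Throwaway demo: list and delete refs that still reach the merged-in history.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
demo=$(mktemp -d) && cd "$demo"
git init -q . && git symbolic-ref HEAD refs/heads/master

echo 1 > own.txt && git add own.txt && git commit -q -m "x1"

# Stand-in for the kernel, including a tag that came along with it.
git checkout -q --orphan linux-history && git rm -q -rf .
mkdir linux && echo k > linux/kernel.c && git add linux && git commit -q -m "linux 1"
git tag v1
echo k2 >> linux/kernel.c && git commit -q -am "linux 2"

git checkout -q master
git merge -q --no-edit --allow-unrelated-histories linux-history
merge=$(git rev-parse HEAD)
echo 2 >> own.txt && git commit -q -am "x2"

# History rewrite: master no longer contains the merge.
git rebase -q --onto "$merge^1" "$merge" master

# Root commit of the merged side (assumed to be the only one).
linux_root=$(git rev-list --max-parents=0 "$merge^2")

# Every branch or tag that can still reach the linux history.
stale=$(git for-each-ref --contains "$linux_root" --format='%(refname)')
echo "$stale"

# Delete them in one go.
echo "$stale" | sed 's/^/delete /' | git update-ref --stdin
```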

There also will be reflogs that can (indirectly) reach the linux history, and gc can't remove history that's reachable in this way. Honestly at this point the easiest thing to do is probably to re-clone the repo (as a new clone should only get the current refs and their history) and replace origin with the result.

If you want to repair an existing repo instead of re-cloning for whatever reason, the next step would be to wipe out reflogs (I usually just rm -r .git/logs) and then run an aggressive gc (see the gc docs)
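A minimal sketch of that cleanup on a throwaway repository follows. It swaps in git reflog expire for the rm -r .git/logs shortcut mentioned above; the scratch branch and file names are made up for the demo.

```shell
# Throwaway demo: an unreferenced commit survives until reflogs are expired.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
demo=$(mktemp -d) && cd "$demo"
git init -q . && git symbolic-ref HEAD refs/heads/master

echo keep > own.txt && git add own.txt && git commit -q -m "keep"

# Create a commit on a scratch branch, then delete the branch.
git checkout -q -b scratch
echo junk > junk.txt && git add junk.txt && git commit -q -m "doomed"
doomed=$(git rev-parse HEAD)
git checkout -q master
git branch -q -D scratch

# No ref points at it any more, but HEAD's reflog still does.
git cat-file -e "$doomed" && echo "still present"

# Expire every reflog entry, then collect and prune immediately.
git reflog expire --expire=now --all
git gc --quiet --prune=now --aggressive

git cat-file -e "$doomed" 2>/dev/null || echo "gone"
```

On a repository holding the full kernel history, the aggressive gc can take considerably longer than in this toy example.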

Mark Adelsberger
1

You can undo this by rebasing.

If you start out from this...

-x-x-x-x-LinuxMerge-x-x-x-x-x-x-x-x-x-today

... then what you actually have probably looks like this instead:

-x-x-x-x-x-x-x-x-x-x-x-x-x-today
        /
-linus-/

Let's label some more commits:

-x-x-x-prev-merg-post-x-x-x-x-x-x-x-today
             /
     -linus-/

So, you want to glue prev and post together, and throw away merg. The command for this is:

git rebase --onto prev merg today

(Note that in the command, we mention merg, not post; this is the typical "+-1" issue with declaring commit ranges in git).

This rebase command will add a new line of commits and change the today branch to point at the new tail:

          post'-y-y-y-y-y-y-y-today'
         /
-x-x-x-prev-merg-post-x-x-x-x-x-x-x-today
             /
     -linus-/

And if you just ignore the older stuff, this flattens out to:

-x-x-x-prev-post'-y-y-y-y-y-y-y-today'

The rebase will also change the today branch to point at the commit labeled today' in this ASCII picture.

Note that post' and the y commits (as well as today') will all have different hashes than the originals; they are not the "same" commits.

If no other tags or branches point to the history leading up to linus, then those commits and related objects will be purged eventually by the git garbage collection (which you could force with git gc to make sure).
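To see the effect without touching the real repository, the rebase can be reproduced on a throwaway repo. This is only a sketch: the branch is called master here rather than today, and the file names and the linus branch are invented for the demo.

```shell
# Throwaway demo: build a tiny repo with an unwanted merge, then drop it.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
demo=$(mktemp -d) && cd "$demo"
git init -q . && git symbolic-ref HEAD refs/heads/master

echo a > own.txt && git add own.txt && git commit -q -m "prev"

# Stand-in for the kernel: an unrelated history on an orphan branch.
git checkout -q --orphan linus && git rm -q -rf .
mkdir linux && echo k > linux/kernel.c && git add linux && git commit -q -m "linus"

git checkout -q master
git merge -q --no-edit --allow-unrelated-histories linus   # the "merg" commit
merg=$(git rev-parse HEAD)
prev=$(git rev-parse "$merg^1")

echo b >> own.txt && git commit -q -am "post"
echo c >> own.txt && git commit -q -am "today"

# Replay everything after merg onto prev; master now skips the merge.
git rebase -q --onto "$prev" "$merg" master

# Drop the demo's only other ref to the merged history.
git branch -q -D linus

git log --oneline --graph master
```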

AnoE
  • This still doesn't clean up the database, though - which is a bit ironic since you called Enrico's answer out for the same thing. At best this procedure *might* make the history eventually become eligible for gc, assuming no refs were brought in with the linux history, and that origin runs in a context where `gc` happens from time to time – Mark Adelsberger Dec 14 '17 at 13:48
  • Thanks for mentioning that, @MarkAdelsberger, I'll update the answer. – AnoE Dec 14 '17 at 13:51
  • Thanks for your answer. I see, it's hard to solve such an issue... But thanks for this information! – alabamajack Dec 15 '17 at 09:41
  • @alabamajack: no, it's not that hard, really - the explanation just was a bit long. :) In practice, it's just a single "rebase" command (provided everything goes well...). – AnoE Dec 15 '17 at 09:47
0

You have a couple of options here.

Option 1: Rewriting history

If you are able to rewrite the history of the master branch without consequences, the quickest way to achieve what you want is to simply remove the merge commit altogether with git rebase --onto:

git checkout master
git rebase --onto <SHA-1-of-the-linux-merge>^ <SHA-1-of-the-linux-merge>

This means: "rebase master on top of the first parent of the merge commit starting from the merge commit itself". This will effectively remove the merge commit and apply all subsequent commits on top of its first parent. You can read more about how git rebase --onto works here.

Option 2: Reverting the merge

If you want to avoid rewriting history, you can always revert the "LinuxMerge" commit by using git-revert:

git revert --mainline 1 --no-commit <SHA-1-of-the-linux-merge>

The --mainline option (short form: -m) tells Git which parent of the merge commit you want to revert to. In this case, you want to revert to the first parent, that is, the commit into which the Linux kernel was merged.

From the documentation:

Usually you cannot revert a merge because you do not know which side of the merge should be considered the mainline. This option specifies the parent number (starting from 1) of the mainline and allows revert to reverse the change relative to the specified parent.

Note that reverting a merge this way will cause later merges of the same branch to exclude the commits that were originally brought in by the reverted merge:

Reverting a merge commit declares that you will never want the tree changes brought in by the merge. As a result, later merges will only bring in tree changes introduced by commits that are not ancestors of the previously reverted merge. This may or may not be what you want.

However, in this case it sounds like you won't be merging the Linux kernel again anytime soon.

As for the --no-commit option, it lets you do a dry run to see whether you get any conflicts in your working directory without actually creating the commit.
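The following throwaway demo sketches what the revert does and does not change. The repository layout and names (own.txt, linux/, linux-history) are made up, and it commits the revert directly with --no-edit instead of using --no-commit.

```shell
# Throwaway demo: reverting a merge removes its files, not its history.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
demo=$(mktemp -d) && cd "$demo"
git init -q . && git symbolic-ref HEAD refs/heads/master

echo 1 > own.txt && git add own.txt && git commit -q -m "x1"

# Stand-in for the kernel: an unrelated history on an orphan branch.
git checkout -q --orphan linux-history && git rm -q -rf .
mkdir linux && echo k > linux/kernel.c && git add linux && git commit -q -m "linux"

git checkout -q master
git merge -q --no-edit --allow-unrelated-histories linux-history
merge=$(git rev-parse HEAD)
echo 2 >> own.txt && git commit -q -am "x2"

# Undo the merge's changes relative to its first parent.
git revert --no-edit -m 1 "$merge"

git ls-tree -r --name-only HEAD   # the linux/ files are gone from the tree
git rev-list --count HEAD         # the merged commits are still counted
```

The tree is clean afterwards, but every commit brought in by the merge is still an ancestor of HEAD, so the repository does not get any smaller.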

Enrico Campidoglio
  • I know this, but I thought I was doing it wrong. I did it and got `error [...] hint: after resolving mark corrected paths [...]`. `git status` shows: `You are currently reverting commit 1234567 . (fix conflicts and run "git revert --continue") [...] Unmerged paths: (use "git reset HEAD ..." to unstage) (use "git add ..." to mark resolution) added by us: file.sh`. After `git add file.sh` I get `git status: (all conflicts fixed: run "git revert --continue")`, but doing this always says `You are currently reverting commit 1234567. nothing to commit, working directory clean` – alabamajack Dec 14 '17 at 12:34
  • Did you include the `-m 1` option? The error message you got tells you that some changes made _before_ the merge conflict with changes made _afterwards_. At this point, you need to [resolve the conflicts](https://help.github.com/articles/resolving-a-merge-conflict-using-the-command-line/) in order to proceed with the revert. – Enrico Campidoglio Dec 14 '17 at 12:40
  • If you run `git revert` with the `--no-commit` option, you can simply commit the changes with `git commit` after you resolved the conflicts. – Enrico Campidoglio Dec 14 '17 at 12:43
  • 1
    This does not solve the problem. Yes, `git revert` will undo the change, but simply by adding a new commit that "inverts" all file changes. They still will have 650K commits sitting in the history, with hundreds of megabytes of history. – AnoE Dec 14 '17 at 12:47
  • Reverting a commit will cause so many issues. I personally would avoid it. – evolutionxbox Dec 14 '17 at 12:47
  • Yes I added `-m 1`. The curious thing is, that `git status` says, that all conflicts are fixed. If I run `git revert --continue` afterwards, I get `You are currently reverting commit 1234567. nothing to commit, working directory clean`. I don't get the problem. – alabamajack Dec 14 '17 at 12:48
  • @https://stackoverflow.com/users/5227053/ano , is there a way to solve what I want? – alabamajack Dec 14 '17 at 12:48
  • @alabamajack I just noticed that you said "_including the history_". I updated my answer with a better approach. – Enrico Campidoglio Dec 14 '17 at 12:55