Suppose I am responsible for maintaining a git repository with a layout like so:
--r1--------r2---r3----mx-----r4-------- [release]
\ / \
\---x1--*------x2 [topic_x] \
\ \ \
\-----------y1 [topic_y] \
\ \ \ \
`---z1------------*---------mz2----z3---- [topic_z]
\ \ \ \
`---mt1--mt2---------------mt3--- [test]
The important features of this workflow are as follows:
- There are a bunch of topic branches, and they get merged into the test branch when they are ready to be tested.
- If everything works OK for the functionality related to a topic branch, then that topic branch is merged into release and becomes available in production systems.
- release is occasionally merged into topic branches as a simple way to bring them up-to-date.
- test, being just a combination of topic branches, is fairly expendable. It can be easily recreated by merging a finite number of topic branches.
- test is never merged into anything else.
In the above example graph,
- topic_x was successfully released with the mx merge.
- topic_y was abandoned.
- topic_z is a work-in-progress.
This workflow works pretty well for us. However, reality likes to introduce mistakes.
Mistakes like this one:
----r5----r6----------M--W---a4-- [release]
     \               /
      `-a1---a2---ma3 [topic_a]
                 /
          --mtN-- [test]
Here's what happened:
- A developer accidentally merged the test branch into the topic_a branch.
- The developer released topic_a, and it created the M merge commit.
- When they looked at the stats at some step in the release process, they saw a long list of changes from the test branch. They immediately realized that they had done something quite wrong!
- In a panic, this developer performed a git revert M and pushed the result. This created the W commit.
- They collected the intended changes from the topic_a branch (commits a1 and a2), and created a patch (a4) that was directly applied to the release branch.
The damage done wasn't immediately apparent. Later we all realized that reverting merge commits is usually a very unwise thing to do; it is better to git reset M^. But it's too late for this repository, and some time passes...
Remember that topic_z? git sure does, just not the way we want it to:
----------M--W---a4--- < 1 yr of commits > ---*---rHEAD [release]
/ \
------ma3 [topic_a] \
/ \
--------------------------------------------------mz [topic_z]
/
mtN-- [test]
release is merged into topic_z (to bring it up-to-date), and commit 'mz' is created. But something bad happens: all of the changes specific to topic_z are deleted by the 'mz' commit! git does this because the M commit already applied topic_z's changes, and the W commit removed them. So git thinks that the most up-to-date form of topic_z's changes is one in which they are all removed!
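The effect can be reproduced in miniature. This is a sketch under simplifying assumptions: a throwaway repository, one file per topic, and a hypothetical `commit` helper. Commit names mirror the diagrams. Work that topic_z had when test absorbed it (z.txt) is removed by the mz merge, while work added afterwards (z2.txt) survives.

```shell
#!/usr/bin/env bash
# Sketch: a reverted merge (M, W) makes a later merge silently delete topic work.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/release   # name the initial branch "release"
git config user.email "dev@example.com"
git config user.name "Dev"

# Hypothetical helper: write a file and commit it.
commit() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit base.txt "base" r5

git checkout -q -b topic_z
commit z.txt "topic_z work" z1

# test absorbs topic_z (mtN).
git checkout -q -b test release
git merge -q --no-ff -m mtN topic_z

# topic_a does its own work, then accidentally merges test (ma3).
git checkout -q -b topic_a release
commit a.txt "topic_a work" a1
git merge -q --no-ff -m ma3 test

# topic_a is released (M), then the merge is reverted (W).
git checkout -q release
git merge -q --no-ff -m M topic_a
git revert --no-edit -m 1 HEAD >/dev/null

# topic_z keeps moving, then merges release to catch up (mz).
git checkout -q topic_z
commit z2.txt "later topic_z work" z2
git merge -q --no-ff -m mz release

if [ -e z.txt ]; then z_state=present; else z_state=deleted; fi
if [ -e z2.txt ]; then z2_state=present; else z2_state=deleted; fi
echo "after mz: z.txt=$z_state z2.txt=$z2_state"
```

The merge base of topic_z and release is z1, because z1 reached release through test, topic_a and M. Relative to that base, release deleted z.txt (via W) and topic_z left it untouched, so the three-way merge takes the deletion without reporting a conflict.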
At this point, I really want topic_z back. And not just topic_z, but all other topics that might have been involved in the M and W commits. And I don't want to have to create patches for them manually and reapply them: these topics can be "non-trivial".
How do I update pre-existing topic branches while preserving original changes, such that I can work on them and merge them back into release at a later date?
To clarify: I want to be really sure that work won't spontaneously disappear in the future as a result of merging.
Also, history-changing is acceptable, if necessary. The team is prepared to push or delete all of their current local branches and then re-clone all repository instances after any history rewrites are done. The catch is this: any history rewriting must preserve the entire repository history, not just a single branch. The intent should be to make it as if M and W had never happened.
Here is what I've tried so far:
Attempt #1:
git checkout release
git reset --hard M^
git checkout -b new_release
git cherry-pick a4^..rHEAD
for ((i=1;i<=100;i++)); do
  git cherry-pick -m 1
  git checkout --ours . && git add . && git commit --allow-empty --no-edit
  git cherry-pick --continue
done
This produces a 'new_release' branch with source code contents nearly identical to those on release. The results of git diff release new_release fit on a single screen. Nice! However, all of the tree topology is lost, and all of the topic branches are now referencing the wrong commits in the release branch. This approach might provide some useful knowledge, but has too many deal-breakers to be usable.
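The core of Attempt #1 can be tried in miniature without moving release itself, by starting the new branch directly at M^. This is only a sketch: the throwaway repository, the file names, the `commit` helper, and the plain commit standing in for the accidental merge of test are all assumptions made for illustration, and the history after W is linear here, so none of the conflicts from the real repository appear.

```shell
#!/usr/bin/env bash
# Sketch: rebuild release as new_release from M^ by cherry-picking everything after W.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/release   # name the initial branch "release"
git config user.email "dev@example.com"
git config user.name "Dev"

# Hypothetical helper: write a file and commit it.
commit() { echo "$2" > "$1"; git add "$1"; git commit -q -m "$3"; }

commit base.txt "base" r6

git checkout -q -b topic_a
commit a.txt "topic_a work" a1
commit junk.txt "leaked from test" ma3     # stands in for the accidental merge of test

git checkout -q release
git merge -q --no-ff -m M topic_a
M=$(git rev-parse HEAD)
git revert --no-edit -m 1 HEAD >/dev/null  # W
W=$(git rev-parse HEAD)
commit a.txt "topic_a work" a4             # the hand-made patch
commit r.txt "later release work" rHEAD

# Branch from M^ and replay a4..rHEAD; release itself is left untouched.
git checkout -q -b new_release "$M^"
git cherry-pick "$W..release" >/dev/null

# Same content as release, but M is no longer part of the history.
if git diff --quiet release new_release; then same_tree=yes; else same_tree=no; fi
if git merge-base --is-ancestor "$M" new_release; then m_in_new=yes; else m_in_new=no; fi
echo "same_tree=$same_tree m_in_new=$m_in_new"
```

Even in this toy case the topic branches still hang off the old commits, which is the topology problem described above.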
Attempt #2:
git checkout release
git rebase -p -i M^
# Remove the M and W commits from the list.
# Merge conflicts will be encountered.
for ((i=1;i<=100;i++)); do
  git rebase --continue
  git checkout --theirs . && git add . && git commit --allow-empty --no-edit
done
# OR:
for ((i=1;i<=100;i++)); do
  git rebase --continue
  git checkout --ours . && git add . && git commit --allow-empty --no-edit
done
The intent here is to use a rebase to remove only the M and W commits. For some reason, this results in a bunch of merge conflicts, and they have to be resolved. Using either the --theirs or the --ours strategies, the end result is a release branch that closely resembles the original in tree topology, but has a much larger diff than the cherry-pick approach. It also still lacks the ability to reconstruct all of the relationships between the release and topic branches. Once again, this approach has too many flaws to be usable in its current form.
Note that the merge conflicts were not caused by the mixture of the -p and -i flags on git rebase. Imagine the M and W commits are at the top of the list in the rebase edit list: I can delete them, and rebase will have no choice but to parent everything to the correct commit. While this is not exactly true, because there were a couple other commits at the top of the list, those weren't important either and I deleted them. This rebase unambiguously parents things to the correct commit (M^).
Also note that I tried the -s and -X options with rebase before resorting to the nasty for-loop in bash. They didn't seem to have any effect at all, and allowed plenty of merge conflicts to happen, even with the --ours and --theirs strategies.
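For reference, this is what -X theirs does on a plain rebase (without -p), shown in a throwaway repository with illustrative names. During a rebase the sides are swapped relative to a merge: "theirs" is the commit being replayed and "ours" is the branch being rebased onto. This sketch does not explain why the options appeared to do nothing in the -p rebase above; it only shows the baseline behavior to compare against.

```shell
#!/usr/bin/env bash
# Sketch: -X theirs on a plain rebase resolves conflicts in favor of the replayed commit.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/release   # name the initial branch "release"
git config user.email "dev@example.com"
git config user.name "Dev"

echo "base" > f.txt
git add f.txt
git commit -q -m base

# Both branches change the same line, so a plain rebase would stop with a conflict.
git checkout -q -b topic
echo "topic version" > f.txt
git commit -q -am "topic edit"

git checkout -q release
echo "release version" > f.txt
git commit -q -am "release edit"

# Replay topic on top of release, preferring the replayed commit's side.
git checkout -q topic
git rebase -q -X theirs release

result=$(cat f.txt)
echo "f.txt after rebase: $result"
```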