Git is built to, and hence "wants to" in some sense, add new commits but never remove any old ones. It is possible to remove commits, but:
- Be very sure you really want to do this!
- Be aware that commits are like certain diseases: if you've had commit X in your repository, and had interactions with another Git repository that's a clone of the same source, or a clone of yours, or you're a clone of theirs, they probably have commit X too now. The next time you connect your Git to their Git, you're likely to get commit X back again. To make some commit really go away, you must remove it from all affected Git repositories. Since, in general, you only control your own Git repository, that means you must get everyone else to fix their repositories too.
With that out of the way, here's how you do it, using git cherry-pick and git reset. There is more than one way to do it, but let's go with these two commands here.
Git's thing is the commits; the names are secondary
As you've already seen, every commit has a unique hash ID, some big ugly string such as b5101f929789889c2e536d915698f58d5c5c6b7a. These IDs are the same across every Git that shares this repository. (The one I've listed here is a commit in the Git repository for Git itself.)
Each commit retains, for as long as the commit itself exists, a full snapshot of all the files. Well, it has all the files that are in the snapshot, but that's like saying that all blue crayons are blue: it's kind of silly. The point is that it's a snapshot of the files. It doesn't say "change README this way", which would require going back and finding how README looked before. It just says we have README and it looks like this. If the snapshot doesn't have a file, Git should perhaps remove the file (though this part gets a little trickier because Git allows you to have "untracked files"). In any case the files in the snapshot are frozen forever, or at least, for as long as the commit exists.
But each snapshot also has some metadata, such as your name (if you made the commit), when you made it, why you made it—your log message—and, crucially for our purposes, the hash ID of the previous commit. That metadata, like the files, is frozen forever, or for as long as the commit exists. Note that when you have Git show you a commit, Git shows (some of) the metadata, and then shows you the difference between this commit's files and this commit's parent's files. It can do that because of the parent, or previous, commit's hash ID, saved as part of this commit.
What this means for us is that we can draw out strings of backwards-pointing commits, with each commit naming its parent:
A <-B <-C
If the hash IDs were simple uppercase letters like this, we could just scan them all and find the last one, but they're not: they seem random (though actually they're strictly determined by all the bits saved inside the commit, which is why we can't change any of the bits inside the commit!). So Git needs a way to save the hash ID of the last commit, from which it can work backwards.
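To make "strictly determined by all the bits" concrete, here is a minimal Python sketch, assuming a repository that uses Git's default SHA-1 object format, where an object's ID is the SHA-1 of a short header followed by the object's bytes. The helper name git_object_id and the sample commit text (author, timestamp, message) are made up for illustration.

```python
import hashlib

def git_object_id(kind: str, body: bytes) -> str:
    """Hash an object the way Git's SHA-1 object format does:
    SHA-1 over "<kind> <size>\\0" followed by the raw body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# The empty blob has the same well-known ID in every SHA-1 repository.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Hypothetical commit text; real commit objects have this general shape.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1700000000 +0000\n"
    b"committer A U Thor <author@example.com> 1700000000 +0000\n"
    b"\n"
    b"initial commit\n"
)
original = git_object_id("commit", commit_body)
altered = git_object_id("commit", commit_body.replace(b"initial", b"Initial"))
print(original != altered)  # True: changing any byte gives a different ID
```

Because the parent's hash ID is part of those bytes in a real commit, changing a commit's parent also changes the commit's own ID.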
Storing the hash ID of that last commit is the function of branch names, like master:
A--B--C--D--E <-- master
We (and Git) start at the end, by using the name master to get the hash ID (here E). Then we work backwards, following those unchangeable internal arrows.
The branch name arrows—the hash IDs stored under the names—can change, as we'll see.
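The backwards walk can be sketched in a few lines of Python. This is a toy model, not how Git stores anything: the single-letter IDs and the two dictionaries are stand-ins for real hash IDs and real branch names.

```python
# Toy model: each commit records only its parent's ID.
parents = {"A": None, "B": "A", "C": "B", "D": "C", "E": "D"}
# A branch name stores the ID of the last commit on that branch.
branches = {"master": "E"}

def history(branch: str) -> list[str]:
    """Start at the commit the branch name stores, then follow parents."""
    commit = branches[branch]
    seen = []
    while commit is not None:
        seen.append(commit)
        commit = parents[commit]
    return seen

print(history("master"))  # ['E', 'D', 'C', 'B', 'A']
```

Note that nothing in parents points forwards: without the name master, there would be no way to discover that E exists.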
Adding commits to a branch
To add a new commit to the current branch, we have Git save a snapshot of the files, add our name and email and our log message, and save the hash ID of the current commit. Git writes all of that into the new commit, which thereby acquires a new hash ID:
A--B--C--D--E   <-- master
             \
              F
Now Git just updates the name to record the new latest commit:
A--B--C--D--E
             \
              F   <-- master
which we can then straighten out:
A--B--C--D--E--F <-- master
Note that it's the commits, and their relationships to each other—the internal, backwards-pointing arrows—that are crucial here. The names do matter, but only because that's how we find the commits. The commits themselves form a Directed Acyclic Graph or DAG. The names let us get into the DAG. Nothing in the DAG itself can ever change, but the names can move, and we can add new commits.
(We're free to draw the DAG however we want, bending the connecting arrows, as long as they still connect. I use lines rather than arrows in the text here because it's hard to find good text characters to do diagonal arrows.)
Adding more branches to the graph
Suppose we have our six commits now:
A--B--C--D--E--F <-- master
and want to make a new branch. We use either git branch or git checkout -b to make the branch, so now we have:
A--B--C--D--E--F <-- BranchA, master
The two names both point to the same commit, F. All six commits are now on both branches.
If we add a new commit, obviously we'll get:
A--B--C--D--E--F
                \
                 G
the same way we got F earlier. But which name should change? To answer that question, Git attaches the name HEAD to one of the branches:
A--B--C--D--E--F <-- BranchA (HEAD), master
This tells Git which name to change:
A--B--C--D--E--F   <-- master
                \
                 G   <-- BranchA (HEAD)
The HEAD attachment remains when the name moves. We need to know about the attachment when we want to know: Which branch are we on? Which branch will our command affect if it affects the current branch? If we're just looking at what's in the repository, we can leave it off.
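Here is a small Python sketch of that rule: a new commit's parent is the current commit, and only the branch that HEAD is attached to moves. The Repo class and its fields are hypothetical names for a toy model with letters in place of hash IDs.

```python
class Repo:
    """Toy repository: commit IDs are plain letters, not real hashes."""

    def __init__(self, parents, branches, head):
        self.parents = dict(parents)    # commit -> parent commit
        self.branches = dict(branches)  # branch name -> tip commit
        self.head = head                # branch name HEAD is attached to

    def commit(self, new_id):
        # The new commit's parent is the current commit...
        self.parents[new_id] = self.branches[self.head]
        # ...and only the branch HEAD is attached to moves forward.
        self.branches[self.head] = new_id

repo = Repo(
    parents={"E": "D", "F": "E"},
    branches={"master": "F", "BranchA": "F"},
    head="BranchA",
)
repo.commit("G")
print(repo.branches)  # {'master': 'F', 'BranchA': 'G'}
print(repo.head)      # BranchA
```

HEAD itself is unchanged by the commit: it is still attached to BranchA, which now names G.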
So, with that out of the way, let's draw your existing graph more completely
You have a series of commits ending in the one you're calling A3 above, after which things get a little hairier. I like one-letter names but I'll use yours here:
...--A3
Now, you say your master reaches B2, which is preceded by B1, which is preceded by A3, so there must be two more commits after:
...--A3--B1--B2 <-- master
Meanwhile your Branch_B starts out at B2, which is preceded by C3, but that's literally impossible:
...--A3--B1--B2   <-- master
          \
           C3--B2   <-- Branch_B
so you must have made some mistake in transcribing your commit hashes (not surprising since they're big and ugly and basically require careful cut-and-paste to avoid errors). I'm going to assume that the B2 on master is really some other ID, and replace it here with B2a:
...--A3--B1--B2a   <-- master
          \
           C3--B2   <-- Branch_B
Your Branch_C starts (well, ends?) with C2, which is preceded by C1, then B1, then A3:
           C1--C2   <-- Branch_C
          /
...--A3--B1--B2a   <-- master
          \
           C3--B2   <-- Branch_B
You can confirm this by using git log --decorate --oneline --graph master Branch_B Branch_C (or git log --all --decorate --oneline --graph; the mnemonic for those four options is "A DOG", as in Get Help From A Dog). That draws vertically-oriented graphs, which aren't as pretty or obvious to me, but are still very useful.
How to get what you want: it requires changing what you want, slightly
Now, here's what you say you would like:
       C1--C2--C3   <-- Branch_C
      /
...--A3   <-- master
      \
       B1--B2   <-- Branch_B
You can't get this. We already said that there is no power anywhere to change anything in any existing commit, and looking at what we have now, the parent of commit B2 is commit C3, for instance.
But you can get something that's probably just as good, which is: you can make a copy of B2. In fact, you probably already have: B2a and B2 are likely copies of each other.
Without worrying about the exact copying mechanism yet, let's see what happens if we make a B2b that's a copy of B2 but that has B1 as its parent:
           C1--C2   <-- Branch_C
          /
...--A3--B1--B2a   <-- master
          | \
          |  C3--B2   <-- Branch_B
           \
            B2b   <-- new-branch-b
Next, let's copy C1 to a new C1a that springs from A3:
         C1a   <-- new-branch-c
        /
       /   C1--C2   <-- Branch_C
      /   /
...--A3--B1--B2a   <-- master
          | \
          |  C3--B2   <-- Branch_B
           \
            B2b   <-- new-branch-b
Then we just need to copy C2 and C3, one by one:
         C1a--C2a--C3a   <-- new-branch-c
        /
       /   C1--C2   <-- Branch_C
      /   /
...--A3--B1--B2a   <-- master
          | \
          |  C3--B2   <-- Branch_B
           \
            B2b   <-- new-branch-b
Almost-last, we need to move the old names, Branch_B and Branch_C, so that they point to commits B2b and C3a respectively:
         C1a--C2a--C3a   <-- new-branch-c, Branch_C
        /
       /   C1--C2   [abandoned]
      /   /
...--A3--B1--B2a   <-- master
          | \
          |  C3--B2   [abandoned]
           \
            B2b   <-- new-branch-b, Branch_B
Then we need to move the name master back two steps so that it points to A3 instead of B2a, abandoning B2a entirely. That's hard to draw until we stop drawing the abandoned commits. They will still be in your repository for a while (at least 30 days by default), but hidden away so that you can't see them any more, which gives us:
         C1a--C2a--C3a   <-- new-branch-c, Branch_C
        /
       /__________
      /           \
...--A3--B1        -- master
          |
          |
           \
            B2b   <-- new-branch-b, Branch_B
We can now drop the new-branch-[bc] names and clean up the arrangement of the drawing:
       C1a--C2a--C3a   <-- Branch_C
      /
...--A3   <-- master
      \
       B1--B2b   <-- Branch_B
Except for the suffixes here, which mean these are different hash IDs, this is just what you wanted!
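As a check on the drawings, the following Python sketch computes which commits can no longer be found from any branch name once the names have moved. It is a toy model using the letter names from the graphs above, and it treats A3 as having no parent, standing in for the "..." history we aren't drawing.

```python
# Every commit drawn so far, mapped to its parent.
parents = {
    "A3": None, "B1": "A3", "B2a": "B1",
    "C3": "B1", "B2": "C3",
    "C1": "B1", "C2": "C1",
    "B2b": "B1",
    "C1a": "A3", "C2a": "C1a", "C3a": "C2a",
}
# Where the three names point after all the moves.
branches = {"master": "A3", "Branch_B": "B2b", "Branch_C": "C3a"}

def reachable(branches, parents):
    """Collect every commit found by walking back from each branch tip."""
    found = set()
    for tip in branches.values():
        while tip is not None and tip not in found:
            found.add(tip)
            tip = parents[tip]
    return found

abandoned = sorted(set(parents) - reachable(branches, parents))
print(abandoned)  # ['B2', 'B2a', 'C1', 'C2', 'C3']
```

The abandoned commits are exactly the ones the final drawing leaves out; no name leads to them any more.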
Getting from here to there: adding new names
First, you just need to add the new names, pointing to the desired commits:
git branch new-branch-b <hash of B1>
git branch new-branch-c <hash of A3>
The hash IDs we choose here are the commits that will continue to be on the newly-built branches. For Branch_B, that's B1, which we can leave in place, but for Branch_C, that's commit A3, because we have to copy C1 to C1a.
Getting from here to there: copying commits
Now it's time to copy the commits. Let's copy B2 or B2a. You can use whichever you like, as long as they make the same changes and have the same commit messages, because the copying command is git cherry-pick and the way it works is very similar to what we said earlier about showing a commit:
[Git] shows you the difference between this commit's files and this commit's parent's files
Instead of showing the difference, git cherry-pick finds the difference, then applies that to whatever commit we've checked out, makes the same changes, and commits the result, using the same log message as the original commit too.
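A rough Python sketch of "find the difference, then apply it" follows. It is a deliberate simplification: snapshots are dictionaries of whole-file contents, the helper names diff and apply_changes are made up, and real Git works line by line within files and can hit conflicts, which this ignores.

```python
def diff(old: dict, new: dict) -> dict:
    """Map each changed path to its new content (None means deleted)."""
    changes = {}
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            changes[path] = new.get(path)
    return changes

def apply_changes(snapshot: dict, changes: dict) -> dict:
    """Return a new snapshot with the changes applied."""
    result = dict(snapshot)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result

# Hypothetical snapshots: the picked commit changed only README.
picked_parent = {"README": "v1", "lib.c": "x"}
picked = {"README": "v2", "lib.c": "x"}
checked_out = {"README": "v1", "other.c": "y"}

copy = apply_changes(checked_out, diff(picked_parent, picked))
print(copy == {"README": "v2", "other.c": "y"})  # True
```

The copy carries the same change (README goes from v1 to v2) but not the same snapshot, which is one reason it gets a different hash ID.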
So we just need to:
git checkout new-branch-b
git cherry-pick <hash-of-B2a or whatever>
which gets us this far, when we draw the graph and leave out a lot:
...--A3
      \
       B1--B2b   <-- new-branch-b
Then we need to build up new branch C the same way:
git checkout new-branch-c
git cherry-pick <hash-of-C1>
git cherry-pick <hash-of-C2>
git cherry-pick <hash-of-C3>
The result, again leaving out lots of graph-drawing, is the desired:
       C1a--C2a--C3a   <-- new-branch-c
      /
...--A3
The last step is to make master identify commit A3, and for that we just need to git checkout master and then git reset --hard:
git checkout master
git reset --hard <hash-of-A3>
(Note: if you're doing this with hash IDs, it's a good idea to cut and paste them, and/or save them in files, as it's far too easy to get typos here. There are tricks to use relative names but I'm not going to include them in this answer.)
The git reset command affects whichever branch name HEAD is attached to, and the git cherry-pick command makes new commits on whichever branch name HEAD is attached to. That's why we had to git checkout each of those names.
At this point, we have the new branch names, and master points to A3, but we have not updated the two other branch names. As before, we can use git checkout and git reset --hard here:
git checkout Branch_B
git reset --hard new-branch-b
git checkout Branch_C
git reset --hard new-branch-c
We don't need hash IDs this time, because for commands like git cherry-pick and git reset, the name of a branch means the commit whose ID is stored in that branch name.
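The two points above (reset moves whichever name HEAD is attached to, and a branch name stands for the commit it stores) can be sketched as a toy Python model. The function names here are made up, and the model tracks only the names: the real git reset --hard also rewrites the index and working tree.

```python
branches = {"master": "B2a", "Branch_B": "B2", "new-branch-b": "B2b"}
head = "master"  # the branch name HEAD is attached to

def resolve(name_or_id):
    """A branch name stands for the commit ID stored under it;
    anything else is taken to be a commit ID already."""
    return branches.get(name_or_id, name_or_id)

def checkout(branch):
    global head
    head = branch

def reset_hard(target):
    # Moves only the branch HEAD is attached to.
    branches[head] = resolve(target)

checkout("master")
reset_hard("A3")            # given as a (toy) hash ID
checkout("Branch_B")
reset_hard("new-branch-b")  # given as a branch name
print(branches)
# {'master': 'A3', 'Branch_B': 'B2b', 'new-branch-b': 'B2b'}
```

Skipping either checkout would have moved the wrong name, which is why each reset is paired with one.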
Once we've finished all of this we can just delete the names new-branch-b and new-branch-c:
git branch -D new-branch-b
git branch -D new-branch-c
The -D is the forcible delete, which makes Git do it even if Git thinks it's not safe. (Git's idea of when this is safe and when this isn't is, um, a good try, but not great.)
Cherry-pick can have merge conflicts
This isn't particularly likely for your case, but it's important to know for the future. Every git cherry-pick is actually a kind of merge. Git is going to "merge" the changes made in the commit itself (computed by comparing the parent commit to the commit, just like git show compares the two) into the current commit, finding your current commit's changes by comparing the parent commit of the cherry-picked commit to the current (HEAD) commit.
If you are a bit confused here, don't worry: The preceding paragraph is definitely hard to read. It's really best shown by illustration:
       o--o--...--P--C--o--...--o   <-- other-branch
      /
...--o
      \
       o--o--H   <-- your-branch (HEAD)
You run git cherry-pick <hash of C>. Git:
- Diffs P vs C: that's what they changed.
- Diffs P vs H: that's what you changed, sort of.
- Combines these two sets of changes, applying the combined changes to the files from P (i.e., repeating "what you changed" just to get back to what's in H, but then adding "what they changed" to get from H to the result).
- If the combining works, makes a new commit C'. Otherwise, stops and leaves a mess.
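The steps above can be sketched in Python as a three-way merge with P as the merge base. This is only a model under loud assumptions: it merges whole files rather than lines, the function names are made up, and a "conflict" here just means both sides changed the same file differently.

```python
def three_way(base, ours, theirs):
    """Merge one file's content; returns (content, is_conflict).
    None as content means the file is absent."""
    if ours == theirs:    # both sides agree (or neither changed it)
        return ours, False
    if ours == base:      # only they changed it: take theirs
        return theirs, False
    if theirs == base:    # only we changed it: keep ours
        return ours, False
    return None, True     # both changed it differently

def cherry_pick(P: dict, C: dict, H: dict):
    """Combine 'P vs C' (theirs) with 'P vs H' (ours) on top of P."""
    merged, conflicts = {}, []
    for path in sorted(P.keys() | C.keys() | H.keys()):
        content, conflict = three_way(P.get(path), H.get(path), C.get(path))
        if conflict:
            conflicts.append(path)
        elif content is not None:
            merged[path] = content
    return merged, conflicts

P = {"a": "1", "b": "1"}
C = {"a": "2", "b": "1"}  # they changed a
H = {"a": "1", "b": "3"}  # you changed b
print(cherry_pick(P, C, H))  # ({'a': '2', 'b': '3'}, [])

H2 = {"a": "9", "b": "1"}  # you also changed a, differently
print(cherry_pick(P, C, H2))  # ({'b': '1'}, ['a'])
```

In the first call the merged result is what a new commit C' would hold; in the second, the non-empty conflict list corresponds to Git stopping and leaving the resolution to you.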
When this works without effort on your part, the effect is that whatever changed from P to C, those same changes are now in the new commit C' that git cherry-pick made that's a copy of commit C:
       o--o--...--P--C--o--...--o   <-- other-branch
      /
...--o
      \
       o--o--H--C'   <-- your-branch (HEAD)
When it goes wrong, Git stops with a merge conflict, the same way it stops in git merge when something goes wrong. At that point it's your job to complete the "merge" (the cherry-pick, in this case) and then run git commit or git cherry-pick --continue to finish the job. You can use all the same tools that you would during git merge, so whatever method you like for resolving a git merge, use it here too.