How git cherry-pick a range of commits?

Question

I saw that there is a connection between rebase and cherry-pick a range of commits.

I have failed to find any article/toturial which explain what exactly happands when one try to cherry-pick multiple commits.

Some questions (which I can think of) are:

What CHERRY_PICK_HEAD ref is?
By running git cherry-pick 2^..4, what is the sequence of actions git does and exactly between which commits git use diff?

By running git cherry-pick 1..8, what git will do?

See https://www.git-scm.com/docs/git-cherry-pick.html and you could have a try. — ElpieKay, Nov 04 '17 at 12:25
@ElpieKay I already searched for answers there with no success. You are welcome to answer my questions if the answers are obvius as you are implying. — Stav Alfi, Nov 04 '17 at 12:39
See answers at https://stackoverflow.com/q/9339429/7976758, especially http://think-like-a-git.net/sections/rebase-from-the-ground-up/cherry-picking-explained.html — phd, Nov 04 '17 at 14:17
@phd Thanks but they cherry-pick introduced in your links are too simple and doesn't cover what I'm looking for. I will love to know If you think I missed something. — Stav Alfi, Nov 04 '17 at 14:24

score 5 · Accepted Answer · edited Nov 09 '17 at 21:51

Cherry picking n commits is the same as cherry picking one of them after the other (with distinct git calls). No branching or whatever else is involved, simply new commits created for you on the current branch.

More details

The help page https://www.git-scm.com/docs/git-cherry-pick.html says:

Given one or more existing commits, apply the change each one introduces, recording a new commit for each.

Let's pick that statement apart:

the change

This is what sometimes is also called "the diff". It is what git diff HEAD somecommit would output. Sometimes it is also called a "patch" (in fact, that same output from git diff could be applied with the usual patch utility - and of course with git apply but that's not the point here).

So, the "change" is something that instructs a tool like git or patch how to modify a text file to end up with a new, changed text file.

You could create a similar textual representation of the differences between two files by running the standard diff utility on two files. In fact, that is what git does with a cherry-pick, internally (with its own diff implementation, of course); i.e. this is only a 2-way diff, not a 3-way diff like in the git merge operation.

each one introduces

When you have this state:

...----+----+----...
   abc  def

Then the git cherry-pick def change is the 2-way diff between commits abc and def (for all the files which differ, of course, on a file-by-file basis), because this is what def "introduces".

apply the change

This means to take HEAD and "the change" (i.e., the diff, the patch, etc.) and create a new set of text files. You can treat this, in principle as a 2-way merge (just like the patch utility would do) except when it isn't, i.e. if the context information in the diff output does not match what's in HEAD right now. In that case, git cheats to find a common ancestor to be able to do a 3-way merge, and you can read up the gory details in What are the three files in a 3-way merge for interactive rebasing using git and meld? . But it is still, from the user's point of view, not really comparable to a git merge insofar as structurally it will end up with a single-parent commit, not a 2-parent commit like git merge.

recording a new commit

git applies the changes to the index and to the working directory, and commits. Unless the 2-way merge and the 3-way merge did not work out without a conflict, in which case, straight from the help page with some comments of mine:

The current branch and HEAD pointer stay at the last commit successfully made. [I.e., just plain simple HEAD.]
The CHERRY_PICK_HEAD ref is set to point at the commit that introduced the change that is difficult to apply. [That is def in my picture above.]
Paths in which the change applied cleanly are updated both in the index file and in your working tree. [I.e., if many files are changed in the commit, those that can cleanly be applied, are.]
For conflicting paths, the index file records up to three versions, as described in the "TRUE MERGE" section of git-merge[1]. The working tree files will include a description of the conflict bracketed by the usual conflict markers <<<<<<< and >>>>>>>. [I.e., the same as a merge conflict, with some "conjured out of thin air" common ancestor.]
No other modifications are made.

And finally, the rest of the sentence:

Given one or more existing commits, apply ... recording a new commit for each.

If you give it more than one commits, maybe explicitly as in git cherry-pick sha1 sha2 sha3... or implicitly git cherry-pick sha1..sha2, then the above just runs in a simple loop, stopping after the last pick, or when there occurs a merge conflict.

Your Questions

What CHERRY_PICK_HEAD ref is?

2. The CHERRY_PICK_HEAD ref is set to point at the commit that introduced the change that is difficult to apply.

If it tries to pick commit def, and a merge conflict occurs, then CHERRY_PICK_HEAD will point at def.

By running git cherry-pick 2^..4, what is the sequence of actions git does and exactly between which commits git use diff?

As described above:

Commit 2 is picked, i.e.
- Git calculates the diff between 1 and 2.
- If possible, that diff is applied as a 2-way merge to HEAD and committed.
- If not possible, that diff is applied as a 3-way merge to HEAD and committed.
- If that is not possible (i.e., merge conflict), then you are left to resolve the conflict manually as usual, and it will wait for you issuing git cherry-pick --continue.
Commit 3 is picked, i.e. ... the same.
Commit 4 is picked, i.e. ... the same.

By running git cherry-pick 1..8, what git will do?

The same, but this time it will pick commit 2, 3, 4, 8.

(The fact that the "first" commit in the range is not picked is the usual behaviour, for example git log 2^..4 or git log 1..8 would output the same commits - in fact the same that would be picked. This is described in the cherry-pick help page under <commits> including links to how git walks revisions, for all the details. This is not a property of git cherry-pick but of how these .. ranges work.)

I'm sorry that you find the answer offensive. I assumed you had not found that help page because it seemed to me that it answered your question. I'll edit the answer with more details. @StavAlfi — AnoE, Nov 04 '17 at 17:13
"git cheats to find a common ancestor to be able to do a 3-way merge" - Can you specify __who__ is the common ancestor when cherry-pick can't apply the path cleanly? — Stav Alfi, Nov 09 '17 at 21:26
@StavAlfi: can't explain it better than in the link https://stackoverflow.com/questions/36993683/what-are-the-three-files-in-a-3-way-merge-for-interactive-rebasing-using-git-and ... — AnoE, Nov 09 '17 at 23:17
Thanks for a great answer which lead me to learn what are `diff`,`patch`,`format-patch`,`blob`,`tree`,`apply`,`am`,`2-way-merge`,`3-way-merge`,`cherry-pick`. As you can understand now, it was difficult for me to understand the man page. All in all, It took me 6 days to fully and completely understand your answer. so thanks for the homework and your knowledge. Hope to see you next time ;-) — Stav Alfi, Nov 10 '17 at 21:59

score 1 · Answer 2 · answered Nov 04 '17 at 18:28

I have failed to find any article/toturial which explain what exactly happands when one try to cherry-pick multiple commits.

In this case, the cherry-pick code uses Git's sequencer, which is also used for git am and git revert (and in very recent versions of Git, some cases of git rebase—git rebase is, as you read, partly implemented using git cherry-pick, although it's also partly implemented using git am: which one you get depends on flags you supply to git rebase). Note that internally, git revert and git cherry-pick are the same command (built from builtin/revert.c).

The sequencer simply runs repeated "one-commit-at-a-time" Git sub-commands on a sequence of commits, with the option of skipping any individual commit if the single-shot command fails. The individual commit hash IDs are often—though not always—gathered by running git rev-list. So the first part of your "2." and "3." questions can be found by running git rev-list (though the result is not particularly useful to humans :-) since git rev-list is meant to produce output useful to other Git commands instead).

So, let's take these in order:

What CHERRY_PICK_HEAD ref is?

When the sequencer is run on one commit for cherry-pick or revert, it notices, writes the commit ID to CHERRY_PICK_HEAD or REVERT_HEAD, and invokes the code to do a single pick/revert. (Follow the link to the actual Git source on GitHub for further details.) Otherwise, it does the rev-list walk to build the list of commits, writes them to the sequencer directory (or immediately fails and rejects your attempt if there's an ongoing sequenced operation), and then does one cherry-pick or revert at a time. This calls do_pick_commit(), which is a fairly complicated function, but you can see that at line 1118, it also will write the current commit's hash ID to CHERRY_PICK_HEAD, if we are cherry-picking and we are going to stop for some reason.

Hence, whenever any individual cherry-pick fails and stops with an unmerged index, or stops after success due to the use of --no-commit, CHERRY_PICK_HEAD contains the hash ID of the commit that was being picked at the time the command stopped.

You can then resolve the problem and run git cherry-pick --continue. This particular invocation checks for the existence of the sequencer directory; if it's there, it assumes you have fixed the problem and attempts to continue the existing, on-going cherry-pick sequence.

  2--3--4  <-- dev
 /
1
 \
  5--6--7   <-- master (HEAD)
By running git cherry-pick 2^..4, what is the sequence of actions git does and exactly between which commits git use diff?

If you run:

git rev-list 2^..4

(replacing 2 and 4 with actual hash IDs, or using the name dev to identify commit 4) you'll see that this lists the hash IDs of 4, then 3, then 2 (in that order). When doing git cherry-pick, though, Git specifically uses a reversed order on each ".."-style selection, so that the actual commit hashes are those for 2, then 3, then 4.

The sequencer therefore writes those three hash IDs into the sequencing area, and then runs do_pick_commit on each one. Looking closely starting at lines 1043 and again at 1088, you can see that it's actually a little bit misleading to say that Git runs a diff between the parent and child of each commit. In fact, it runs a merge operation ("merge as a verb", as I like to put it), with the merge base being the parent of each commit and the to-be-merged commit as the --theirs commit. (The --ours commit is, as always, the current or HEAD commit.)

However, the merge operation itself does, in effect, run git diff between the merge base and each of the two branch tips. Since the merge base is the parent of the commit being cherry-picked, this diffs 2^ (or 1) vs 2 as the input to the --theirs side. It also diffs 2^ vs HEAD as the input to the --ours side, and then does a merge.

By default (without -n / --no-commit), Git will commit the result of this merge, if it succeeds, as a single-parent, non-merge commit. So while this particular cherry-pick performs a merge, it makes an ordinary commit. The commit message for this new commit is a copy of the commit message from the original commit, i.e., a copy of the message from commit 2 (plus a line added holding the original commit hash if you ask for that using -x).

If all has gone well, the sequencer moves on to commit 3. The parent of 3 is 2, so the sequencer invokes the merge machinery to merge commit 3 and the (newly created by previous step) HEAD using commit 2 as the merge base this time. This means that Git will diff 2 vs 3, and also 2 vs HEAD, combine the diffs, and if all goes well, make a new ordinary (non-merge) commit that becomes HEAD.

If that goes well, the sequencer moves on to commit 4, which behaves in the same way.

The final result is:

  2--3--4  <-- dev
 /
1
 \
  5--6--7--2'-3'-4'   <-- master (HEAD)

where 2' is a sort of copy of 2, 3' is a sort of copy of 3, and 4' is a sort of copy of 4.

  2---3---4
 /         \
1--5--6--7--8   <-- dev
 \
  9   <-- master (HEAD)
By running git cherry-pick 1..8, what git will do?

Here we are going to have multiple problems.

First, cherry-pick invokes the sequencer code since you've specified a range of commits, 2^..8. This particular sub-range gets reversed:

git rev-list --reverse 2^..8

This lists commits 2, 3, 4, 5, 6, 7, and 8 in some order, but what exactly is the order? We've asked for all commits reachable from commit 8 (including 8 itself), excluding all commits reachable from commit 2^ (i.e., commit 1). Certainly, without --reverse, we'd see commit 8 first, which means that with --reverse we will see commit 8 last. But 8 has two parents, namely 4 and 7. Git could pick either one here.

Without --topo-order, Git chooses the one with the most recent time-stamp first. Suppose that the two time stamps make it pick 7 first. We'll get 8, then 7 (so that after reversing we'll have 7, then 8, at the end). Now there are again two commits to choose next: 6, and 4. Suppose that the two time stamps make Git pick 4 next. We'll now have two commits to choose from: 6, and 3. This process repeats until the two legs in the graph re-converge at commit 1 (which we won't pick anyway).

The --reverse means that we get a linearized list ending with commit 8, but the order of 2, 3, 4, 5, 6, and 7 is determined by time stamps (specifically commit time stamps, not author time stamps). So it's not very easy, without looking at the commit time stamps or running git rev-list, to know which order each individual commit will be cherry-picked.

In any case, the sequencer will still cherry-pick each commit one at a time, in whatever order they come out of git rev-list --reverse. But eventually we'll cherry-pick all of 2/3/4/5/6/7, which are not merges, and then cherry-pick commit 8, which is a merge. Either way, we'll go through the code at lines 967–990. For the commits that are not merges, git cherry-pick will demand that we have not supplied the -m option. For the commit that is a merge—commit 8—git cherry-pick will demand that we do supply the -m option.

So this cherry-pick is guaranteed to fail. To make it work well, you must avoid cherry-picking the merge, and you should cherry-pick each individual range 2^..4 and 5^..7 (in either order).

How git cherry-pick a range of commits?

2 Answers2

More details

Your Questions