6

I have two branches in git, where one branch master contains all commits, and another branch, e.g., release, which contains some cherry-picked commits from the first branch master. Since the commits are cherry-picked in release, they have different commit hashes than the corresponding commits in master, but the commit messages are the same.

Now I want to find commits from master, which were not cherry-picked into release. Note that the cherry-picked commits might be different in code from original commits due to conflict resolutions. How can I do it? Is there native support in git for this?

Example:

master branch:

git checkout master
git log --oneline -7

gives

2cba4b1d (HEAD -> master) Message subject for commit 7
f54fc16f Message subject for commit 6
4d871cbd Message subject for commit 5
a83ed44c Message subject for commit 4
48d0fb73 Message subject for commit 3
931da9a6 Message subject for commit 2
8553323b Message subject for commit 1

release branch

git checkout release
git log --oneline -5

gives

d65a04c6 (HEAD -> release) Message subject for commit 7
8aeecd92 Message subject for commit 6
2a54e335 Message subject for commit 4
99985f38 Message subject for commit 3
e76a9bb4 Message subject for commit 1

So the difference between the two branches will be two commits with message subjects:

Message subject for commit 5
Message subject for commit 2

It is also OK if it shows commit hashes:

4d871cbd Message subject for commit 5
931da9a6 Message subject for commit 2

Additional clarifications and requirements:

The above example returns the diff in the same order as the commits were merged. Getting the result in the same order as in the original commit log helps to identify the commits in the original commit log of master. It would be nice if this could be achieved too.

In my case both branches have linear history and there are no merge commits.

k_rus

2 Answers

5

Your question is very similar to another one I read months ago about a way to identify rebased commits. Like rebase, cherry-pick extracts the changes made in a commit and applies them on top of another commit. By default, neither command keeps track of the original commit: git has no need to differentiate the "copies", mainly because applying them can produce conflicts, in which case the resulting commit differs from the original, as you know.

Fortunately, git gives us a great help with cherry-picked commits: the --cherry-pick option. I invite you to read the whole description (about --left/right-only too), but this is the interesting part:

Omit any commit that introduces the same change as another commit on the “other side” when the set of commits are limited with symmetric difference.

Seems promising, right? Not quite, and here is the problem: "the same change as another commit". What if the cherry-picked commit is different after a conflict resolution? Git cannot recognize it as cherry-picked, because the two commits are no longer patch-equivalent, so this option alone is not enough. Starting from the easiest situation (which is not your case), where all the cherry-picked commits applied cleanly to the other branch, you could solve it with this:

git log --format="%h %s" --cherry-pick --left-only --no-merges master...release

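The options are well explained in the documentation, except perhaps for the concept of symmetric difference: master...release selects the commits reachable from either branch but not from both. In summary, the command lists all the commits on master that have no patch-equivalent commit in release, that is, those that were not cleanly cherry-picked.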
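To see this behaviour in isolation, here is a minimal sketch that builds a throwaway repository. The function name demo_unpicked, the file names and the identity settings are made up for the illustration; only the final git log line comes from the answer. Three commits are created on master, commits 1 and 3 are cherry-picked cleanly into release, and the command should then report commit 2 only.

```shell
#!/bin/bash
# demo_unpicked is a hypothetical helper written for this illustration.
demo_unpicked() {
    demo_repo=$(mktemp -d) || return 1
    (
        cd "$demo_repo" || exit 1
        git init -q .
        # Make sure the first branch is called master, whatever the local default is.
        git symbolic-ref HEAD refs/heads/master
        git config user.name "Demo"
        git config user.email "demo@example.com"
        git config commit.gpgsign false
        git commit -q --allow-empty -m "Base commit"
        git branch release
        # Three commits on master, each adding its own file (so no conflicts later).
        for n in 1 2 3; do
            echo "content $n" > "file$n.txt"
            git add "file$n.txt"
            git commit -q -m "Message subject for commit $n"
        done
        git checkout -q release
        # master~2 is commit 1 and master is commit 3; commit 2 is skipped on purpose.
        git cherry-pick master~2 master >/dev/null 2>&1 || exit 1
        # Commits on master without a patch-equivalent commit on release.
        git log --format="%s" --cherry-pick --left-only --no-merges master...release
    )
    rm -rf "$demo_repo"
}
```

Calling demo_unpicked needs git on the PATH. Because both cherry-picks apply without conflicts, patch equivalence is enough here; the conflict case discussed next is exactly where it stops being enough.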

It is not perfect as I said, but at least we have a good starting point: now we just need to remove from this list all the commits whose commit message corresponds to the commit message of another commit in the release branch, finding the cherry-picked commits that produced a conflict. This is the only possible check you are left to do, excluding the reflog.

Here is the script (not fully tested):

git log --format="%h %s" --cherry-pick --left-only --no-merges master...release |
while read -r cmt_hash cmt_msg
do
    # Print the commit only if no commit on release has exactly the same subject
    git log --format="%s" master..release | grep -q -x -F -e "${cmt_msg}" || echo "${cmt_hash} ${cmt_msg}"
done

Basically, read splits each %h %s line into the abbreviated hash and the subject (%s). I then use the subject with grep to look for a match among the subjects in release; if there is none, the commit is printed on stdout. I specified -F (--fixed-strings) in the grep options to make sure that the commit message is not interpreted as a regular expression, which could match something it should not. The -x option makes only a whole-line match count, and -q suppresses grep's own output.
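As a possible variation on the loop above, the subject check can be done in a single pass instead of running git log once per candidate commit. The sketch below assumes a hypothetical helper named filter_unpicked, which is not part of git: it takes a file containing the subjects found on release, reads "hash subject" lines on standard input, and prints the lines whose subject is absent from that file, in their original order.

```shell
#!/bin/bash
# filter_unpicked is a hypothetical helper (not a git command).
#   $1    : file with one commit subject per line (the subjects present on release)
#   stdin : "<hash> <subject>" lines (candidate commits from master)
filter_unpicked() {
    awk -v subjects="$1" '
        # Load every release subject into a lookup table before reading stdin.
        BEGIN { while ((getline line < subjects) > 0) seen[line] = 1 }
        {
            subject = $0
            sub(/^[^ ]+ /, "", subject)      # drop the leading hash and one space
            if (!(subject in seen)) print    # keep the line, preserving input order
        }
    '
}

# Possible usage inside a repository (process substitution requires bash):
#   git log --format="%h %s" --cherry-pick --left-only --no-merges master...release |
#       filter_unpicked <(git log --format="%s" master..release)
```

Since the input is never sorted, the output keeps the order of git log, which is the ordering requirement stated in the question.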

Marco Luzzara
  • Thank you! Works great. I also reduced commits from `master` by specifying a commit from where to compare with `release`. – k_rus Apr 12 '21 at 07:00
1

Try the following:

comm -12 <(git rev-list ^master release --oneline | awk '{$1=""; print $0}' | sort) <(git rev-list ^release master --oneline | awk '{$1=""; print $0}' | sort)

This takes the commits that are reachable from release but not from master, strips the hashes and sorts the subjects. It then does the same for the commits that are reachable from master but not from release. With -12, comm prints only the lines common to both sorted lists, so the output is the set of commit subjects that appear on both branches, judged by the message heading alone.

The lists must be sorted because comm expects sorted input. The commits may also appear in a different order on the two branches, and without sorting comm would not match them reliably.
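The column flags of comm decide which set is printed: -12 keeps the common lines, while -13 keeps the lines that occur only in the second input. As a small sketch of the second case, using plain files instead of git output, the hypothetical helper only_in_second below prints the subjects found in its second file but not in its first. Feeding it the release subjects first and the master subjects second would give the subjects that exist only on master.

```shell
#!/bin/bash
# only_in_second is a hypothetical helper written for this illustration.
#   $1 : file with subjects from the first branch  (e.g. release)
#   $2 : file with subjects from the second branch (e.g. master)
only_in_second() {
    ois_a=$(mktemp) || return 1
    ois_b=$(mktemp) || return 1
    # sort and comm must agree on the collation order, so pin the locale for both.
    LC_ALL=C sort "$1" > "$ois_a"
    LC_ALL=C sort "$2" > "$ois_b"
    # -1 hides lines unique to the first file, -3 hides lines common to both.
    LC_ALL=C comm -13 "$ois_a" "$ois_b"
    rm -f "$ois_a" "$ois_b"
}
```

As with the original command, the result comes out in sorted order rather than in commit order.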


I tested it locally and it should provide you with some form of correct output, but maybe you have a different use case.


Now that you have the headings of the commits, you can git log them and see exactly which commits are cherry-picked.

We can put this in a script that looks something like this:

#!/bin/bash

titles=$(comm -12 <(git rev-list ^release master --oneline | awk '{$1=""; print $0}' | sort) <(git rev-list ^master release --oneline | awk '{$1=""; print $0}' | sort))

# Split the subjects into an array, one element per line
IFS=$'\n' read -rd '' -a ts <<<"$titles"
for t in "${ts[@]}"; do
    # Trim the leading space left behind by awk
    c=${t# }
    # --no-pager avoids the interactive pager; -F matches the subject literally, not as a regex
    git --no-pager log master --oneline -F --grep="$c"
done
mnestorov
  • Thank you for the suggestion. It works to find the diff. However, the commit messages are sorted by the commit message subject, which makes it difficult to identify in the original commit log. Do you see how to improve your code to order the result by time? In my case it is linear history. I will update the question. – k_rus Apr 09 '21 at 10:28
  • How do I use `git log` to get the commit by providing the commit message subject, i.e., the result of the script? I guess I can get the result in the commit history order after this. – k_rus Apr 09 '21 at 10:37
  • I'll update my answer in a moment to get you the logs of the actual commits – mnestorov Apr 09 '21 at 10:38
  • @k_rus I've added an example script, but I'm having trouble running it locally. Can you make sure it works for you? – mnestorov Apr 09 '21 at 11:18
  • @k_rus I've added some options. My bash-fu is beyond critique but it might work for you. – mnestorov Apr 09 '21 at 11:29
  • I am able to run the script, but the result is difficult to interpret. I am trying to figure out how to get commits only from `master`, since the scripts prints commits for all existing branches. – k_rus Apr 09 '21 at 11:56
  • @k_rus My bad, I added `master` to the git log command so that it only checks these titles for that branch only. So now it will check only for `master` branch. – mnestorov Apr 09 '21 at 11:57
  • I already did it but it didn't help unfortunately. I still get other branches. – k_rus Apr 09 '21 at 12:00
  • Huh, that's weird. You removed the `--all` parameter right? – mnestorov Apr 09 '21 at 12:01
  • Nope. Sorry Checking – k_rus Apr 09 '21 at 12:31
  • Thank you for the tip. Now I get only commits from master. I will play further to get them sorted by time. – k_rus Apr 09 '21 at 12:32
  • No worries. I hope my answer helped. Since you have the commits through git log, they should already be sorted by time. You can play with the options like `--oneline` and other formatting parameters to make the output more suitable for you :) – mnestorov Apr 09 '21 at 12:34
  • Nope, the script goes through commits one by one and `git log` is applied to each commit separately. So the commits are printed in lexicographical order still. – k_rus Apr 09 '21 at 12:40
  • Yeah you're right. My bad. I guess what you can do is to just save the different commit hashes upon each git log and then log them all at once. Git will start logging with the most recent commit on top. But I'm just guessing here – mnestorov Apr 09 '21 at 12:43