There is no correct answer to this question because it is underspecified.
Git history is simply a directed acyclic graph (DAG), and it's generally impossible to determine semantic relationships between two arbitrary nodes in a DAG unless the nodes are sufficiently labeled. Unless you can guarantee that the commit messages in your example graph follow a reliable, machine-parseable pattern, the commits are not sufficiently labeled—it's impossible to automatically identify the commits you are interested in without additional context (e.g., guarantees that your developers follow certain best practices).
Here's an example of what I mean. You say that commit a1 is associated with branch1, but this can't be determined with certainty just by looking at the nodes of your example graph. It's possible that once upon a time your example repository history looked like this:
* merge branch1 into branch2 - branch2's head
|\
_|/
/ * b1
| |
| |
_|_/
/ |
| * a1
* / m1
|/
|
* start - master's head
Note that branch1 doesn't even exist yet in the above graph. The above graph could have arisen from the following sequence of events:
- branch2 is created at start in the shared repository
- user#1 creates a1 on his/her local branch2 branch
- meanwhile, user#2 creates m1 and b1 on his/her local branch2 branch
- user#1 pushes his/her local branch2 branch to the shared repository, causing the branch2 ref in the shared repository to point to a1
- user#2 tries to push his/her local branch2 branch to the shared repository, but this fails with a non-fast-forward error (branch2 currently points to a1 and can't be fast-forwarded to b1)
- user#2 runs git pull, merging a1 into b1
- user#2 runs git commit --amend -m "merge branch1 into branch2" for some inexplicable reason
- user#2 pushes, and the shared repository history ends up looking like the above DAG
Some time later, user#1 creates branch1 off of a1 and creates a2, while user#2 fast-forward merges m1 into master, resulting in the following commit history:
* merge a1 into b1 - branch2's head
* |\ a2 - branch1's head
| _|/
|/ * b1
| |
| |
_|_/
/ |
| * a1
* / m1 - master's head
|/
|
* start
Given that this sequence of events is technically possible (although unlikely), how can a human, let alone Git, tell you which commits "belong" to which branch?
Parsing Merge Commit Messages
If you can guarantee that users don't change merge commit messages (they always accept the Git default), and that Git has never changed and will never change the default merge commit message format, then the merge commit's message can be used as a clue that a1 started off on branch1. You'll have to write a script to parse the commit messages; there are no simple Git one-liners to do this for you.
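As a minimal sketch of such a script, assuming every merge subject still has Git's default form (Merge branch 'NAME' ..., or Merge remote-tracking branch 'NAME' ...), the helper below pulls the source branch name out of each subject line. The function name merged_branch_from_subject is hypothetical, made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git): read commit subject lines on stdin
# and print the branch name quoted in a default-format merge subject.
# Subjects that do not match the default format are skipped, so a merge
# whose message was edited by hand produces no output at all.
merged_branch_from_subject() {
    sed -n "s/^Merge \(remote-tracking \)\{0,1\}branch '\([^']*\)'.*/\2/p"
}
```

It could be fed with git log --merges --format=%s branch2 | merged_branch_from_subject. Keep in mind that a merge created by git pull names the pulled branch (for example, Merge branch 'branch2' of some-url), so whatever name comes out is only a clue, not proof of where a commit started.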
If Merges are Always Intentional
Alternatively, if your developers follow best practices (each merge is intentional and is meant to bring in a differently named branch, resulting in a repository without those stupid merge commits created by git pull), and you are not interested in the commits from a completed child branch, then the commits you're interested in are on the first-parent path. If you know which branch is the parent of the branch you are analyzing, you can do the following:
git rev-list --first-parent --no-merges parent-branch-ref..branch-ref
This command lists the SHA-1 identifiers of the commits that are reachable from branch-ref, excluding the commits reachable from parent-branch-ref, the commits that were merged in from child branches, and the merge commits themselves.
In your example graph above, assuming parent order is determined by your annotations and not by the order of the lines going into a merge commit, git rev-list --first-parent --no-merges master..branch1 would print the SHA-1 identifiers for commits a4, a3, a2, and a1 (in that order; use --reverse if you want the opposite order), and git rev-list --first-parent --no-merges master..branch2 would print the SHA-1 identifiers for commits b4, b3, b2, and b1 (again, in that order).
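For readable output, the same first-parent walk can be wrapped in a small function. This is only a sketch: the name branch_own_commits is hypothetical, and it assumes you already know which ref is the parent of the branch being inspected.

```shell
#!/bin/sh
# Hypothetical helper (not a Git built-in).
# usage: branch_own_commits PARENT BRANCH
# Prints "<abbreviated-hash> <subject>" for each non-merge commit on the
# first-parent path of BRANCH that is not reachable from PARENT, oldest first.
branch_own_commits() {
    parent=$1
    branch=$2
    git log --first-parent --no-merges --reverse \
        --format='%h %s' "$parent..$branch"
}
```

For example, branch_own_commits master branch1 walks only first parents from branch1's head, so commits that arrived through a --no-ff merge of a child branch are left out. If parent order does not reflect intent (as with merges created by git pull), the output will be just as misleading as the history.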
If Branches Have Clear Parent/Child Relationships
If your developers do not follow best practices and your branches are littered with those stupid merges created by git pull (or an equivalent operation), but you have clear parent/child branch relationships, then writing a script to perform the following algorithm may work for you:
1. Find all commits reachable from the branch of interest, excluding all commits from its parent branch, its parent's parent branch, its parent's parent's parent branch, etc., and save the results. For example:
git rev-list master..branch1 >commit-list
2. Do the same for all child, grandchild, etc. branches of the branch of interest. For example, assuming branch2 is considered to be a child of branch1:
git rev-list ^master ^branch1 branch2 >commits-to-filter-out
3. Filter out the results of step #2 from the results of step #1. For example:
grep -Fv -f commits-to-filter-out commit-list
The trouble with this approach is that once a child branch is merged into its parent, those commits are considered to be part of the parent even if development on the child branch continues. Although this makes sense semantically, it does not produce the result you say you want.
Some Best Practices
Here are some best practices to make this particular problem easier to solve in the future. Most if not all of these can be enforced via clever use of hooks in the shared repository.
- Only one task per branch. Multiple tasks are prohibited.
- NEVER permit development to continue on a child branch once it has been merged to its parent. Merging implies that a task is done, end of story. Answers to anticipated questions:
  - Q: What if I discover a bug in the child branch? A: Start a new branch off of the parent. Do NOT continue development on the child branch.
  - Q: What if the new feature isn't done yet? A: Then why did you merge the branch? Perhaps you merged a complete subtask; if so, the remaining subtasks should go on their own branches off of the parent branch. Do NOT continue development on the child branch.
- Forbid the use of git pull.
- A child branch must not be merged into its parent unless all of its own child branches have been merged into it.
- If the branch does not have any child branches, consider rebasing it onto the parent branch before merging with --no-ff. If it does have child branches, you can still rebase, but please preserve the --no-ff merges of the child branches (this is trickier than it should be).
- Merge the parent branch into the child branch frequently to make merge conflicts easier to resolve.
- Avoid merging a grandparent branch directly into its grandchild branch—merge into the child first, then merge the child into the grandchild.
If all of your developers follow these rules, then a simple:
git rev-list --first-parent --no-merges parent-branch..child-branch
is all you need to see the commits that were made on that branch, minus the commits made on its child branches.