
Question

Given a commit history like the following:

A - B - C - D - G - H
     \         /
      - E - F -

How do I get a perfectly linear history following only the first parents?

For example, I want to get:

A - B - C - D - G'- H'

The SHA-1s of G and H are expected to change, as noted above (for obvious reasons). If the SHA-1s of A, B, C, and D change as well, I want to know why.

Background

The intention is to avoid potential merge commits that would arise when doing a naive git rebase -i.

The real goal is to gain insight into which commits are actually in a particular branch. For example, given the following graph:

I - J - K - L - M
 \     /   /   / 
  - N - O - P -

where N was merged into K, O was merged into L with --strategy=ours, and P was merged into M.

I want to be able to linearize my history so that such problematic commits can be identified and inspected. Using git cherry, I want to end up with a tree in which I can identify that O was never put into the upstream branch, even if N and P are also flagged as potentially missing from it. Any suggestions here would be appreciated.

Arafangion

2 Answers


As you said, the following command works:

git filter-branch --parent-filter 'cut -f 2,3 -d " "'

Why?

The problem you pose is solved by turning each merge commit into an ordinary single-parent commit: the feature branches that were merged in simply drop out of the rewritten branch's history, since no rewritten commit points to them any more.

Each commit has zero or more parent commits: the root commit has none, ordinary commits have one, and merge commits have more than one. Git records the parents in each commit object of the history.

The git filter-branch command's --parent-filter option lets you rewrite every commit's parents. The filter receives the parent list on stdin as -p SHA1, repeated once per parent, and whatever it prints becomes the new parent list. Our filter keeps only the first -p SHA1, so every commit ends up with at most one parent: its first parent.
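To see exactly what cut does to that string, here is a small sketch that runs the same filter on hand-written parent strings. The placeholder hashes (1111111 and so on) and the keep_first_parent function name are made up for illustration. As Arafangion explains in the comments, filter-branch hands the string over with a leading space, so with a space delimiter field 1 is empty, field 2 is -p, and field 3 is the first parent's SHA-1.

```shell
#!/bin/sh
# keep_first_parent is a hypothetical wrapper around the exact filter used
# in this answer: it reads a parent string on stdin and prints the result.
keep_first_parent() {
    # Fields split on " ": "" | "-p" | <first parent> | "-p" | <second> ...
    cut -f 2,3 -d " "
}

# Merge commit with two parents: only the first "-p <sha>" survives
echo " -p 1111111 -p 2222222" | keep_first_parent   # -p 1111111
# Ordinary commit: same parent, minus the leading space
echo " -p 3333333" | keep_first_parent              # -p 3333333
# Root commit: the empty parent string stays empty
echo "" | keep_first_parent
```

Since A, B, C and D keep the same parent and the same content, filter-branch should normally re-create them byte-for-byte with their original SHA-1s; only G and the commits after it get new ones.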

As a bonus, here's how to do it manually for a single commit, by creating a replacement commit:

  • get the commit tree and the first parent

    tree=`git show -s --format=%T SHA1`
    parent=`git show -s --format=%P SHA1 | cut -d " " -f1`
    
  • make a new commit with the same tree and the same message, keeping only the first parent as its ancestor (git commit-tree prints the new commit's SHA-1)

    git show -s --format=%B SHA1 | git commit-tree $tree -p $parent
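The two steps above can be wrapped into one shell function. This is only a sketch: linearize_commit is a made-up name, and unlike filter-branch it does not copy the original author, committer or dates, because git commit-tree takes those from your current configuration and environment. It also only creates the replacement commit; commits that came after the original one, like H, still have to be re-created on top of it.

```shell
#!/bin/sh
# Hypothetical helper (not a git command): re-create one commit with the
# same tree and message but only its first parent.
# Usage: linearize_commit <commit>   -> prints the SHA-1 of the new commit
linearize_commit() {
    # Same tree (snapshot) as the original commit
    tree=$(git show -s --format=%T "$1") || return 1
    # First parent only; empty for a root commit
    parent=$(git show -s --format=%P "$1" | cut -d " " -f1)
    # Same message, which commit-tree reads from stdin. Author and committer
    # come from the current config/environment, not from the original commit.
    if [ -n "$parent" ]; then
        git show -s --format=%B "$1" | git commit-tree "$tree" -p "$parent"
    else
        git show -s --format=%B "$1" | git commit-tree "$tree"
    fi
}
```

For example, new_g=$(linearize_commit G-sha1) stores the SHA-1 of the single-parent copy of G.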
    
CharlesB
  • That's inspired me to try: `git filter-branch --parent-filter 'cut -f 2,3 -d " "'` It seems to work, but I'm not yet sure of the edge cases or the performance. I do have many hundreds of rather large commits, though. I will wait for your answer, you might have a nicer/better solution. – Arafangion Aug 01 '13 at 12:53
  • I don't know, but it seems much simpler. I wanted to use `--commit-filter` with this kind of script, but I also saw the `--parent-filter` option. What does the `cut -f 2,3` in your command do? – CharlesB Aug 01 '13 at 13:24
  • The parent-filter specifies a program that expects input on stdin like ' -p sha1 -p sha1 -p sha1'; each -p specifies a parent. (That string was a merge commit with three parents.) The '-d " "' there specifies that we interpret it as a list delimited by spaces, and -f 2,3 says "take the second and third items". This yields the first '-p sha1', and we're done. – Arafangion Aug 01 '13 at 23:39
  • That '-- --all' doesn't make sense to me, though. I only want to do it on my current branch. – Arafangion Aug 01 '13 at 23:43

Do it manually with the following:

git rebase -i D-sha1

Edit the file to have G set to "edit". Keep "pick" for H. Save and close editor.

Git now stops at commit G so you can edit it. Turn it into a squash merge of commit F:

# Undo the merge commit and move HEAD back to D (G's first parent)
git reset --hard HEAD~
# Squash-merge F, commit the result as an ordinary single-parent commit, and continue
git merge --squash F-sha1
git commit
git rebase --continue

Note that if the original merge had conflicts, you'll have to resolve them again.

As you say, only G and H will get new SHA-1s.
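To find the commits that need this treatment in a larger history, torek's suggestion in the comments of using git rev-list --merges can be restricted to the first-parent chain. The sketch below (list_first_parent_merges is a hypothetical helper name, not a git command) prints those merge commits oldest first; the first parent of the oldest one is a natural base for git rebase -i, and every commit after it will get a new SHA-1.

```shell
#!/bin/sh
# Hypothetical helper: list merge commits on the first-parent chain of a
# ref (HEAD by default), oldest first.
list_first_parent_merges() {
    # --first-parent: follow only first parents, so commits on merged-in
    #                 side branches (E and F in the question) are not visited
    # --merges:       keep only commits with more than one parent
    # --reverse:      print oldest first
    git rev-list --first-parent --merges --reverse "${1:-HEAD}"
}
```

On the question's example graph this should print only G's SHA-1.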

CharlesB
  • How do I identify 'G' and 'H'? Let's say I have a large repository, not just the trivial case I highlighted, and there are many of these merge commits. – Arafangion Aug 01 '13 at 08:07
  • If the history is too complicated, with many different merge commits to linearize, you might want to stay away from this approach. Git rebase won't do it correctly. – CharlesB Aug 01 '13 at 08:12
  • I am fully aware that git rebase won't do it. I am expecting that the answer will involve git filter-branch with the --parent-filter set. – Arafangion Aug 01 '13 at 08:14
  • Even with `filter-branch` you'll still have to identify the "starting point" for the filtering or rebasing. I'd suggest writing a script that uses `git rev-list --merges` to find merge commits. I've never done this so am not quite sure how to proceed from that point, though. – torek Aug 01 '13 at 08:36
  • Same for me, might be doable but it would make a nice article/blog post if someone does this! – CharlesB Aug 01 '13 at 08:42
  • Yep, actually I want to use the root commit for this, i.e., sha1 'A' here, or in other words, `git rev-list --max-parents=0 HEAD | head -n 1` – Arafangion Aug 01 '13 at 12:25
  • A `git commit` step is missing. Other than that, this saved my day. – astrojuanlu Oct 06 '14 at 07:28