I have a Mercurial repo that I am converting to Git. The commit history is quite large and I do not need all of the commit history in the new repo. Once I convert the commit history to Git (and before pushing to the new repo), I want to squash all the commits before a certain tag into one commit.

So, if I have:

commit 6
commit 5
commit 4
commit 3
commit 2
commit 1 -- First commit ever

I want to end up with:

commit 6
commit 5
commit X -- squashed 1, 2, 3, 4

Note: There are thousands of commits that I need to squash. So, manually picking/marking them one by one is not an option.

GreenSaguaro

The other answers so far suggest rebase. This can work, in some cases, depending on the commit graph in the converted-to-Git repository. The new fancier rebase with --rebase-merges can definitely do it. But it's kind of a clumsy way to go about it. The ideal way to do this is to convert commits starting at the first one you want to keep. That is, have your Mercurial exporter export to Git, as Git's first commit, the revision you want to pretend is the root. Have the Mercurial exporter go on to export that commit's descendants, one at a time into the importer, in the same way that the exporter was always going to do this job (whatever way that may be).

Whether and how you can do this depends on what tool(s) you are using to convert. (I have not actually done any of these conversions, but most people seem to use hg-fast-export and git fast-import. I have not looked much at the inner details of hg-fast-export but there's no obvious reason it couldn't do this.)


Fundamentally (internally), Mercurial stores commits as changesets. This is not the case for Git: Git stores snapshots instead. However, Mercurial checks out (i.e., extracts) snapshots by summing together changesets as required. So if your tool works by doing hg checkout (or the internal equivalent thereof), there is no issue here in the first place: you simply skip the revisions prior to the first snapshot you want, import only that snapshot and its descendants into Git, and the resulting Git history will begin at the desired point.


If the tools you have make this inconvenient, though, note that after converting the entire repository history, including all branches and merges, into Git snapshots, your Git repository makes this relatively easy as a second pass. Your Git history might, e.g., look like this:

          o-..-o            o--o   <-- br1
         /      \          /
...--o--o--....--o--*--o--o--o--o   <-- br2
      \         /             \
       o--...--o               o   <-- master

where commit * is the first commit you wanted to see in your Git repository. (Note that if there are multiple histories going back before *, you have a different issue and cannot do this kind of transformation in the first place without additional history-modification. But as long as * is on a sort of choke point, as it is in this diagram, it's easy to snip the graph here.)

To remove everything before *, simply use git replace to make an alternative commit that's very much like commit *, but has no parent:

git replace --graft <hash-of-*>

You now have a replacement for * that has no parent commit, and most of Git will use it instead of *. Then run git filter-branch over all branches and tags, with the no-op filter:

git filter-branch --tag-name-filter cat -- --all

Or, once git filter-repo is included with Git (or if you've installed it):

git filter-repo --force

(Be careful with the --force option when using filter-repo: it makes filter-repo destroy the old history in this repository, but in this case that's what we want.)

This will copy every reachable commit, including the substitute * but excluding * and its own history, to new commits, then update your branch and tag names.

If using filter-branch, remove the refs/original/ namespace (see the git filter-branch documentation for details), force early scavenging of the original objects if you like (the extra commits will eventually fall away on their own), and you're done.
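
The second-pass procedure above can be rehearsed end to end on a throwaway repository before touching the real conversion. This is a minimal sketch, not a tested migration script. It assumes a POSIX shell and a Git that still ships git filter-branch (FILTER_BRANCH_SQUELCH_WARNING=1 only silences the warning and pause that newer versions print). The history is a toy one: six commits, plus a hypothetical tag v1.0 on commit 4, which plays the role of *. The cleanup commands are the ones from the filter-branch documentation checklist (the same ones quoted in the first comment below). Deleting the replace ref with git replace -d is an extra step added here so that nothing refers to the graft afterwards.

```shell
# Throwaway demo repo mirroring the question: commits 1..6 on master, with a
# hypothetical tag v1.0 on commit 4, which plays the role of * here.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # branch name independent of init.defaultBranch
# Fixed identity so the demo works without user.name/user.email configured
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
for i in 1 2 3 4 5 6; do
    echo "line $i" >> file.txt
    git add file.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 4 ]; then git tag v1.0; fi
done
star=$(git rev-parse v1.0)                      # commit *
star_tree=$(git rev-parse "v1.0^{tree}")        # its snapshot, which must survive
old_root=$(git rev-list --max-parents=0 master) # commit 1, which should disappear

# Parentless replacement for *, made permanent by rewriting every ref
git replace --graft "$star"
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --tag-name-filter cat -- --all

# Cleanup: drop the replace ref and the refs/original/ backups, then prune
git replace -d "$star"
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

git log --oneline master   # should list commit 6, commit 5 and commit 4 (now a root)
```

Any tag or branch that points at a commit before * is not rewritten, because its history does not pass through *. Such a ref keeps the old history reachable even after the cleanup. git for-each-ref --contains <hash-of-an-old-commit> is one way to find refs like that.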

torek
  • Seems to be working as I was hoping. The `*` commit is disjoint from all prior commits. But, how do I delete the old history? I followed the checklist in the documentation for `git filter-branch` and ran these commands: `git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d`, `git reflog expire --expire=now --all`, and `git gc --prune=now`. Starting from the commit before `*`, I need all of that history to disappear. – GreenSaguaro Dec 04 '18 at 05:16
  • I was able to get around needing to delete the history by doing some branch renaming and pushing only master, plus specific tags, to the new blank repo. – GreenSaguaro Dec 04 '18 at 06:42
  • Nowadays the git documentation suggests that you _"use an alternative history filtering tool such as git filter-repo"_ ([doc](https://git-scm.com/docs/git-filter-branch#_warning)) instead of git-filter-branch. The corresponding solution ([doc](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_parent_rewriting_2)) is: `git replace --graft `, `git filter-repo --force` – Carlo Bellettini Feb 19 '21 at 19:17
  • @CarloBellettini: Unfortunately Git distributions don't come with filter-repo (not yet at least). This particular transition, from filter-branch to filter-repo, is not as smooth as one might wish... – torek Feb 19 '21 at 21:28
  • @torek, right... in fact at the moment it is just a suggestion in the git documentation... but I thought it could be appropriate to remark it here... so that you can put an updated note in your good answer – Carlo Bellettini Feb 20 '21 at 10:44

To do this precisely, the steps are:

  1. Check out the specific commit
  2. Squash it and everything before it into a single new root commit
  3. Cherry-pick the commits that came after it
  4. Delete your existing branch
  5. Recreate the branch, under the same name, at the new HEAD

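function git_squash_from() {
    local COMMIT_TO_SQUASH=$1   # last commit to fold into the new root commit
    local SQUASH_MESSAGE=$2
    local STARTING_BRANCH CURRENT_HEAD

    # Branch that will be rewritten; fails (and stops) if HEAD is detached
    STARTING_BRANCH=$(git symbolic-ref --short HEAD) || return
    CURRENT_HEAD=$(git rev-parse HEAD) || return

    echo "Commits after $COMMIT_TO_SQUASH up to $CURRENT_HEAD are kept;" \
         "$COMMIT_TO_SQUASH and everything before it are squashed into one commit"

    git checkout --quiet "$COMMIT_TO_SQUASH" || return
    # New parentless commit with the same snapshot (tree) as $COMMIT_TO_SQUASH
    git reset --quiet "$(git commit-tree "HEAD^{tree}" -m "$SQUASH_MESSAGE")" || return
    # Replay the later commits, oldest first; cherry-pick stops on merge commits
    git cherry-pick "$COMMIT_TO_SQUASH..$CURRENT_HEAD" || return
    # Only now, after everything succeeded, move the branch name
    git branch -D "$STARTING_BRANCH"
    git checkout -b "$STARTING_BRANCH"
}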

git_squash_from 87ef7fa "Squash ... "

You can extend it further to build the SQUASH_MESSAGE from all commit messages.
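
One way to build SQUASH_MESSAGE is to list the subjects of the squashed commits, oldest first, with git log --reverse. This is a sketch under stated assumptions: a POSIX shell and a throwaway six-commit repository. build_squash_message is a hypothetical helper name. Only subjects (%s) are used, to keep the message short when thousands of commits are squashed; %B would include the full bodies.

```shell
# Throwaway demo repo with commits "commit 1".."commit 6"; $c4 is the last commit to squash
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
# Fixed identity so the demo works without user.name/user.email configured
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
for i in 1 2 3 4 5 6; do
    echo "line $i" >> file.txt
    git add file.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 4 ]; then c4=$(git rev-parse HEAD); fi
done

# Hypothetical helper: a header line, a blank line, then one "* subject" line
# per commit reachable from $1, oldest first
build_squash_message() {
    printf 'Squashed history up to %s\n\n' "$(git rev-parse --short "$1")"
    git log --reverse --format='* %s' "$1"
}

SQUASH_MESSAGE=$(build_squash_message "$c4")
printf '%s\n' "$SQUASH_MESSAGE"
```

With the function above, this could be called as git_squash_from 87ef7fa "$(build_squash_message 87ef7fa)". The command substitution runs before the function checks anything out.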

pPanda_beta

Suppose the original branch is master, the new branch is new, and commit4 is the last commit you want to squash (commit 4 in the question).

git checkout --orphan new commit4
git commit -m "squash commits"
git branch tmp master
git rebase --onto new commit4 tmp
git checkout new
git merge tmp
git branch -D tmp

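To keep the merge commits, the option "-p" was needed in "git rebase". That option ("--preserve-merges") has since been removed from Git (in version 2.34); use "--rebase-merges" instead.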
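
The steps above can be rehearsed on a throwaway repository. This is a minimal sketch that assumes a POSIX shell and a toy six-commit history on master. commit4 is created as a hypothetical tag so that the answer's placeholder resolves. The rebase is written in the conventional git rebase --onto <newbase> <upstream> <branch> order.

```shell
# Throwaway demo repo: commits 1..6 on master; "commit4" is a hypothetical tag
# standing in for the answer's commit4 placeholder.
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # branch name independent of init.defaultBranch
# Fixed identity so the demo works without user.name/user.email configured
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
for i in 1 2 3 4 5 6; do
    echo "line $i" >> file.txt
    git add file.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 4 ]; then git tag commit4; fi
done

# The answer's steps
git checkout -q --orphan new commit4   # unborn branch "new"; index holds commit4's snapshot
git commit -q -m "squash commits"      # parentless commit X with that snapshot
git branch tmp master
git rebase -q --onto new commit4 tmp   # replay commit4..master (5 and 6) onto X
git checkout -q new
git merge -q tmp                       # fast-forward new to the rebased commits
git branch -D tmp
git log --oneline new
```

The original master is left untouched, so the two branches can be compared before master is replaced. Their final snapshots should be identical.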

ElpieKay
  • Note: the old `--preserve-merges` will soon be replaced by `--rebase-merges`: https://stackoverflow.com/a/50555740/6309 – VonC Dec 03 '18 at 07:24

While git reset --soft could be an option for squashing one set of commits (as in here), for multiple sets of commits I would recommend:

  • having one original Git repo
  • creating a patch between each pair of consecutive tags (if you can go from one tag to the next),
  • applying each patch, one after the other, to a new Git repo, where each patch is stored as one squashed commit.
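
The patch-per-tag idea can be sketched with git diff and git apply. This assumes a POSIX shell and throwaway repositories. The tags v1, v2 and v3 are hypothetical. The first patch is taken against Git's empty tree (git hash-object -t tree /dev/null), so it contains everything up to the first tag.

```shell
# Throwaway "original" repo: commits 1..6, hypothetical tags v1, v2, v3 on commits 2, 4, 6
orig=$(mktemp -d) && new=$(mktemp -d) || exit 1
# Fixed identity so the demo works without user.name/user.email configured
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q "$orig"
git init -q "$new"
for i in 1 2 3 4 5 6; do
    echo "line $i" >> "$orig/file.txt"
    git -C "$orig" add file.txt
    git -C "$orig" commit -q -m "commit $i"
    case $i in
        2) git -C "$orig" tag v1 ;;
        4) git -C "$orig" tag v2 ;;
        6) git -C "$orig" tag v3 ;;
    esac
done

# Start from the empty tree so the first patch covers everything up to v1
prev=$(git -C "$orig" hash-object -t tree /dev/null)
for tag in v1 v2 v3; do
    # One patch per tag range, applied to the index and committed as a single commit
    git -C "$orig" diff --binary --no-color "$prev" "$tag" | git -C "$new" apply --index
    git -C "$new" commit -q -m "Squashed history up to $tag"
    prev=$tag
done
git -C "$new" log --oneline
```

--binary keeps binary files in the patches. The new commits are authored and dated by whoever runs the loop, because each range is reduced to a single patch.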

Note that this can include the very first commit as well, through the git rebase --root option.
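
For a single range that starts at the first commit, the git reset --soft route can be sketched as follows. This uses git commit --amend on the root commit rather than git rebase --root, because a soft reset cannot move to a point before the first commit. The tag v1.0 (on commit 4) and the branch name squashed are hypothetical, and the repository is a throwaway one.

```shell
# Throwaway demo repo: commits 1..6 on master; v1.0 is a hypothetical tag on commit 4
demo=$(mktemp -d) && cd "$demo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # branch name independent of init.defaultBranch
# Fixed identity so the demo works without user.name/user.email configured
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
for i in 1 2 3 4 5 6; do
    echo "line $i" >> file.txt
    git add file.txt
    git commit -q -m "commit $i"
    if [ "$i" -eq 4 ]; then git tag v1.0; fi
done

git checkout -q -b squashed v1.0                             # start at the last commit to squash
git reset -q --soft "$(git rev-list --max-parents=0 HEAD)"   # HEAD -> commit 1; index keeps commit 4's snapshot
git commit -q --amend -m "commit X -- squashed 1, 2, 3, 4"   # rewrite the root: still parentless, commit 4's tree
git rebase -q --onto squashed v1.0 master                    # replay commits 5 and 6 on top of X
git log --oneline master
```

However many commits precede the tag, this is one reset and one commit. The rebase then replays only the commits after the tag.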

VonC