37

I just migrated a project from Mercurial to Git. Mercurial adds empty commits when you add tags, so I ended up with empty commits in Git that I would like to remove.

How do I remove empty commits (commits that do not have any files in them) from Git?

Thanks.

Tim Sylvester
Greeso

2 Answers

50

One simple (but slow) way to do this is with git filter-branch and --prune-empty. With no other filters, the contents of the remaining commits are left alone, but any empty ones are discarded. Every commit after a discarded one gets a new parent ID, and therefore a new commit ID, so this still counts as "rewriting history". That is not a big deal if this is your initial import from hg to git, but it is a big deal if others are already using this repository.

Note all the usual caveats with filter-branch. (Also, as a side note, an "empty commit" is really one that has the same tree as the previous commit: it's not that it has no files at all, it's that it has all the same files, with the same modes, and the same contents, as its parent commit. This is because git stores complete snapshots for each commit, not differences from one commit to the next.)
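
To make the "same tree as its parent" definition concrete, here is a minimal sketch that lists such commits before you rewrite anything. It builds a throwaway repository so it can run on its own; the file name, commit messages, and the helper name list_empty_commits are made up for illustration and are not part of Git.

```shell
#!/bin/sh
# Sketch: list non-merge commits whose tree is identical to their parent's tree.
set -e

# Throwaway repository with one deliberately empty commit (hypothetical content).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo hello > file.txt
git add file.txt
git commit -q -m "add file"
git commit -q --allow-empty -m "empty commit, like an hg tag commit"
echo more >> file.txt
git commit -q -a -m "change file"

# list_empty_commits is a hypothetical helper name, not a Git command.
list_empty_commits() {
    # --no-merges limits the walk to commits with at most one parent.
    git rev-list --no-merges --all | while read -r commit; do
        # Skip root commits: they have no parent to compare against.
        git rev-parse -q --verify "$commit~1" >/dev/null || continue
        # Same tree ID as the parent means nothing changed in this commit.
        if [ "$(git rev-parse "$commit^{tree}")" = "$(git rev-parse "$commit~1^{tree}")" ]; then
            git log -1 --format='%h %s' "$commit"
        fi
    done
}

empty=$(list_empty_commits)
echo "$empty"
```

In a real repository you would only need the function: run it from inside the clone to see which commits --prune-empty is likely to drop.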


Here is a tiny example that hides a lot of places you can do fancier things:

$ ... create repository ...
$ cd some-tmp-dir; git clone --mirror file://path-to-original

(This first clone step is for safety: filter-branch can be quite destructive, so it's always good to run it on a new clone rather than the original. Using --mirror means all its branches and tags are mirrored in this copy, too. Run the following commands from inside the new clone.)

$ git filter-branch --prune-empty --tag-name-filter cat -- --all

(Now wait a potentially very long time; see documentation on methods to speed this up a bit.)

$ git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

(This step is copied out of the documentation: it discards all the original branches from before running the filter. In other words, it makes the filtering permanent. Since this is a clone, that's reasonably safe to do even if the filter went wrong.)

$ cd third-tmp-dir; git clone --mirror file://path-to-filtered-tmp

(This makes a clean, nicely-compressed copy of the filtered copy, with no leftover objects from the filtering steps.)

$ git log     # and other commands to inspect

Once you're sure the clone of the filtered clone is good, you don't need the filtered clone, and can use the last clone in place of the first. (That is, you can replace the "true original" with the final clone.)
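
For anyone who wants to try the whole sequence on something disposable first, the steps above can be strung together roughly as follows. This is only a sketch: the directory names (original, filtered.git, final.git), the file contents, and the tag name are invented for the demo, and FILTER_BRANCH_SQUELCH_WARNING is set only to skip the warning pause that recent Git versions add to filter-branch.

```shell
#!/bin/sh
# Sketch of the clone / filter / clean up / re-clone sequence on a demo repository.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1  # recent Git only; older versions ignore it

work=$(mktemp -d)

# 1. Stand-in for the imported repository, with one empty commit (hypothetical content).
git init -q "$work/original"
cd "$work/original"
git config user.email "you@example.com"
git config user.name "Example"
echo hello > file.txt
git add file.txt
git commit -q -m "add file"
git commit -q --allow-empty -m "Added tag v1.0"
git tag v1.0
echo more >> file.txt
git commit -q -a -m "change file"
before=$(git rev-list --count --all)

# 2. Safety copy: a mirror clone carries all branches and tags.
git clone -q --mirror "file://$work/original" "$work/filtered.git"
cd "$work/filtered.git"

# 3. Drop empty commits on every ref, keeping tag names unchanged.
git filter-branch --prune-empty --tag-name-filter cat -- --all >/dev/null

# 4. Discard the backup refs that filter-branch leaves under refs/original/.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

# 5. Clean copy of the filtered copy, without leftover objects.
git clone -q --mirror "file://$work/filtered.git" "$work/final.git"
cd "$work/final.git"
after=$(git rev-list --count --all)

echo "commits before: $before, after: $after"
git log --oneline --all
```

The commit count should drop by the number of empty commits, and the v1.0 tag should now sit on the nearest surviving ancestor.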

torek
  • Can you show me an example on how to use it? I apologize about the request but I am new to git – Greeso Feb 04 '15 at 04:52
  • Note this won't remove empty merge commits, check https://stackoverflow.com/questions/9803294/ – Jakub Bochenski Jun 04 '18 at 10:05
  • @JakubBochenski: that's true. In a sense, there's no such thing as an empty merge: an empty commit can be removed from the DAG without affecting anything but removing a merge changes the DAG, so even if the snapshot in the merge is the same as the snapshots in *both* inputs, it's not OK to remove the merge itself. – torek Jun 04 '18 at 14:45
  • @torek I'm not sure I follow: removing an empty commit will surely affect all the commits that used to have it as the parent – Jakub Bochenski Jun 04 '18 at 15:31
  • @torek on the other hand my use case was: prune commits to only include changes in certain path, then remove regular empty commits, then remove PR merge commits that no longer had any effective changes; I don't see why it wouldn't be OK to do that last step – Jakub Bochenski Jun 04 '18 at 15:33
  • You're right that snipping a single-parent commit out of a chain requires reparenting all the subsequent commits—but `git filter-branch` is already doing that anyway. The problem with snipping out a *merge* is that the code would need nonlocal graph information to determine whether the other parent(s) are reachable by some other name(s). (It's certainly possible to write a program that does this, it's just not the job filter-branch is already doing.) – torek Jun 04 '18 at 16:34
  • But, if you want to remove a merge, there's a manual process using `git replace` (make a replacement commit object) followed by `git filter-branch` (rewrite the graph using the replacement, effectively cementing the replacement forever). You could replace each merge and then run one big filter-branch. It's not a lot of fun to do, but for a one-time rewrite, it would not be that bad. – torek Jun 04 '18 at 16:42
5

Using --commit-filter should be faster than using --prune-empty:

$ git filter-branch --tag-name-filter cat --commit-filter 'git_commit_non_empty_tree "$@"' -- --all

Then clean up the backup refs, as in torek's answer:

$ git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

Taken from here: Git - remove commits with empty changeset using filter-branch

Community
bengineerd
  • thanks! the `for-each-ref | xargs` failed for me though. Git-update-ref showed its usage message. I could fix it by using a different `xargs` syntax: `git for-each-ref --format="%(refname)" refs/original/ | xargs -I{} git update-ref -d {}` -- this was with xargs --version "xargs (GNU findutils) 4.5.11" – Jules Kerssemakers Aug 22 '18 at 14:18
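
If xargs behaves differently on your system, one possible workaround is to avoid it and delete the backup refs in a plain shell loop. The sketch below runs against a throwaway repository; the ref name refs/original/refs/heads/example is created by hand only to simulate what filter-branch leaves behind.

```shell
#!/bin/sh
# Sketch: delete everything under refs/original/ without relying on xargs.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
git commit -q --allow-empty -m "initial"

# Simulate a backup ref like the ones filter-branch creates (hypothetical name).
git update-ref refs/original/refs/heads/example HEAD

# One update-ref call per ref name, read line by line.
git for-each-ref --format="%(refname)" refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done

remaining=$(git for-each-ref --format="%(refname)" refs/original/)
echo "backup refs left: ${remaining:-none}"
```

In a filtered clone only the for-each-ref loop is needed; the rest just sets up something to delete.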