
I'm working with an open source repository that seems to have been copied over from either a combined git repo or some other kind of VCS. There is an enormous number of commits in the repo, most of them with zero files changed:

[Screenshot: almost 40 thousand commits]

[Screenshot: most of them have zero files changed]

What would be the best way to list all the commits with no files changed, count them, and potentially remove them from the local git repo?

Edit: I'm specifically looking for a way to assess the extent of the issue before running a time-consuming, destructive command like filter-branch, as referenced in "Remove empty commits in git".

Nic Barker
  • Perhaps https://stackoverflow.com/a/28313729/3124288 is what you're looking for? – JKillian Nov 13 '17 at 23:39
  • Awesome, this is perfect for pruning the empties, thanks. – Nic Barker Nov 13 '17 at 23:42
  • @Whymarrh I think there's a small difference here - I was specifically interested in seeing the extent of the problem *before* using a destructive command like `filter-branch`, if you take a look at my answer. – Nic Barker Nov 14 '17 at 00:05
  • Although I'm also happy to add some of these as an additional answer there if it seems more appropriate. – Nic Barker Nov 14 '17 at 00:07

1 Answer


So it turns out the keyword I was missing in my Google searches was "empty" (I was searching for "remove commits with no files changes", etc.).

List commits that have no changes (empty commits):

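git rev-list HEAD | while read -r commitHash; do
    # --root diffs the initial commit against an empty tree, so it is not listed as empty;
    # merge commits also produce no output by default, so they are listed here too
    if [ -z "$(git diff-tree --root --no-commit-id --name-status "$commitHash")" ]; then
        echo "$commitHash"
    fi
done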
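The loop above starts one git diff-tree process per commit, which adds up over ~40 thousand commits. A minimal sketch of a faster check, assuming a POSIX shell with awk: it reads every commit's tree and parents in a single git log call and reports a commit as empty when it has exactly one parent and the same tree as that parent. That is the condition the filter-branch documentation gives for git_commit_non_empty_tree dropping a commit. list_empty_commits is a hypothetical helper name; merge and root commits are never reported.

```shell
# list_empty_commits: print the hash of every commit that has exactly one
# parent and the same tree as that parent. Arguments are passed to git log,
# so the default is HEAD; use --all to cover every ref.
list_empty_commits() {
    # One line per commit: "<commit> <tree> <parent> ..." (no parent for a root)
    git log --format='%H %T %P' "$@" | awk '
        {
            tree[$1] = $2          # tree of this commit
            parent[$1] = $3        # first parent (empty for a root commit)
            nparents[$1] = NF - 2  # number of parents
            order[NR] = $1         # keep git log output order
        }
        END {
            for (i = 1; i <= NR; i++) {
                c = order[i]
                # One parent and an unchanged tree means an empty commit
                if (nparents[c] == 1 && tree[c] == tree[parent[c]]) print c
            }
        }'
}
```

For example, `list_empty_commits --all | wc -l` should approximate how many commits the filter-branch command would drop across all refs, without rewriting anything.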

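List commits that have changes, along with the files changed (non-empty commits):

git rev-list HEAD | while read -r commitHash; do
    # Prints nothing for empty commits; -r lists files in subdirectories, --root includes the initial commit
    git diff-tree -r --root --name-status "$commitHash"
done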
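Piping the whole rev-list into one `git diff-tree --stdin` call produces the same kind of listing without starting a process per commit. A minimal sketch, assuming a POSIX shell; list_changed_files is a hypothetical helper name. As with the loop, empty commits print nothing, and the git diff-tree documentation notes that with --stdin merge commits show no differences by default either.

```shell
# list_changed_files: print each non-empty commit's hash followed by its
# changed paths, feeding all commits to a single git diff-tree process.
# Arguments go to git rev-list (default HEAD).
list_changed_files() {
    if [ "$#" -eq 0 ]; then set -- HEAD; fi
    # --stdin: read commit ids from standard input, one per line
    # -r: list files inside subdirectories; --root: show the initial commit too
    git rev-list "$@" | git diff-tree --stdin -r --root --name-status
}
```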

Count empty commits:

git rev-list HEAD | while read -r commitHash; do
    if [ -z "$(git diff-tree --root --no-commit-id --name-status "$commitHash")" ]; then
        echo '1'
    fi
done | wc -l

Count non-empty commits:

git rev-list HEAD | while read -r commitHash; do
    if [ -n "$(git diff-tree --root --no-commit-id --name-status "$commitHash")" ]; then
        echo '1'
    fi
done | wc -l
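Both counting loops run one git diff-tree per commit, and because git diff-tree shows nothing for merge commits by default, merges land in the empty count. A minimal sketch of a one-pass summary, assuming a POSIX shell with awk: it classifies every commit from its git log metadata (tree hash `%T`, parent hashes `%P`) instead of diffing, so merges and root commits get their own buckets. summarize_commits is a hypothetical helper name; arguments are passed to git log, so `summarize_commits --all` covers every ref, like the filter-branch command below.

```shell
# summarize_commits: classify every commit reachable from the given revisions
# (default HEAD) using only commit metadata, no diffs:
#   root      - no parents (initial commit, or boundary of a shallow clone)
#   merge     - two or more parents
#   empty     - one parent and the same tree as that parent
#   non-empty - one parent and a different tree
summarize_commits() {
    git log --format='%H %T %P' "$@" | awk '
        { tree[$1] = $2; first[$1] = $3; nparents[$1] = NF - 2; ids[NR] = $1 }
        END {
            root = merge = empty = nonempty = 0
            for (i = 1; i <= NR; i++) {
                c = ids[i]
                if (nparents[c] == 0) root++
                else if (nparents[c] > 1) merge++
                else if (tree[c] == tree[first[c]]) empty++
                else nonempty++
            }
            printf "total %d root %d merge %d empty %d non-empty %d\n", NR, root, merge, empty, nonempty
        }'
}
```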

And finally, as per @JKillian's suggestion, remove all empty commits from the repo using git filter-branch:

git filter-branch --tag-name-filter cat --commit-filter 'git_commit_non_empty_tree "$@"' -- --all
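Since filter-branch rewrites history in place, one way to preview its effect is to run the same commit filter on a throwaway clone and compare commit counts. A minimal sketch, assuming it is run from inside a working tree; dry_run_prune is a hypothetical helper name. Unlike the command above, it only rewrites the clone's checked-out branch (HEAD) rather than `--all`, so the numbers cover that branch only. `FILTER_BRANCH_SQUELCH_WARNING=1` suppresses the warning and pause that newer Git versions show before running filter-branch.

```shell
# dry_run_prune: run the empty-commit filter on a temporary clone and report
# how many commits HEAD has before and after. The original repo is untouched.
dry_run_prune() {
    src=$(git rev-parse --show-toplevel) || return 1
    tmp=$(mktemp -d) || return 1
    # Local clone: objects are hardlinked or copied, the source is not written to
    git clone --quiet "$src" "$tmp/repo" || { rm -rf "$tmp"; return 1; }
    (
        cd "$tmp/repo" || exit 1
        before=$(git rev-list --count HEAD)
        # Same commit filter as the real command, applied only to the cloned HEAD
        FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
            --commit-filter 'git_commit_non_empty_tree "$@"' -- HEAD >/dev/null 2>&1 ||
            { echo "filter-branch failed" >&2; exit 1; }
        after=$(git rev-list --count HEAD)
        echo "before: $before after: $after removed: $((before - after))"
    )
    rm -rf "$tmp"
}
```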

Documentation on filter-branch, specifically --commit-filter:

https://git-scm.com/docs/git-filter-branch#git-filter-branch---commit-filterltcommandgt

Nic Barker
  • Note that these show empty commits relative to the position of `HEAD`, i.e. if a root commit were checked out you'd get different results than if you're on `main`/`master`. – CervEd Jun 05 '21 at 00:44