Removing commits that have a size (in disk space) larger than a given value

Question

After reading through:

How to remove a too large file in a commit when my branch is ahead of master by 5 commits

https://help.github.com/en/articles/working-with-large-files

https://rtyley.github.io/bfg-repo-cleaner/

https://help.github.com/en/articles/removing-sensitive-data-from-a-repository

Git - get all commits and blobs they created

I couldn't find an elegant way to remove commits that exceed a given size (on disk). These commits do not necessarily contain large files, but are large in and of themselves (they contain many ~200 KB dependencies).

How can such commits be removed from the repository?

[The answer](https://stackoverflow.com/a/40698537/7976758) that you linked to starts with: "*The "size" of a commit can mean different things. If you mean how much disk storage it takes up... that's very tricky to tell in Git and probably unproductive.*" In short, there is no "an elegant solution of *calculating* commit size". — phd, May 24 '19 at 10:34

score 1 · Accepted Answer · answered Jun 04 '19 at 13:19

First, a note:

Git compresses files when it stores them in its .git/ structure, and tries to store similar files as deltas against one another.

Because of that, it is difficult to tell which commit uses up the most space in your .git/ folder.
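To get a rough feel for where the space in .git/ actually goes, one option is to ask git for the on-disk size of each stored object. The sketch below is an illustration rather than a recommended tool: the function name `largest_blobs_on_disk` is made up, and it assumes a git version that supports the `%(objectsize:disk)` format atom of `git cat-file --batch-check`. Since packed objects may be stored as deltas, the reported size depends on which object git chose as the delta base, so the numbers are indicative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption, not a git command):
# list the N largest blobs by on-disk (compressed, possibly deltified) size.
# Output columns: type, on-disk bytes, object id, path.
largest_blobs_on_disk() {
    local count="${1:-10}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize:disk) %(objectname) %(rest)' |
        awk '$1 == "blob"' |
        sort -k2,2nr |
        head -n "$count"
}
```

Run from inside the repository, `largest_blobs_on_disk 20` would show the twenty blobs that occupy the most space in the object store.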

If you want to measure how much space the files in a commit take up when checked out:

git ls-tree -r -l <commitid>

will list the files along with their individual sizes in bytes, and

git ls-tree -r -l <commitid> | awk '{ sum += $4 } END { print sum }'

will print the total size of these files.

You can put the above command in a script to find which commits take up more than a given number of bytes. The next question is: can you get rid of those commits?
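Such a script could look like the sketch below. The function name `list_big_commits` and its output format are assumptions made for illustration. Note that it sums the whole tree of each commit, as the command above does, so every descendant of a big commit is reported too until the large files are removed from the tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): print "<total-bytes> <commit-id>"
# for every commit reachable from any ref whose checked-out files
# add up to more than the given number of bytes.
# Usage: list_big_commits <threshold-in-bytes>
list_big_commits() {
    local threshold="$1" commit size
    git rev-list --all | while read -r commit; do
        # Column 4 of `git ls-tree -r -l` is the blob size in bytes.
        size=$(git ls-tree -r -l "$commit" |
            awk '{ sum += $4 } END { printf "%.0f\n", sum }')
        if [ "$size" -gt "$threshold" ]; then
            echo "$size $commit"
        fi
    done
}
```

For example, `list_big_commits 50000000` would list the commits whose checked-out tree exceeds roughly 50 MB.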

You may tell git to delete the end of a branch.

Suppose all 'B's mark 'big commits':

               +-- create a new branch here
               v
*--*--*--*--*--*--B--B--B--B <- branchA
    \              \
     \              \-B--B <- branchB
      \
       *--*--*--* <- branchC
              \
               \--B <- branchD

In the above diagram, you can tell git to forget branchA, branchB and branchD (and possibly create a new reference to keep the first "not so big" commits),

but when big commits appear in the middle of a branch:

*--*--B--B--*--* <- branchE

what "delete the two Bs" means depends heavily on what is stored in your git repo and on how you can remove these commits from the branch's history.

The general advice is: do not delete commits.
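If you decide to rewrite history anyway, the two situations in the diagrams could be handled roughly as in the sketch below. The function names `truncate_branch` and `drop_commit_range` are hypothetical, and both assume a clean working tree and that the branch being deleted is not currently checked out. Dropping commits from the middle of a branch relies on `git rebase --onto`, which replays the later commits and may stop on conflicts if they touch files introduced by the dropped commits. All rewritten commits get new IDs.

```shell
#!/usr/bin/env bash
# Case 1: the big commits sit at the tip of a branch.
# Keep everything up to <last_good> under a new name, then forget the old branch.
# Usage: truncate_branch <branch> <last-good-commit>
truncate_branch() {
    local branch="$1" last_good="$2"
    git branch "${branch}-trimmed" "$last_good" &&
        git branch -D "$branch"
}

# Case 2: the big commits sit in the middle of a branch.
# Replay the commits after <last_big> on top of <last_good>,
# which drops everything in the range (last_good, last_big].
# Usage: drop_commit_range <branch> <last-good-commit> <last-big-commit>
drop_commit_range() {
    local branch="$1" last_good="$2" last_big="$3"
    git rebase --onto "$last_good" "$last_big" "$branch"
}
```

The dropped commits remain in the object store until nothing references them any more (including the reflog) and git garbage-collects them, for example with `git reflog expire --expire=now --all` followed by `git gc --prune=now`.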

Thanks. In my case the "B" commits are interleaved with normal-sized commits and are not located at the end of the branch (I added a filter in .gitignore for dependencies, with a few exceptions). I agree that commits generally shouldn't be deleted, but in this case they pollute the repo (and are not critical to the project). — Sebi, Jun 04 '19 at 15:20
@Sebi: ok. Do you see how to edit the history of your repo? — LeGEC, Jun 05 '19 at 07:39
Yes. I can remove the commits individually (I'll write a bash script for this). — Sebi, Jun 05 '19 at 21:32
