6

I have a bug in my editor (it's horrible) where on occasion big files get written to the working directory. I then do a git push without manually checking for these big new files and the git remote gets overloaded and eventually errors out.

Is there some check (maybe a git hook) that I can use to check if my repo is above a certain size in MBs?

Karol Dowbecki
  • Are the big files always saved in a certain folder? Do the file names have a consistent pattern such as an irregular extension? – Code-Apprentice Jul 12 '18 at 23:37
  • `git push` never does anything with uncommitted files. You must be committing them locally somehow. Are you blindly using `git add .` or similar? That's a bad anti-pattern. – ChrisGPT was on strike Jul 13 '18 at 00:08
  • yes I am committing them and push them by committing all files that aren't .gitignore'd –  Jul 13 '18 at 01:36
  • 1
    @OlegzandrDenman Have you got an answer yet? This is relevant for Bitbucket repos which have a maximum size of 1GB. – vineeshvs May 01 '19 at 12:45

4 Answers

5

Git does not use the work-tree in any way when you run git push. Specifically, what git push pushes are commits, along with whatever objects are required to make those commits complete (mostly files whose content was frozen into the commit at commit time). [1]

Note that git commit itself also does not use the work-tree: it commits whatever is in the index (also called the staging-area and sometimes the cache). This is why you must git add your files before committing. There are a few options to git commit that make it automatically copy work-tree files over top of the versions of those files in the index / staging-area; but the principle remains: git commit commits what's in the index, not what's in the work-tree.
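
To make the index-versus-work-tree point concrete, here is a small sketch that runs in a throwaway repository. The file name notes.txt and the demo identity are made up for illustration; nothing here touches your real repository.

```shell
#!/bin/sh
# Demo in a scratch repository: git commit records the index, not the work-tree.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
echo "staged version" > notes.txt
git add notes.txt                          # this content is now in the index
echo "work-tree only edit" >> notes.txt    # changed after git add, never staged
# Identity is passed inline so the demo does not depend on global config.
git -c user.name=demo -c user.email=demo@example.com commit -q -m "demo"
committed=$(git show HEAD:notes.txt)       # what the commit actually contains
echo "committed content: $committed"
```

The committed copy should contain only the first line, while the file on disk still has both, which is why a size check belongs on what is staged rather than on what is in the directory.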

Your best bet at a Git hook for detecting this issue is therefore a pre-commit hook, as described in the githooks documentation:

pre-commit

    This hook is invoked by git commit(1), and can be bypassed with the --no-verify option. It takes no parameters, and is invoked before obtaining the proposed commit log message and making a commit. Exiting with a non-zero status from this script causes the git commit command to abort before creating a commit.

(There is a bit more to the documentation; follow the links to see.)

Writing Git hooks is a bit tricky (especially server side hooks) but this one is not too bad:

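#!/bin/sh
# pre-commit hook: reject the commit if any staged file exceeds MAX_FILE_SIZE
TMP=$(mktemp) || exit 1
trap 'rm -f "$TMP"' 0 1 2 3 15
MAX_FILE_SIZE=1048576 # 1 MiB
status=0
# Use a temp file rather than a pipe so the loop runs in this shell
# and the value of $status survives it.
git ls-files --stage > "$TMP"
while read -r mode hash stage path; do
    # Skip submodule entries: their hash names a commit in another repository.
    [ "$mode" = 160000 ] && continue
    objsize=$(git cat-file -s "$hash") || continue
    if [ "$objsize" -gt "$MAX_FILE_SIZE" ]; then
        echo "file too big: '$path' as staged exceeds $MAX_FILE_SIZE bytes" 1>&2
        status=1
    fi
done < "$TMP"
exit $status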

(This is untested.) You could instead opt for a pre-push hook, but by then the oversized file is already inside a local commit, which is later than appropriate.
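
If you do want the later check as a safety net, a pre-push hook might look roughly like the sketch below. It is also untested against a real remote. The function names list_big_blobs and check_push are my own, the limit is assumed to match the pre-commit hook, and the zero-hash trick is borrowed from Git's bundled pre-push.sample. Git feeds the hook one line per ref on stdin in the form "local-ref local-sha remote-ref remote-sha".

```shell
#!/bin/sh
# pre-push hook sketch: refuse to push commits that introduce oversized blobs.
MAX_FILE_SIZE=${MAX_FILE_SIZE:-1048576}   # assumption: same 1 MiB limit

# Print "size path" for every blob over the limit among the objects
# selected by the given git rev-list arguments.
list_big_blobs() {
    git rev-list --objects "$@" |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk -v max="$MAX_FILE_SIZE" \
            '$1 == "blob" && $2 + 0 > max { sub(/^blob /, ""); print }'
}

# Read the ref lines Git supplies on stdin; return 1 if any ref is rejected.
check_push() {
    zero=$(git hash-object --stdin </dev/null | tr '[0-9a-f]' '0')
    status=0
    while read -r local_ref local_sha remote_ref remote_sha; do
        [ "$local_sha" = "$zero" ] && continue   # ref deletion: nothing is sent
        if [ "$remote_sha" = "$zero" ]; then
            # New remote ref: look at everything not already on some remote.
            big=$(list_big_blobs "$local_sha" --not --remotes)
        else
            big=$(list_big_blobs "$remote_sha..$local_sha")
        fi
        if [ -n "$big" ]; then
            echo "push to $remote_ref blocked; blobs over $MAX_FILE_SIZE bytes (size path):" 1>&2
            echo "$big" 1>&2
            status=1
        fi
    done
    return $status
}

# Only act when installed as .git/hooks/pre-push, so the file can also be sourced.
case "$0" in
    */pre-push) check_push; exit $? ;;
esac
```

Note that a rejected push still leaves the big blob in your local history, so you would have to amend or rebase the offending commit before pushing again.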


[1] These Git objects are also compressed. Whenever possible, they are compressed even further by expressing them as deltas against objects already present on the server. So if you have a ten gigabyte text file, but you make one small change to it and commit, pushing that commit (even though it has a ten gigabyte file inside it) transfers very little data, since the so-called thin pack that Git sends winds up saying: Hey, remember that ten gigabyte object you already have? Take that one, remove a few bytes from the middle, and replace them with these other bytes.
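
Because of that compression, the size that matters for "is my repo above N MB" is the size of the object store, not of the work-tree. As a rough sketch, git count-objects -v reports the loose-object size ("size:") and the pack size ("size-pack:"), both in KiB; the helper name repo_size_kib is hypothetical.

```shell
#!/bin/sh
# Print the approximate on-disk size of the local object store in KiB
# (loose objects plus packs), as reported by git count-objects -v.
repo_size_kib() {
    git count-objects -v |
        awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}

# Example use inside a hook (the 500 MiB limit is an arbitrary assumption):
#   [ "$(repo_size_kib)" -gt $((500 * 1024)) ] && echo "repo over 500 MiB" 1>&2
```

This only measures your local clone, so it is a hint about what the remote will hold rather than an exact figure.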

torek
2

If you know the big file's name, or a pattern it follows (e.g. a suffix), you can just add it to .gitignore until you resolve the problems with your editor.
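
For example, assuming the stray files all end in .editordump (a made-up suffix, substitute whatever your editor actually produces), a quick check in a scratch repository shows that even a blanket git add . no longer picks them up:

```shell
#!/bin/sh
# Scratch repository so the demo does not touch a real project.
scratch=$(mktemp -d) && cd "$scratch" && git init -q .
printf '%s\n' '*.editordump' >> .gitignore   # hypothetical pattern
touch session.editordump notes.txt
git add .                                    # the ignored file is skipped
git check-ignore -v session.editordump       # shows which rule matched
staged=$(git ls-files)                       # paths that made it into the index
echo "$staged"
```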

You can also check this answer (https://stackoverflow.com/a/9126745/1602555), which describes a server-side update hook.

Karol Dowbecki
  • that's a great idea, you deserve an upvote for that, but if you can find a generic way to solve the problem for any unknown big files, that'd be best –  Jul 12 '18 at 22:45
  • @OlegzandrDenman Updated the answer with a link to [this hook](https://stackoverflow.com/a/9126745/1602555). – Karol Dowbecki Jul 12 '18 at 22:46
  • @OlegzandrDenman it's obvious you can gitignore something based on name. But the problem is that git add -A and git status -s don't tell you the size, and git commit -m "sfds" doesn't tell you the size either. You only find out there was something massive when pushing it. This can happen when dealing with frameworks, or frameworks used within frameworks. – barlop Apr 22 '19 at 16:26
2

Since this is an ongoing concern, you should get into the habit of running git status before doing git commit. You can review the list of files that will be committed to look for ones that don't belong.
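
Since git status does not show sizes, one way to review them is a small helper like the sketch below. The name staged_sizes is hypothetical; it lists files added or modified in the index, largest first, as size in bytes, a tab, then the path. It assumes paths without unusual characters (Git quotes those in this output) and no staged submodules.

```shell
#!/bin/sh
# List staged (added or modified) files with the size of their staged content,
# largest first, as "size<TAB>path".
staged_sizes() {
    git diff --cached --name-only --diff-filter=AM |
    while IFS= read -r path; do
        # ":path" names the copy of the file that is in the index.
        printf '%s\t%s\n' "$(git cat-file -s ":$path")" "$path"
    done | sort -rn
}

# Typical use before committing:
#   staged_sizes | head -n 5
```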

Code-Apprentice
  • 2
    There can be loads of files in a project, especially when dealing with frameworks, where you haven't written or imported each individual file. And git status won't tell you the size. – barlop Apr 22 '19 at 16:24
0

Another approach, if you want the size of multiple commits:

With Git 2.29 (Q4 2020), "git for-each-ref --format=<>" learned %(contents:size).

See commit b6839fd (16 Jul 2020), and commit 6e2ef8e, commit 9fcc9ca (10 Jul 2020) by Christian Couder (chriscool).
(Merged by Junio C Hamano -- gitster -- in commit be53706, 30 Jul 2020)

ref-filter: add support for %(contents:size)

Signed-off-by: Christian Couder

It's useful and efficient to be able to get the size of the contents directly without having to pipe through wc -c.

Also the result of the following:

git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c

is off by one, as git for-each-ref appends a newline character after the contents, which can be seen by comparing its output with the output from git cat-file.

As with %(contents), %(contents:size) is silently ignored, if a ref points to something other than a commit or a tag:

$ git update-ref refs/mytrees/first HEAD^{tree}
$ git for-each-ref --format='%(contents)' refs/mytrees/first

$ git for-each-ref --format='%(contents:size)' refs/mytrees/first

git for-each-ref now includes in its man page:

contents:size

The size in bytes of the commit or tag message.
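
As a minimal sketch of how that looks in practice (Git 2.29 or later, run in a scratch repository with a made-up commit message), keep in mind that per the man page text above this is the size of the commit or tag message, not of the files in the commit:

```shell
#!/bin/sh
# Scratch repository with one empty commit, just to have a ref to inspect.
scratch=$(mktemp -d) && cd "$scratch" && git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "demo message"
# Size in bytes of the message of the commit each branch points at.
msg_size=$(git for-each-ref --format='%(contents:size)' refs/heads/)
echo "commit message size in bytes: $msg_size"
```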

VonC