6

We want to prevent:

  • Very large text files (> 50MB per file) from being committed to git instead of git-lfs, as they inflate git history.
  • The problem is that 99% of these files are < 1MB and should be committed to git directly for better diffing.
  • The reason for the variance in size: these are YAML files, and they support binary serialization via base64 encoding.
  • The reason we can't reliably prevent binary serialization: this is a Unity project, and binary serialization is needed for various reasons.

Given:

  • GitHub hosting's lack of pre-receive hook support.
  • git-lfs's lack of file size attribute support.

Questions:

  1. How can we reliably prevent large files from being added to a commit?
  2. Can this be done through a config file in the repo so all users follow this rule gracefully?
  3. If not, can this be done with a bash command alias, so that trusted users see a warning when they accidentally git add a large file that is not handled by git-lfs?

(Our environment is macOS. I have looked at many solutions and so far none satisfy our needs)

bitinn
  • [How to limit file size on commit?](https://stackoverflow.com/q/39576257/3776858) – Cyrus Dec 09 '18 at 07:43
  • Can you include (at least some of) the solutions you've looked at and discarded? I'm not especially well-versed in git hooks, but this sounds like something a `pre-commit` hook could handle to me. – solarshado Dec 09 '18 at 07:46
  • @Cyrus I didn't read that one, thx, but I need to test whether it accounts for files that would be tracked by git-lfs. Yes, it will prevent large files, but does git-lfs kick in before pre-commit? That's the question. – bitinn Dec 09 '18 at 07:53
  • @solarshado I will add some links later, but I am assuming pre-commit doesn't consider git-lfs, but I really don't know enough about git-lfs to say one way or another. – bitinn Dec 09 '18 at 07:55
  • I'm not familiar with git-lfs either, but based on [its home page](https://git-lfs.github.com/), it looks like it "Just Works:tm:". Skimming over some of the [docs](https://github.com/git-lfs/git-lfs/tree/master/docs/man), it sounds like it uses a pre-push hook for some (most?) of its magic. – solarshado Dec 09 '18 at 08:08

2 Answers

7

Alright, with help from CodeWizard and this SO answer, I managed to put together a guide myself:

First, set up your repo's core.hooksPath with:

git config core.hooksPath .githooks

Second, create this pre-commit file inside the .githooks folder so it can be tracked (gist link), and remember to make it executable with chmod +x.
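The two setup steps above can be sketched end to end in a throwaway repository. This is only an illustration: the temporary directory and the placeholder hook body are assumptions, and the real hook script would replace the placeholder.

```shell
# Sketch: wire up a tracked hooks folder in a throwaway repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .

mkdir -p .githooks
# placeholder hook body (assumption); the real pre-commit script goes here
printf '#!/bin/sh\nexit 0\n' > .githooks/pre-commit
chmod +x .githooks/pre-commit

# point this repository (local config only) at the tracked folder
git config core.hooksPath .githooks

# read the setting back to confirm it took effect
hooks_path=$(git config --get core.hooksPath)
echo "core.hooksPath = $hooks_path"
```

Because the setting lives in the local config, it is not cloned along with the repository; only the .githooks folder itself is tracked.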

#!/bin/bash
#
# Pre-commit hook: reject staged files over a size limit unless
# they are routed through git-lfs by .gitattributes.
# Called by "git commit" with no arguments. The hook should
# exit with non-zero status after issuing an appropriate message if
# it wants to stop the commit.
#
# Uses bash features (here-strings), hence the bash shebang.

# Redirect output to stderr.
exec 1>&2

FILE_SIZE_LIMIT_KB=1024
COLOR='\033[01;33m'
NOCOLOR='\033[0m'
COUNTER=0

# staged files, excluding deletions (there is nothing left to measure);
# hooks run from the root of the working tree, so these paths resolve as-is
files=$(git diff --cached --name-only --diff-filter=ACMR)

# before git commit, check non git-lfs tracked files to limit size
while IFS= read -r file; do
    if [ -z "$file" ]; then
        continue
    fi
    # skip files that .gitattributes assigns to the git-lfs filter
    case "$(git check-attr filter -- "$file")" in
        *": filter: lfs") continue ;;
    esac
    # size in bytes of the working-tree file
    file_size=$(wc -c < "$file")
    file_size_kb=$((file_size / 1024))
    if [ "$file_size_kb" -ge "$FILE_SIZE_LIMIT_KB" ]; then
        printf "${COLOR}%s${NOCOLOR} has size %sKB, over commit limit %sKB.\n" "$file" "$file_size_kb" "$FILE_SIZE_LIMIT_KB"
        COUNTER=$((COUNTER + 1))
    fi
done <<< "$files"

# exit with error if any non-lfs tracked files are over file size limit
if [ "$COUNTER" -gt 0 ]; then
    echo "$COUNTER files are larger than permitted, please fix them before commit"
    exit 1
fi

exit 0

Now, assuming you have both .gitattributes and git-lfs set up properly, this pre-commit hook will run when you try to git commit and make sure that all staged files not tracked by git-lfs (as specified in your .gitattributes) satisfy the specified file size limit.
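If you want to check which paths your .gitattributes actually routes through git-lfs, one option is git check-attr, which reports the filter attribute for any path. A minimal sketch in a throwaway repository; the *.png pattern and the two file names are made up for illustration, and the files do not need to exist:

```shell
# Sketch: ask git which paths .gitattributes assigns to the lfs filter.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# hypothetical attribute line, in the form git-lfs writes for tracked patterns
printf '*.png filter=lfs diff=lfs merge=lfs -text\n' > .gitattributes

# check-attr only reads attributes; git-lfs itself is not invoked
png_attr=$(git check-attr filter -- image.png)
yaml_attr=$(git check-attr filter -- scene.yaml)

echo "$png_attr"    # image.png: filter: lfs
echo "$yaml_attr"   # scene.yaml: filter: unspecified
```

A path that reports "filter: lfs" will be stored as an LFS pointer, so a size check only needs to look at the others.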

Any new users of your repo will need to set up core.hooksPath themselves, but beyond that, things should just work.

Hope this helps other Unity developers fighting with growing git repo size!

bitinn
  • 1
    `man git-config` says about `core.hooksPath` : `This configuration variable is useful in cases where you’d like to centrally configure your Git hooks instead of configuring them on a per-repository basis`. Your command `git config core.hooksPath .githooks` returns `fatal: not in a git directory` when I run it from `$HOME`, shall I need to use the `--global` option if I want the hooks to be usable by every projects on my machine ? – SebMa Nov 28 '20 at 17:34
  • Kind of the same question asked here with a very nice answer imo: https://stackoverflow.com/a/39578014/1582649 – Alexandre Schmidt Jun 14 '23 at 13:59
3
  • How can we reliably prevent large files from being added to a commit?
  • Can this be done through a config file in the repo so all users follow this rule gracefully?

Since GitHub doesn't support server-side hooks, you can use client-side hooks. As you are probably aware, those hooks can be bypassed or disabled with no problem, but this is still a good way to do it.

core.hooksPath

Git v2.9 added the core.hooksPath setting, which lets client-side hooks live in any folder. Prior to that, hooks had to be placed inside the .git/hooks folder.

This will allow you to write scripts and put them anywhere. I assume you know what hooks are but if not feel free to ask.


How to do it?

Usually, you place the hooks inside your repo (or any other common folder).

# set the hooks path. for git config, the default location is --local
# so this configuration is locally per project
git config core.hooksPath .githooks
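
As a rough illustration of both points (a hook in a tracked folder can block a commit, and a client-side hook can still be skipped), here is a sketch in a throwaway repository. The always-failing hook, the demo identity, and the file name are assumptions made for the example:

```shell
# Sketch: a hook under core.hooksPath blocks a commit; --no-verify skips it.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # throwaway identity for the demo
git config user.email "user@example.com"

mkdir -p .githooks
# hypothetical hook that rejects every commit
printf '#!/bin/sh\necho "blocked by pre-commit" >&2\nexit 1\n' > .githooks/pre-commit
chmod +x .githooks/pre-commit
git config core.hooksPath .githooks

echo "hello" > notes.txt
git add notes.txt

# the hook exits non-zero, so this commit is refused
if git commit -q -m "first try" 2>/dev/null; then
    blocked="no"
else
    blocked="yes"
fi

# --no-verify skips the pre-commit hook entirely
git commit -q --no-verify -m "bypass the hook"
commit_count=$(git rev-list --count HEAD)
echo "blocked=$blocked commits=$commit_count"
```

This is why a client-side hook works as a guard against accidents among trusted users rather than as enforcement.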
CodeWizard
  • very nice, I will check it out and report back – bitinn Dec 09 '18 at 08:03
  • Alright, so this works, but the hook took me a while (code from other SO has issues handling whitespace in path), so I made mine, feel free to append them for a more complete answer: https://gist.github.com/bitinn/834756d57f3d47df97937aab68162ae6 – bitinn Dec 09 '18 at 09:42
  • Cool, glad to help – CodeWizard Dec 09 '18 at 09:45
  • Sorry, I spoke a bit too soon: this doesn't seem to work with git-lfs yet, as in, it isn't smart enough to skip files that can be handled by git-lfs (say a PNG file, which is likely over 1MB, but can be committed because lfs could handle it) – bitinn Dec 09 '18 at 09:46
  • You can do it using smudge-clean scripts https://stackoverflow.com/questions/41773264/how-to-preserve-tabs-in-github-for-makefiles/41773892#41773892 – CodeWizard Dec 09 '18 at 09:51
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/184953/discussion-between-bitinn-and-codewizard). – bitinn Dec 09 '18 at 09:52