85

I'm trying to implement Git to manage creative assets (Photoshop, Illustrator, Maya, etc.), and I'd like to exclude files from Git based on file size rather than extension, location, etc.

For example, I don't want to exclude all .avi files, but there are a handful of massive +1GB avi files in random directories that I don't want to commit.

Any suggestions?

Jonas Stein
Warren Benedetto
  • 3
    I would warn that Git is known to have poor performance with large, binary files, and this problem is only now slowly being resolved. I would recommend against using Git for things besides plain text. – erjiang Oct 27 '10 at 17:11
  • @erjiang: Agreed. (except small binary files are perfectly fine; sure, they can't be prettily diffed, but no VCS can do that) Of course, the OP is saying that he explicitly does not want to track large files, so it might be all okay! – Cascabel Oct 27 '10 at 19:02
  • 2
    There's certainly no built-in way to do this. You could sort of implement it by adding a pre-commit hook which checks the size of all files to be committed, and aborts if any are over the threshold. You could add additional automation, but be careful. The last thing you want is to lose data by accidentally ignoring important content. (In order of increasing danger, you could: automatically unstage the large files, automatically add them to the gitignore, and even proceed with the modified commit instead of aborting.) – Cascabel Oct 27 '10 at 19:06
  • 4
    There's been some talk on the git ML recently about extending the .gitignore syntax, and one of the proposals is to allow .gitignore to delegate to an external tool to make decisions about what is and is not ignored. It sounds like this would be perfect for what you want. Unfortunately it's just a proposal for now, but this may show up eventually. – Lily Ballard Oct 27 '10 at 22:31
  • What is the point of controlling those files? Certainly, just saving them under a different name and even adding a small description in the name, or in a different text file will be more cost efficient and (in case of larger files) time efficient. I see no advantages of keeping track of avi files. – Rook Mar 05 '11 at 03:29
  • @KevinBallard can you provide a link to the discussion? – Jonas Stein Sep 22 '14 at 21:35
  • @JonasStein: That comment is almost 4 years old. I don't even remember it anymore. – Lily Ballard Sep 22 '14 at 23:17
  • To anyone arriving here, please see my updated (2020-05) answer. Git-LFS is probably the tool to use in scenarios like the one described in the original post. – earizon May 18 '20 at 06:08
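The pre-commit check suggested in the comments above could be sketched roughly as follows. This is a minimal sketch, not a tested hook: the function names `list_oversized_staged` and `check_staged_sizes` and the byte limit are assumptions made for illustration, and paths are assumed not to contain newlines.

```shell
#!/bin/sh
# Sketch of a size check for staged files (function names are illustrative).

# Print every staged (added or modified) path whose blob exceeds $1 bytes.
list_oversized_staged() {
    limit_bytes=$1
    git -c core.quotePath=false diff --cached --name-only --diff-filter=AM |
    while IFS= read -r path; do
        # ":path" names the staged version of the file in the index;
        # entries without a readable blob size are skipped
        size=$(git cat-file -s ":$path" 2>/dev/null) || continue
        if [ "$size" -gt "$limit_bytes" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Return non-zero if any staged file is over the limit, so a hook can abort.
check_staged_sizes() {
    oversized=$(list_oversized_staged "$1")
    if [ -n "$oversized" ]; then
        echo "Refusing to commit files larger than $1 bytes:" >&2
        printf '%s\n' "$oversized" >&2
        return 1
    fi
    return 0
}
```

To try it as a hook, the definitions would go into an executable `.git/hooks/pre-commit`, followed by a line such as `check_staged_sizes 1073741824 || exit 1`. It only reports and aborts; it does not unstage or ignore anything, which is the least dangerous of the options listed in the comment.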

7 Answers

78

I'm new to .gitignore, so there may be better ways to do this, but I've been excluding files by file size using:

find . -size +1G | cat >> .gitignore

Obviously you'll have to run this code frequently if you're generating a lot of large files.

abendine
  • 1
    This is a great way to control `wp-content/uploads` folder when deploying large sites built in WordPress. Thanks for this. – aubreypwd May 31 '14 at 20:05
  • 7
    I found I needed to remove the leading `./` from the start of each file before gitignore would work – IanB Jul 10 '14 at 04:19
  • 7
    This is a very old thread, but in case someone stumbled upon it and needed a pastable solution: `find . -size +1G | sed 's|^\./||g' | cat >> .gitignore` – antass Oct 20 '16 at 15:37
  • 6
    To avoid storing duplicate file names: `find . -size +1G | sed 's|^\./||g' | cat >> .gitignore; awk '!NF || !seen[$0]++' .gitignore`. `sed` gets rid of the leading `./`, and `awk` prints the file with duplicate lines removed, keeping empty lines by checking the number of fields present in a line (`NF`). Note that `awk` only prints the result; redirect it to a temporary file and move that over `.gitignore` to make the change permanent. This is useful if your `.gitignore` is organized in sections separated by empty lines. – antass Oct 20 '16 at 16:33
  • 25
    You all win a "useless use of cat award". – Sam Watkins Feb 05 '18 at 01:00
  • @SamWatkins do you have an example of how the above could be done without using `cat`? ( i think that it can just be completely dropped from the above? ) – baxx Feb 05 '20 at 11:03
  • lol, took me a second to realize what is the cat award: `find . -size +1G | sed 's|^\./||g' >> .gitignore` – Sida Zhou May 29 '20 at 08:22
  • 1
    You don't need `sed` either ... `find . -size +1G -printf '%P\n'` works too. – user3710044 Aug 15 '20 at 14:41
  • 1
    Can I do this in some kind of automated way? I mean something like putting a function in my repo (or in my gitignore, if that is possible) that continuously collects files larger than some size and puts them in the gitignore? – Robin Kohrs Apr 20 '21 at 08:58
  • @RobinKohrs I have added yet another answer using a githook to add some automatic behavior – KIC Jan 23 '22 at 17:38
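Since the commands in this thread append on every run, repeated runs leave duplicate entries behind. One way to avoid that, sketched under the assumption that a small helper is acceptable (`append_unique` is a made-up name, not a Git or coreutils command), is to append only lines that are not already in the file:

```shell
#!/bin/sh
# append_unique FILE: read lines from stdin and append to FILE only those
# that are not already present as a complete line. Existing content,
# including blank separator lines and comments, is left untouched.
append_unique() {
    target=$1
    touch "$target"
    while IFS= read -r line; do
        # Ignore empty input lines
        [ -n "$line" ] || continue
        # -x: whole-line match, -F: literal string, -e: safe for lines starting with "-"
        grep -qxF -e "$line" "$target" || printf '%s\n' "$line" >> "$target"
    done
}
```

With GNU find this could be combined as `find . -size +1G -printf '%P\n' | append_unique .gitignore`. Unlike sorting the file, this keeps the existing order of `.gitignore`, which matters when it contains negated patterns or sections.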
13

@abendine's answer is correct, and with a size threshold this large it should rarely make a difference, but according to https://stackoverflow.com/a/22057427/6466510 the following is preferable:

find * -size +1G | cat >> .gitignore

See also "Difference between find . and find * in unix": replacing . with * keeps find from descending into the .git directory, because the shell's * does not match names that start with a dot.
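A side effect of `find *` is that it skips every top-level entry whose name starts with a dot, not just `.git`. If only `.git` should be excluded, `find`'s `-prune` can do that explicitly. A minimal sketch, with `list_large_files` being an illustrative name rather than an existing tool:

```shell
#!/bin/sh
# Sketch: list regular files above a size threshold, skipping only ./.git.
list_large_files() {
    size_spec=$1   # a find(1) size test such as +1G or +100M
    # -prune stops find from descending into ./.git; -o handles everything else.
    # sed strips the leading "./" so the paths work as .gitignore entries.
    find . -path ./.git -prune -o -type f -size "$size_spec" -print | sed 's|^\./||'
}
```

For example, `list_large_files +1G >> .gitignore` would append the matches in the same way as the command above.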

andreagalle
13

To stay under GitHub's 100 MB per-file limit, run this:

find . -size +100M | cat >> .gitignore
stevec
  • 2
    On Windows, the same command may add `./` to the start of each path, as in `./android/app/build/outputs/apk/debug/app-universal-debug.apk`. Removing the `./`, as in `android/app/build/outputs/apk/debug/app-universal-debug.apk`, solved my issue. – Kavidu Aloka Kodikara Apr 29 '21 at 13:20
6

I also wanted to offer a Windows version of this.

forfiles /s /c "cmd /q /c if @fsize GTR 1073741824 echo @relpath" >> .gitignore
tisaconundrum
6

(Update 2020-05)

GitHub released Git LFS as open source some time ago. This is probably what most people are really looking for:

https://git-lfs.github.com/

Copied from the project page: "Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise."

earizon
  • Yes, they should install Git LFS and track the big files to automatically exclude them from the repositories. – Skyborg Dec 20 '20 at 15:17
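Git LFS is normally told what to manage with `git lfs install` followed by `git lfs track "<pattern>"`, which records the pattern in `.gitattributes`. To select files by size rather than by extension, as the question asks, the same kind of entry could be generated from `find`. The sketch below is an assumption-laden illustration: `print_lfs_attributes` is a made-up helper name, the attribute string mirrors what `git lfs track` writes, and paths containing spaces are skipped because they would need escaping in `.gitattributes`.

```shell
#!/bin/sh
# Sketch: print .gitattributes lines that route large files through Git LFS.
print_lfs_attributes() {
    size_spec=$1   # a find(1) size test such as +100M
    find . -path ./.git -prune -o -type f -size "$size_spec" -print |
    sed 's|^\./||' |
    while IFS= read -r path; do
        case $path in
            *" "*)
                # A space separates the pattern from its attributes,
                # so these paths are reported instead of written out
                echo "skipping (contains a space): $path" >&2 ;;
            *)
                printf '%s filter=lfs diff=lfs merge=lfs -text\n' "$path" ;;
        esac
    done
}
```

Something like `print_lfs_attributes +100M >> .gitattributes` would then mark those files for LFS, provided Git LFS itself is installed and initialized in the repository.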
4

I want to add to all these answers that you can also just use a git hook to have something more automatic (or less human-error prone) like this:

cat .git/hooks/pre-commit

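#!/bin/bash

echo "automatically ignoring large files"
# Append files larger than 5 MiB (note the +; a bare 5M matches only files of
# that exact size), skipping .git and stripping the leading ./
find . -path ./.git -prune -o -type f -size +5M -print | sed 's|^\./||' >> .gitignore
# Deduplicate in place; "sort | uniq > .gitignore" would truncate the file
# before it is read
sort -u -o .gitignore .gitignore

git diff --exit-code .gitignore
exit_status=$?
if [ $exit_status -eq 1 ]
then
    # Untrack every ignored path; read line by line so names with spaces survive
    while IFS= read -r i
    do
        [ -z "$i" ] && continue
        # --ignore-unmatch: entries that are not tracked are not an error
        git rm -r --cached --ignore-unmatch -- "$i"
    done < .gitignore

    git add .gitignore
    git commit .gitignore --no-verify -m"ignoring large files"

    echo "ignored new large files"
fi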

It is pretty brute force, and the downside is that when the hook adds new large files, the original commit fails because the state (hash) changed. You then need to run another commit to actually commit what you have staged. Consider this a feature telling you that new large files were detected ;-)

KIC
1

Just adding an answer that summarizes the suggestions about "remove the leading ./" and "useless use of sed" and "useless use of cat"

find . -size +100M -printf '%P\n' >> .gitignore

fwiw i think that cat is fine :D

physincubus