
We have a Git repo, and people who were at the company before me mistakenly added a LOT of binary files (PNGs, JPGs, JARs, etc.) to it.

How can I find ALL the binary files recursively and get their total size?
Of course, I want to exclude the files in the hidden .git directory tree, as well as empty files.

I want to remove ALL the binary files in our Git repo and put them in a central artifact manager like Artifactory or Nexus.

Chris F
  • There's no one agreed-upon definition of "binary file", so any answer you choose may need a bit of adapting to *your* definition, unless you get lucky and whoever made the answer already agrees with yours. – torek Sep 12 '19 at 15:09

1 Answer


Instead of looking directly at the filesystem, you can ask Git itself to identify the large objects in the Git history, using this script.

You will then easily identify the misplaced big elements:

...
0d99bb93129939b72069df14af0d0dbda7eb6dba 542455 path/to/some-image.jpg
2ba44098e28f8f66bac5e21210c2774085d2319b 12446815 path/to/hires-image.png
bd1741ddce0d07b72ccf69ed281e09bf8a2d0b2f 65183843 path/to/some-video-1080p.mp4
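
The script referred to above is not reproduced in this thread. As a rough sketch of the same idea, and not necessarily the linked script itself, the following lists every blob reachable from any ref in the same `<sha> <size> <path>` layout as the sample output. The function name `list_blobs_by_size` is made up for illustration, and sizes are left as raw bytes rather than passed through `numfmt`.

```shell
#!/bin/sh
# Sketch: print "<sha> <size-in-bytes> <path>" for every blob reachable
# from any ref, sorted by size so the largest objects end up last.
# list_blobs_by_size is a hypothetical helper name, not a Git command.
list_blobs_by_size() {
    # rev-list --objects prints "<sha> <path>" for trees and blobs
    # (and a bare "<sha>" for commits) across all refs.
    git rev-list --objects --all |
        # batch-check looks up type and size; %(rest) carries the path through.
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        # Keep only blobs and drop the leading type column.
        sed -n 's/^blob //p' |
        # Numeric sort on the size column.
        sort -k2,2n
}
```

Run from inside the repository, `list_blobs_by_size | tail -n 20` would show the twenty largest blobs. Note that this reports blob sizes from history, so a file that was changed many times appears once per version.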

You can then remove said big elements from the history.

VonC
  • Thanks. I'm on OSX so I installed coreutils, replaced numfmt with gnumfmt, and it worked. However, I'm still JUST interested in BINARY files, and their total size - I wanna remove them from our repo and put them in a central repo server like Artifactory or Nexus. – Chris F Sep 12 '19 at 13:28
  • @ChrisF You would need to check them out first, run mvn deploy:deploy-file to publish them to Nexus, and then delete them from the history with BFG. – VonC Sep 12 '19 at 14:29
  • I understand what I NEED to do, that's why I want a list of binary files. – Chris F Sep 12 '19 at 14:46
  • @ChrisF Would the extensions of those (big) files be enough for you to grep them from the large objects listed in the answer? – VonC Sep 12 '19 at 14:47
  • No, we have more than just those. In another company I was in someone wrote a script to do just what I'm asking for, but the script did specify extensions. I'm trying to get away from that. I will however do some of the pruning mentioned in your answer. – Chris F Sep 12 '19 at 14:50
  • @ChrisF Would https://stackoverflow.com/questions/30689384/find-all-binary-files-in-git-head help? – VonC Sep 12 '19 at 14:52
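
For the extension-free listing asked for in the comments, one possible sketch relies on `git grep` and its own text/binary detection: `-a` makes it treat every file as text, `-I` makes it skip binary files, and an empty pattern matches any file with at least one line. The difference between the two listings is then the set of tracked, non-empty files Git considers binary. The function names are hypothetical, and the sketch assumes it is run from the repository root with paths that contain no newlines or other characters Git would quote. As torek notes, Git's notion of "binary" may not match yours.

```shell
#!/bin/sh
# Sketch: list tracked, non-empty files in the working tree that Git
# treats as binary. Untracked files and .git itself are never visited
# because git grep only searches tracked files.
list_binary_files() {
    all_list=$(mktemp) && text_list=$(mktemp) || return 1
    # -a: treat binary as text, so '' matches every non-empty tracked file.
    git grep -al '' | sort -u > "$all_list"
    # -I: skip binary files, leaving only the non-empty text files.
    git grep -Il '' | sort -u > "$text_list"
    # Lines only in the second file: non-empty files that are not text.
    comm -13 "$text_list" "$all_list"
    rm -f "$all_list" "$text_list"
}

# Sketch: sum the on-disk size in bytes of the files listed above.
total_binary_bytes() {
    list_binary_files |
        while IFS= read -r f; do wc -c < "$f"; done |
        awk '{ s += $1 } END { print s + 0 }'
}
```

Calling `list_binary_files` prints one path per line, and `total_binary_bytes` prints a single byte count for the current checkout. Binary files that exist only in older commits are not covered by this sketch.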