In our company, we have some binary files (don't ask why) which are changed quite often and pushed to git.

The whole .git repo weighs about 900 GB, of which 90% belongs to those files. Is there a way to truncate the git history only for certain files, so that the .git repo becomes smaller? I found how to do this for a specific branch, but not for a specific file.

Eugen
  • related, but not a direct answer : look into `git-lfs` to store these blobs outside of git and only track a reference in git. – LeGEC May 26 '21 at 22:05
  • You have to copy the bad commits (that have huge files in them) to new-and-improved commits (that don't), or copy the bad commits (that have history behind them that keeps huge files in those historical commits) to new-and-improved commits (that don't have previous commits). Having copied these commits to new-and-improved commits, you must now copy all *subsequent* commits as well. The result is a new repository, often completely unrelated to the old repository. You must now have all users switch to the new repository, throwing away their old clones. – torek May 27 '21 at 01:25
  • For *commands that can do this conveniently* search for "remove large files from Git repository". – torek May 27 '21 at 01:25
  • unfortunately, we cannot switch to a new repo, all has to be done in the same repo. And the goal is to keep at least 10 commits of those binary files. I cannot remove them entirely, they are needed. – Eugen May 27 '21 at 02:12
  • If you can't afford to change the history (i.e. force all developers to re-sync), then there's no way to do this. *Every* commit where the file is present will need to be modified, not just commits that *modify* the file. So your best bet is to use a shallow repo, but using shallow repos to clone from is not incredibly well supported, so it comes with its own set of problems. – Joachim Sauer May 27 '21 at 09:09

1 Answer

> In our company, we have some binary files (don't ask why) which are changed quite often and pushed to git.

That is what artifact repositories such as Nexus or Artifactory are for: storing artifacts.

But any solution, such as running `git filter-repo` with a `--commit-callback` (as I suggested here) to decide for each commit whether to delete the file, would change the history of that repository.
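As a rough illustration of the per-commit decision described above, here is a minimal Python sketch of the selection logic only. It is not the actual callback: the function name `commits_to_strip`, the sample history, and the path `data.bin` are all hypothetical, and it assumes a linear history listed from oldest to newest.

```python
from typing import Iterable, List, Set, Tuple


def commits_to_strip(
    history: Iterable[Tuple[str, List[str]]],
    path: str,
    keep_last: int = 10,
) -> Set[str]:
    """Return the IDs of commits whose change to `path` should be dropped.

    `history` is a sequence of (commit_id, changed_paths), oldest first.
    Only the newest `keep_last` changes to `path` are kept.
    """
    # Commits that touch the file, still in oldest-to-newest order.
    touching = [commit_id for commit_id, paths in history if path in paths]
    if keep_last <= 0:
        # Keeping nothing: every change to the file is dropped.
        return set(touching)
    # Everything except the newest `keep_last` entries.
    return set(touching[:-keep_last])


# Hypothetical history: five commits, four of which change data.bin.
history = [
    ("c1", ["app.py", "data.bin"]),
    ("c2", ["data.bin"]),
    ("c3", ["app.py"]),
    ("c4", ["data.bin"]),
    ("c5", ["data.bin"]),
]
print(sorted(commits_to_strip(history, "data.bin", keep_last=2)))
# prints ['c1', 'c2']
```

A set like this could be consulted while rewriting each commit, but whichever tool applies it, every commit from the first stripped one onward gets a new ID, so the history still changes.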

In effect, that would force every developer to reset their own clone of the repository to match the newly rewritten history after each such cleanup.
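To make the cost of that reset concrete, here is a small Python sketch of what each developer would have to run after a cleanup. The helper names `resync_commands` and `resync`, the remote `origin`, and the branch `main` are assumptions for illustration, and the sketch assumes the branch being reset is the one currently checked out.

```python
import subprocess
from typing import List


def resync_commands(remote: str = "origin", branch: str = "main") -> List[List[str]]:
    """Git commands that make the current branch match the rewritten remote branch.

    WARNING: `git reset --hard` discards local commits and uncommitted changes.
    """
    return [
        ["git", "fetch", remote],
        ["git", "reset", "--hard", f"{remote}/{branch}"],
    ]


def resync(repo_dir: str, remote: str = "origin", branch: str = "main") -> None:
    """Run the resync commands inside `repo_dir`, stopping at the first failure."""
    for cmd in resync_commands(remote, branch):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Any local work that was based on the old history would need to be saved and reapplied separately before running something like this.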

VonC