2

I have a GitLab repository that I use for a daily backup. Initially the repo was 1 GB; after some days it reached 12 GB. I suppose GitLab is storing the old versions. This is filling up the storage on my server, since I host my GitLab repos myself. Is there any way to make GitLab store only the 5 latest versions (5 days) of my backup repo?

Lee Wyi

2 Answers

3

As some other commenters have already noted, Git is not really meant to back up files in this way, especially since it sounds like you're backing up large binary files. Every time you change a large binary file, Git effectively has to store a complete new copy, because such files rarely delta-compress well. With text-based file types Git can store deltas and is much more efficient.
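To make the difference concrete, here is a rough sketch in Python. It uses zlib with a preset dictionary as a stand-in for delta compression; this is an assumption for illustration only and is not Git's actual packfile algorithm. The SQL rows and the random "binary" blobs are made-up sample data.

```python
import random
import zlib


def delta_cost(old: bytes, new: bytes) -> int:
    """Bytes needed to store `new` when `old` is already known.

    zlib's preset dictionary lets the compressor refer back to `old`,
    a rough stand-in for delta compression (not Git's real algorithm).
    The zlib window is 32 KB, so the sample inputs are kept below that.
    """
    compressor = zlib.compressobj(level=9, zdict=old)
    return len(compressor.compress(new) + compressor.flush())


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical SQL dump: 400 rows, then the same dump with one row edited.
rows = [f"INSERT INTO users VALUES ({i}, '{rng.getrandbits(64):016x}');\n"
        for i in range(400)]
old_text = "".join(rows).encode()
rows[200] = "INSERT INTO users VALUES (200, 'edited');\n"
new_text = "".join(rows).encode()

# Stand-in for a compressed binary: the two versions share nothing.
old_blob = bytes(rng.getrandbits(8) for _ in range(20_000))
new_blob = bytes(rng.getrandbits(8) for _ in range(20_000))

text_cost = delta_cost(old_text, new_text)
blob_cost = delta_cost(old_blob, new_blob)
print(f"text:   {len(new_text)} bytes raw, {text_cost} bytes as a delta")
print(f"binary: {len(new_blob)} bytes raw, {blob_cost} bytes as a delta")
```

The edited text version should cost only a small fraction of its raw size, while the replaced blob costs about as much as storing it from scratch. A text dump in which most lines change every day behaves much more like the second case.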

You could consider using Git LFS, but again here it may not make a difference if you're adding a new copy of a binary every time you commit. If that's the case, then you are probably better off using some sort of cloud storage service rather than a version control system.

Drew Blessing
  • Thanks. I am using text files. These text files contain MySQL queries. The size of the commit stays more or less the same, as it is the same file (but with modified queries) that gets pushed every day. GitLab stores the history, and this history piles up over time, consuming a lot of storage. – Lee Wyi Aug 05 '19 at 08:36
  • You are saying one file takes up 12GB @LeeWyi? – Hogan Aug 06 '19 at 15:55
  • Hogan has a valid point. If you are truly storing text files, then Git is efficiently storing deltas and it shouldn't compound each time you commit. So either your text files are outrageously huge, or they're quite large and you're changing huge portions with each commit. – Drew Blessing Aug 07 '19 at 16:06
  • Yes. They are text files (around 10MB each) containing data from a MySQL dump. Sometimes only a few files are changed, sometimes around 90% of the files are changed. – Lee Wyi Aug 09 '19 at 11:53
1

You could just take the code at a point in time and add it to a new repo -- then archive the old one however you like. Also consider that Git is much worse at compressing binary data -- if you are saving a lot of compiled versions in your repo, that is probably why it is getting so big. If you stop tracking binaries in your repo, it may well become much more manageable.
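Since the goal is to automate this across many repos, the "snapshot into a new repo" idea could be scripted roughly as follows. This is only a sketch: the function names `copy_snapshot` and `start_fresh_history`, the directory names, and the commit message are all hypothetical, and `start_fresh_history` assumes `git` is on the PATH with a user name and email already configured. The demo at the bottom only exercises the copy step on a throwaway directory.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def copy_snapshot(old_repo: Path, new_repo: Path) -> None:
    """Copy the current files of old_repo, leaving .git (the history) behind."""
    shutil.copytree(old_repo, new_repo, ignore=shutil.ignore_patterns(".git"))


def start_fresh_history(new_repo: Path, message: str = "Backup snapshot") -> None:
    """Create a brand-new repo with a single commit (requires git installed)."""
    for args in (["init"], ["add", "--all"], ["commit", "-m", message]):
        subprocess.run(["git", *args], cwd=new_repo, check=True)


# Demo of the copy step only, on a fake repo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "backup-old"
    (old / ".git").mkdir(parents=True)
    (old / ".git" / "HEAD").write_text("ref: refs/heads/main\n")
    (old / "dump.sql").write_text("SELECT 1;\n")

    new = Path(tmp) / "backup-new"
    copy_snapshot(old, new)
    copied = sorted(p.name for p in new.iterdir())
    print(copied)  # prints ['dump.sql']
```

Calling `start_fresh_history(new)` afterwards would give the new directory a one-commit history; the old repo can then be archived or deleted separately.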

Hogan
  • Thanks. I want to automate this process as I have hundreds of repos. How do you stop tracking? – Lee Wyi Aug 05 '19 at 08:37
  • @LeeWyi -- stopping tracking is easy to do in Git -- you just add the files to the ignore list. – Hogan Aug 05 '19 at 15:51
  • Adding the files to the ignore list will stop them from being uploaded to Git. I need the most recent files (> 10 days). – Lee Wyi Aug 06 '19 at 15:06
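
For reference, the ignore-list approach from the comments could be sketched like this. Note that `.gitignore` only affects files Git is not already tracking, so a file that has already been committed also needs `git rm --cached` to drop it from the index. The helper names and the `*.bin` pattern are hypothetical, and `stop_tracking` assumes `git` is on the PATH. This keeps files out of future commits; it does not shrink history that has already been pushed.

```python
import subprocess
import tempfile
from pathlib import Path


def add_to_ignore_list(repo: Path, pattern: str) -> None:
    """Append pattern to the repo's .gitignore unless it is already listed."""
    ignore_file = repo / ".gitignore"
    lines = ignore_file.read_text().splitlines() if ignore_file.exists() else []
    if pattern not in lines:
        lines.append(pattern)
        ignore_file.write_text("\n".join(lines) + "\n")


def stop_tracking(repo: Path, path: str) -> None:
    """Drop path from the index but keep it on disk (requires git installed)."""
    subprocess.run(["git", "rm", "-r", "--cached", "--", path],
                   cwd=repo, check=True)


# Demo of the .gitignore step only, in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    add_to_ignore_list(repo, "*.bin")
    add_to_ignore_list(repo, "*.bin")  # second call changes nothing
    ignore_lines = (repo / ".gitignore").read_text().splitlines()
    print(ignore_lines)  # prints ['*.bin']
```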