
I'm using Git GUI to keep track of the changes we make in our daily development tasks and to our code and files.

Currently I'm doing web development, primarily websites. I keep a Git repository in each website folder, which contains all the files and documents related to the project.

Git GUI often complains that the repository contains many loose objects and suggests compressing the database to keep the repository fast and optimized.

Is it safe to do so? Do the advantages of compressing the repository outweigh any problems the compression might cause (is it even worth it)?

I'm especially worried about potential repository corruption or known issues/bugs I may not be aware of.

Jose Faeti

3 Answers


The Git repository format is robust and very well tested. It is safe to compress the repository.

Having said that, backups are always a good idea.
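
For example, a minimal sketch of such a backup (the paths and file names are just placeholders) could use git bundle, which packs all refs and history into a single file:

    # create a single-file backup of all refs and history
    git bundle create /path/to/backups/mysite.bundle --all

    # later, check the backup and restore from it
    git bundle verify /path/to/backups/mysite.bundle
    git clone /path/to/backups/mysite.bundle mysite-restored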

Greg Hewgill

If you are talking about git gc, then it is perfectly safe and no data is lost.

Git runs this itself periodically (as git gc --auto), but it does not do anything until the repository exceeds certain thresholds, such as the number of loose objects.

As Greg says, always have a backup of your repo.

git gc --prune is another matter. It removes unreferenced objects from the repository, which might not be what you want, since you may need to recover one of them later.
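
As an illustration, these are the standard commands involved (the gc.auto threshold mentioned below is Git's documented default and may be configured differently on your setup):

    # see how many loose objects the repository currently has
    git count-objects -v

    # the safe compression step; unreachable objects are kept for a grace period
    git gc

    # the automatic variant; only acts when thresholds (gc.auto, default 6700 loose objects) are exceeded
    git gc --auto

    # aggressive cleanup: immediately deletes unreachable objects, only if you are sure you will never need them
    git gc --prune=now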

Richard Hulse

TL;DR: Yes, it is safe to perform Git repository optimization, but do make backups and test them.

I guess that by "compression" you mean git gc.

The operation is as safe as it can be given the environment (machine stability, RAM and storage reliability).

Nevertheless, there is one weakness in all computing machines: storage space. Be aware that git gc can sometimes (paradoxically) temporarily increase the size of the repository, because objects that are candidates for removal may be unpacked before they are actually deleted. If the machine is low on storage space, this can prevent the operation from succeeding or hinder subsequent work. Also, git gc can require huge amounts of memory (sometimes more than the on-disk repository size) and will fail if the system can't cope.
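
A rough way to check how much headroom you have before running it (assuming you run this from the repository's top-level directory):

    # size of the object database and count of loose objects
    git count-objects -vH

    # size of the whole .git directory and free space on the filesystem
    du -sh .git
    df -h .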

That said, I have never seen repository corruption that appeared to be caused by git gc.

If your backup is a clone of the repository, be careful: some items (branches, lightweight tags, annotated tags, configuration, hooks, etc.) are not automatically transferred between repositories, or are transferred only partially or only in some cases, with complicated rules.
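
For instance, even a mirror clone (shown below with placeholder paths) copies all refs, including branches and tags, but still leaves .git/config and .git/hooks behind, so those need to be saved separately:

    # copies every ref (branches, tags, notes), but not configuration or hooks
    git clone --mirror /path/to/mysite /path/to/backups/mysite.git

    # configuration and hooks have to be backed up by other means, e.g. plain file copies
    cp /path/to/mysite/.git/config /path/to/backups/mysite.config
    cp -r /path/to/mysite/.git/hooks /path/to/backups/mysite.hooks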

Since you're worried about data safety, the best bet (and this is general advice, not specific to Git) is to set up a regular backup and crash-recovery process. Then, from time to time, set up an isolated test recovery environment (it can be as simple as a folder on another computer, or a virtual machine, depending on the context). In that environment, run your full recovery procedure and check that your precious data and processes are fully functional again, using only the backup, without touching your main storage. That way you know that if the main storage crashes, you are still safe.
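
A minimal sketch of such a test run, assuming the backup is the mirror clone or bundle mentioned above and a scratch directory on the test machine, might look like this:

    # restore into a scratch directory on the test machine
    git clone /path/to/backups/mysite.git /tmp/recovery-test
    cd /tmp/recovery-test

    # check integrity and that the expected history and branches are there
    git fsck --full
    git log --oneline -n 20
    git branch -a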

Stéphane Gourichon