
I am trying to implement a new backup system for a website using Git. The virtual private server has 20GB of space in total, with 5GB free.

When I run git add . in /var/www (with my favorite .gitignore rules), I end up with a gigantic .git folder that fills my hard drive to capacity.

It is not obvious why this is happening: I expected the .git directory to contain the bits about the bits (metadata), not binary duplicates of all my files!

What's going on here? If my website is 14GB, will the .git directory occupy an additional 14GB?

Mikhail
  • There's a whole free book on the subject of Git: http://git-scm.com/book/. You should probably skim through it, and what Git does will become clear. There's also another book called Git in the Trenches, which is really good at explaining where and how to use Git and its features: http://cbx33.github.com/gitt/ – Julian Higginson Jul 23 '12 at 01:10
  • Perhaps related: [Managing large binary files with git](http://stackoverflow.com/questions/540535/managing-large-binary-files-with-git) – sehe Jul 23 '12 at 02:05

4 Answers


Space Used Equals GIT_DIR + GIT_WORK_TREE

If my website is 14GB will the .git directory occupy an additional 14 gb?

To oversimplify the case enormously, yes. In a non-bare repository, Git stores all tracked file blobs, as well as other repository objects such as trees and commits, under GIT_DIR. It also maintains copies of those files in the GIT_WORK_TREE.
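To see why GIT_DIR holds real file content and not just metadata, here is a minimal Python sketch of how Git builds a loose blob object: it prepends a `blob <size>\0` header to the file's bytes, hashes the result with SHA-1 to get the object id, and stores a zlib-compressed copy under `.git/objects/`. The function name `git_blob_object` is hypothetical; the object id should match `git hash-object`, but the exact compressed bytes may differ from Git's because Git's zlib level is configurable.

```python
import hashlib
import zlib


def git_blob_object(content: bytes) -> tuple[str, bytes]:
    """Return (object id, compressed bytes) for a Git blob holding `content`."""
    # Git hashes "blob <length>\0" followed by the full file contents.
    store = b"blob %d\0" % len(content) + content
    oid = hashlib.sha1(store).hexdigest()
    # The loose object on disk is a zlib-compressed copy of the whole file.
    return oid, zlib.compress(store)


if __name__ == "__main__":
    oid, data = git_blob_object(b"hello\n")
    print(oid)  # ce013625030ba8dba906f756967f9e9ca394464a
    print(f".git/objects/{oid[:2]}/{oid[2:]}", len(data), "bytes")
```

The point is that every tracked file contributes a complete (compressed) copy of itself to the object store, in addition to the copy in the working tree.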

The repository uses packfiles and deltification to keep this state of affairs from getting out of hand in the normal use case, but if you have 14GB+ of data in a non-bare repository (especially if a lot of those files are binary assets), you may very well double up (or worse) on disk usage.
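Why binary assets are the bad case can be illustrated with zlib, the compressor Git uses for its objects. The sketch below compares repetitive markup with random bytes, which are used here as an assumed stand-in for already-compressed media such as JPEGs or ZIP archives; real files will fall somewhere in between.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data)) / len(data)


if __name__ == "__main__":
    html = b"<div class='post'><p>Hello, world</p></div>\n" * 2000
    # Random bytes approximate already-compressed binary assets.
    binary_like = os.urandom(len(html))
    print(f"text:   {compression_ratio(html):.3f}")
    print(f"binary: {compression_ratio(binary_like):.3f}")
```

Text shrinks to a small fraction of its size, while the random data stays at roughly its original size, so a repository full of such files costs about as much again as the working tree.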

Todd A. Jacobs

A Git repo contains the entire history of the files. The .git folder will contain all of the bits that are in your working directory, so you can expect it to increase the total size. It won't be a full double, thanks to compression, but it will be significant. And as you change the files, the total size of the repo will increase even if the size of the working tree doesn't, because the history is stored.
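The growth described here can be modelled with a toy content-addressed store. This is a simplified sketch, not Git's implementation: the class `ObjectStore` is hypothetical and keeps only loose zlib-compressed blobs, with no packfiles or delta compression. Each distinct version of a file adds a new object, while identical content hashes to the same id and is stored only once.

```python
import hashlib
import zlib


class ObjectStore:
    """Toy model of loose blobs in .git/objects (no packing, no deltas)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def add(self, content: bytes) -> str:
        store = b"blob %d\0" % len(content) + content
        oid = hashlib.sha1(store).hexdigest()
        # Same content -> same id, so re-adding an unchanged file costs nothing.
        self.objects.setdefault(oid, zlib.compress(store))
        return oid

    def size(self) -> int:
        """Total bytes held by the store across all versions."""
        return sum(len(blob) for blob in self.objects.values())


if __name__ == "__main__":
    repo = ObjectStore()
    for version in range(1, 4):
        page = f"<h1>Release {version}</h1>\n".encode() * 500
        repo.add(page)  # the working tree only ever holds the latest `page`
        print(version, len(repo.objects), repo.size())
```

The working tree stays at one version's size, while the store keeps one object per version. Real Git later packs such objects with deltas, which is what keeps ordinary text history small.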

Ned Batchelder

The .git folder will contain a complete history of every file in the repository and every change made to those files.

It may not be an extra 14GB, because Git compresses objects rather well, but it will be close.
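For the question's concrete case, one rough way to predict the cost before running git add is to stream every file through zlib and sum the results. This is only an approximation under stated assumptions: the function `estimate_snapshot` is hypothetical, it ignores .gitignore rules, per-object headers, and delta compression in packfiles, and it only models the first commit.

```python
import zlib
from pathlib import Path


def compressed_size(path: Path, chunk_size: int = 1 << 20) -> int:
    """Stream a file through zlib and return the compressed length."""
    comp = zlib.compressobj()
    total = 0
    with path.open("rb") as f:
        # Read in chunks so multi-GB files are never loaded whole.
        while chunk := f.read(chunk_size):
            total += len(comp.compress(chunk))
    return total + len(comp.flush())


def estimate_snapshot(root: str) -> tuple[int, int]:
    """Return (raw bytes, approximate compressed bytes) for regular files under root."""
    base = Path(root)
    raw = compressed = 0
    for path in base.rglob("*"):
        rel = path.relative_to(base)
        # Skip an existing .git directory, symlinks, and non-regular files.
        if ".git" in rel.parts or path.is_symlink() or not path.is_file():
            continue
        raw += path.stat().st_size
        compressed += compressed_size(path)
    return raw, compressed
```

Calling `estimate_snapshot("/var/www")` returns two byte counts; the second is a ballpark for how much the first commit would add to .git, which you can compare against the free space on the server.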

kca

Any version control system needs to keep a copy of the bits somewhere, because when you change the files it has to know what they were before.

Most VCSs do poorly with large binary files. I doubt that all 14GB was written by humans and is expected to change. Photographs usually make poor candidates for a VCS; databases make even worse candidates. Git is designed to manage text written by humans, and so are all of its close cousins.
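Following this advice in practice means finding the big files and keeping them out of the repository. The sketch below lists files above a size threshold as root-anchored .gitignore patterns (a leading `/` anchors a pattern to the directory containing the .gitignore). The function name `large_files` and the 10MB default are hypothetical choices, and filenames containing gitignore special characters such as `*`, `[`, or `!` would need escaping before use.

```python
from pathlib import Path


def large_files(root: str, min_bytes: int = 10 * 1024 * 1024) -> list[str]:
    """Return files under root of at least min_bytes as /-anchored .gitignore lines."""
    base = Path(root)
    hits = []
    for path in base.rglob("*"):
        rel = path.relative_to(base)
        # Ignore the repository itself, symlinks, and non-regular files.
        if ".git" in rel.parts or path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_size >= min_bytes:
            hits.append("/" + rel.as_posix())
    return sorted(hits)
```

Reviewing the output of `large_files("/var/www")` before appending it to .gitignore shows which photographs, archives, or database dumps would otherwise be copied into .git.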

msw