Whenever I compress my Git repository with 7-Zip, the archive is many times larger than the uncompressed repository. Specifically, I cloned the HTML5Boilerplate repository, originally 243 KB, and compressed it with 7-Zip using several methods. With 7z, LZMA compression, and the highest dictionary size, word size, etc., the archive is over 12 MB, which is almost 50 times larger! With other methods, such as zip, it is even larger!

The compressed archive consistently passes 7-Zip's "Test Archive" check. When I browse the archive in 7-Zip's GUI, the files are legible, with no sign of corruption. When I extract the archive, the files appear to be preserved, including "hidden" files and folders such as .git and .htaccess, and the extracted folder is exactly the same size as before compression. All of this suggests that the issue is not file corruption, random bits being added to files, or anything like that.

What could possibly cause this to happen?

trysis

2 Answers

You've measured the wrong thing. 243 KB is the size of the checked-out working copy, not including the .git directory. Here is a fresh checkout:

$ du --apparent-size -hcs *
1.3K    404.html
8.7K    CHANGELOG.md
5.6K    CONTRIBUTING.md
1.1K    LICENSE.md
2.6K    README.md
1.1K    apple-touch-icon-precomposed.png
416     browserconfig.xml
603     crossdomain.xml
17K     css
49K     doc
766     favicon.ico
206     humans.txt
4.0K    img
1.8K    index.html
118K    js
78      robots.txt
6.7K    tile-wide.png
14K     tile.png
232K    total

$ du --apparent-size -hcs .
13M     .
13M     total
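
For anyone without `du` (for example on Windows), the same comparison can be roughly reproduced with a short Python sketch. The function name `tree_size` and the `skip_dirs` parameter are my own, not part of any tool, and unlike `du` it only sums file sizes and ignores the size of directory entries, so the numbers will differ slightly.

```python
import os


def tree_size(root, skip_dirs=()):
    """Sum the apparent sizes of all files under root.

    Directory names listed in skip_dirs (e.g. ".git") are not descended into.
    This is a hypothetical helper, not an equivalent of du in every detail.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk skips these directories entirely.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a link target is not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def report(repo):
    """Return a one-line summary of repo size with and without .git."""
    with_git = tree_size(repo)
    without_git = tree_size(repo, skip_dirs={".git"})
    return f"with .git: {with_git} bytes, without .git: {without_git} bytes"
```

Calling `print(report("path/to/clone"))` on a clone should show the working copy as only a small part of the total, in line with the `du` output above.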
Alexey Ten
  • You are correct, Windows Explorer does not seem to count dotfiles/folders in sizes even when you configure it to show them. I will research how to show their sizes. Thank you, that seems to be the issue. – trysis Apr 21 '14 at 16:24
  • Figured out the problem. Originally, being a "dotfolder" (folder starting with a dot), the `.git` folder was hidden, and therefore Windows Explorer didn't include its size. When I unhid it, Explorer included its size. – trysis Apr 21 '14 at 16:46

The blobs in the repository are already compressed with zlib, so compressing them a second time only increases the size.

user3159253
  • What's compressed with zlib? The entire history of the repo?! – trysis Apr 21 '14 at 16:01
  • Sounds strange? But it is so. Any (fully functional*) Git repository contains the entire history, and the objects inside the repository are packed. – user3159253 Apr 21 '14 at 16:06
  • * There can be "shallow copies" of a repository which contain only a subset of the history, but these repositories can only be used to view revisions and cannot be used to make new commits. See `git clone --help` for more information. – user3159253 Apr 21 '14 at 16:08
  • Wow, zlib must be better than any other compression algorithm if it can compress to 50 times smaller than 7z & gz. Why don't we use zlib instead of those other ones? – trysis Apr 21 '14 at 16:09
  • And yes, it sounds very strange to have the entire history of a repo be so much smaller than the current version. In the case of HTML5Boilerplate, the `.git` folder (which I assume is where the history is?) is 12 MB, while the rest of the repo is 243 KB. – trysis Apr 21 '14 at 16:11
  • Because it's not only zlib. See http://stackoverflow.com/questions/9478023/is-the-git-binary-diff-algorithm-delta-storage-standardized – user3159253 Apr 21 '14 at 16:12
  • As @alexeyton indicated in his answer, zlib doesn't compress that much after all. The repo seems about 12-13 MB without compression. – trysis Apr 21 '14 at 16:26
  • No, compressing already compressed data with zlib or 7z cannot increase the size that much. zlib, for example, will only increase the size of a large input by about 0.03%. – Mark Adler Apr 21 '14 at 16:33
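
The point in the last comment can be checked with a small Python sketch using the standard `zlib` module. As an assumption on my part, random bytes from `os.urandom` stand in for already compressed data such as Git's deflated objects, since both are essentially incompressible; the function names are illustrative only.

```python
import os
import zlib


def sample_incompressible(size=1_000_000):
    """Return random bytes as a stand-in (assumption) for compressed data."""
    return os.urandom(size)


def recompression_overhead(payload, level=9):
    """Compress payload with zlib and return the relative size change.

    A positive result means the output is larger than the input.
    """
    recompressed = zlib.compress(payload, level)
    # Sanity check: the data must survive a round trip unchanged.
    assert zlib.decompress(recompressed) == payload
    return (len(recompressed) - len(payload)) / len(payload)


def describe(size=1_000_000):
    """Return a short human-readable summary of the overhead."""
    overhead = recompression_overhead(sample_incompressible(size))
    return f"size change after zlib: {overhead:+.4%}"
```

Running `print(describe())` should report a change of a small fraction of a percent, nowhere near the 50x growth described in the question, which is consistent with the size difference coming from the uncounted `.git` folder rather than from double compression.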