
Suppose a single character changes in a 1 TB file. During a push, will Git transfer the entire file over the network, or just a patch with the difference? From my limited understanding, transferring the patch seems enough.

tejasvi88
  • Interesting question. I don't know how pushing/fetching changes is implemented exactly. The smallest "atom" in Git would be a blob. But running `git push` will usually output how much data it is sending over the wire: »Writing objects: 100% (347/347), 48.18 KiB | 8.42 MiB/s, done.« – knittl Feb 03 '21 at 10:05
  • Is the file binary or text? – evolutionxbox Feb 03 '21 at 10:14
  • @evolutionxbox I am concerned with binary, but plain text would be interesting to know as well. – tejasvi88 Feb 03 '21 at 10:27
  • Binary would probably make git push 1TB every time, as it can’t compress binary files. Text changes depend on how compressed the text is. – evolutionxbox Feb 03 '21 at 10:28
  • As long as you're using a protocol that uses pack files, and everything else goes right, Git will send the file as a packed object that is described by a few hundred bytes overall. Many things could go wrong here, though. See my detailed answer to this [related question](https://stackoverflow.com/q/62650799/1256452). – torek Feb 03 '21 at 10:30
  • @evolutionxbox Why would it need to compress the entire file? Why can't it send the diff? – tejasvi88 Feb 03 '21 at 10:35
  • @tejasvi88 are you aware what commits are? (not being rude) – evolutionxbox Feb 03 '21 at 10:39
  • @torek This question seems to be a duplicate of that one. I don't know if I should close this. – tejasvi88 Feb 03 '21 at 10:45
  • @evolutionxbox My understanding is that commits are snapshots of the repository's state, though internally Git stores them as diffs. – tejasvi88 Feb 03 '21 at 10:46
  • Maybe @torek can correct me, but I don't think commits are internally stored as diffs. – evolutionxbox Feb 03 '21 at 10:47
  • @evolutionxbox: they're not: they're stored as objects. Objects are stored either loose or packed; packed objects use delta compression, but not diffs (at least not *Git* style diffs, which are only for text; the delta compressor tackles binary files too). – torek Feb 03 '21 at 10:48
  • @evolutionxbox By diffs I mean that it uses delta compression to reconstruct a file from an existing one. – tejasvi88 Feb 03 '21 at 10:50

1 Answer


Teaching you to answer it yourself :) Let's run a test case:

$ git version   # prints: git version 2.30.0
$ git init one
$ cd one
$ for i in {1..30000000}; do printf '%s\n' "$i"; done >> bigfile # create a file ~250MB
$ git add bigfile
$ git commit -m 'that is one big file'
$ cd ..
$ git clone one two
$ cd one
$ sed -i '10s/.*/updated/' bigfile # change the tenth line in the file
$ git add bigfile
$ git commit -m 'changed one line'
$ # in a different shell:
  $ cd two
  $ git daemon --verbose --reuseaddr --export-all --base-path=. --enable=receive-pack ./.git
$ # in the original shell, i.e. repository "one":
$ git push git://127.0.0.1/.git HEAD
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 6.07 KiB | 6.07 MiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0

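Luckily, only 6 kilobytes are transferred for a single changed line in a 250 MB text file. Note, however, that Git stores file content compressed: `du -sh .git/objects` prints 65M after the initial file has been added, 129M after the change has been committed (the new version is first written as a separate, fully compressed loose object), and 63M after `git gc --prune` has been run (which packs both versions and stores one as a delta of the other).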
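The `du` numbers above can also be cross-checked from Git's side. The following is a minimal sketch, not the answer's exact setup: it assumes `git`, `seq` and GNU `sed` are on the PATH, uses a smaller (~6.9 MB) file so it runs quickly, works in a throwaway temporary repository, and sets a made-up identity (`Example <example@example.com>`) so that `git commit` works on a fresh machine. It compares the loose-object size reported by `git count-objects -v` with the pack size after `git gc --prune=now`, and counts the blobs that `git verify-pack -v` lists as deltas.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: git, seq and GNU sed available; throwaway repo in
# a temp dir; hypothetical identity "Example <example@example.com>".
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

seq 1 1000000 > bigfile              # ~6.9 MB instead of ~250 MB, to run quickly
git add bigfile
git commit -q -m 'one big file'
sed -i '10s/.*/updated/' bigfile     # GNU sed; BSD/macOS sed needs: sed -i ''
git add bigfile
git commit -q -m 'changed one line'

# Before gc: each version of bigfile is a separate, fully compressed loose object.
loose_kib=$(git count-objects -v | awk '$1 == "size:" {print $2}')

git gc -q --prune=now

# After gc: everything is packed, and one blob can be stored as a delta of the other.
pack_kib=$(git count-objects -v | awk '$1 == "size-pack:" {print $2}')

# In `git verify-pack -v` output, deltified objects carry two extra columns
# (chain depth and base object id), so their lines have 7 fields instead of 5.
delta_blobs=$(git verify-pack -v .git/objects/pack/pack-*.idx | awk '$2 == "blob" && NF == 7' | wc -l)

echo "loose: ${loose_kib} KiB, packed: ${pack_kib} KiB, blobs stored as deltas: ${delta_blobs}"
```

On a run like this, one would expect exactly one of the two `bigfile` blobs to be reported as a delta, which is why the packed repository shrinks back to roughly the size of a single compressed copy.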

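The important bits in the output are "Delta compression" and "Total 3 (delta 1)": of the three objects written (the new commit, tree and blob), the new blob was sent as a delta against an object the receiving repository already has (the old version of bigfile) instead of as the whole file. Strictly speaking, the delta is computed between objects, not between commits, and it is a binary delta rather than a textual diff.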
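Since the question is mainly about binary files, here is a sketch of the same experiment with random bytes, which zlib cannot compress, so any saving has to come from delta compression. Instead of starting `git daemon`, it asks `git pack-objects` for a thin pack of the objects reachable from `HEAD` but not from `HEAD~1`; this only approximates what `git push` sends to a receiver that already has `HEAD~1`, since a real push negotiates the exclusion list with the remote. Assumptions: `git`, `head -c` and `dd` are available, the repository is a throwaway one with a made-up identity, and four bytes are overwritten rather than one (so the edit cannot accidentally leave a random byte unchanged). The 5 MB file is also kept well below `core.bigFileThreshold` (512 MiB by default), above which, according to the `git config` documentation, Git does not attempt delta compression at all; that limit is worth checking before expecting small pushes for a 1 TB file.

```shell
#!/usr/bin/env bash
# Sketch under assumptions: git, head -c and dd available; throwaway repo in a
# temp dir; hypothetical identity "Example <example@example.com>".
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# 5,000,000 random bytes: zlib cannot shrink these, so any saving comes from deltas.
head -c 5000000 /dev/urandom > blob.bin
git add blob.bin
git commit -q -m 'random binary file'

# Overwrite 4 bytes at offset 1000 in place (conv=notrunc keeps the file size).
printf 'edit' | dd of=blob.bin bs=1 seek=1000 conv=notrunc 2>/dev/null
git add blob.bin
git commit -q -m 'changed a few bytes'

# Thin pack of what HEAD has and HEAD~1 lacks: with --thin, objects the
# receiver is assumed to have (here the old blob) may serve as delta bases.
pack_bytes=$(printf '%s\n' HEAD '^HEAD~1' | git pack-objects --revs --thin --stdout -q | wc -c)
file_bytes=$(wc -c < blob.bin)

echo "file: ${file_bytes} bytes, thin pack: ${pack_bytes} bytes"
```

The thin pack should come out as a tiny fraction of the file size; the exact byte count may vary between Git versions.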

knittl