9

I'm new to Git. Today I pulled a branch via git terminal and got the following message:

remote: Counting objects: 5, done.

remote: Compressing objects: 100% (3/3), done.

remote: Total 3 (delta 2), reused 0 (delta 0)

Unpacking objects: 100% (3/3), done.

What does delta mean?

Community
  • 1
  • 1
user1941537
  • 6,097
  • 14
  • 52
  • 99

3 Answers3

15

Ok, so first we need need understand how Git stores data within repository. The most important thing there is that it always store whole files, in other words on conceptual level, Git store "exact copy" of the project tree in the each commit.

Ok, but how it happens that each commit do not increase repository size by new copy of whole tree you may ask. This is where magic happens. First lets see that we have 2 files currently in tree and committed

a.txt
b.txt

When we change b.txt but leave a.txt as is, we do not need to store whole new copy of a.txt, just point to the old one (as it hash hasn't changed).

But let's go one step further, we do not need to store whole b.txt file either, just part that has changed. So let's split b.txt into chunks of known size and make b.txt node just list of these chunks. In that way we can store repeated chunks only once, and save space. And each of these "chunks" is called delta.

Hauleth
  • 22,873
  • 4
  • 61
  • 112
6

Git uses Delta encoding which indicates a way of storing or transmitting data in the form of differences (deltas) between sequential data rather than complete files.

An object in a pack is stored as a delta i.e. a sequence of changes to make to some other object.

You can read more here for better understanding .

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
naib khan
  • 928
  • 9
  • 16
  • 1
    This is very confusing, as ["deltas" aren't diffs](https://gist.github.com/matthewmccullough/2695758). Git do not store diffs. – Hauleth Sep 18 '19 at 12:20
  • 1
    @Hauleth: it's true that Git doesn't store (text) diffs, but delta encoding *is* a form of differential encoding. "Diffs" and "differences" are not the same. – torek Sep 18 '19 at 19:45
  • Still, using that wording can be very confusing for the newcomers. – Hauleth Sep 18 '19 at 21:17
2

Related source code

if (progress)
    fprintf_ln(stderr,
           _("Total %"PRIu32" (delta %"PRIu32"),"
             " reused %"PRIu32" (delta %"PRIu32")"),
           written, written_delta, reused, reused_delta);

Doc: pack-format

The delta data is a sequence of instructions to reconstruct an object from the base object. If the base object is deltified, it must be converted to canonical form first. Each instruction appends more and more data to the target object until it's complete. There are two supported instructions so far: one for copy a byte range from the source object and one for inserting new data embedded in the instruction itself.

Git Internals - Packfiles

The initial format in which Git saves objects on disk is called a “loose” object format. However, occasionally Git packs up several of these objects into a single binary file called a “packfile” in order to save space and be more efficient. Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server. To see what happens, you can manually ask Git to pack up the objects by calling the git gc command:

$ git gc
Counting objects: 18, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (14/14), done.
Writing objects: 100% (18/18), done.
Total 18 (delta 3), reused 0 (delta 0)
LF00
  • 27,015
  • 29
  • 156
  • 295