What does GIT download?

Question

If I clone a repository with 3 branches, is GIT smart enough to download only the changes between the branches, or does it download all files repeatedly for all branches? Also, does it download all data from all branches at the beginning, or does it wait until I switch branches?

Can you be a bit more precise in your question? As it stands now, the question requires more or less a full Git tutorial. How familiar are you with Git's object model? Do you know what a blob, tree, commit, tag, annotated tag, signed tag, note, ref is? Do you know how they are represented on-disk? Do you know what a packfile is? How familiar are you with the network protocol (which is based on packfiles)? Do you know what `git fetch` is? Do you know what `git checkout` is? — Jörg W Mittag, Sep 07 '18 at 11:36
I'm not asking for a tutorial about git, just how git handles the initial download of the branches. My focus is on how much bandwidth it will use and whether I can save a lot of bandwidth by fetching only one branch (`--single-branch`). — jwillmer, Sep 07 '18 at 11:45
In most situations, bandwidth should not be a concern. But if you have a huge repo (lots of versions of large files that don't diff well, and you haven't used a tool like lfs to mitigate this), or a limited network connection, such that it *does* become an issue, then `--single-branch` and/or shallow cloning can save considerable bandwidth, at the expense of certain operations being less well-informed. (Note that git operates locally for almost everything it does.) — Mark Adelsberger, Sep 07 '18 at 13:00

score 2 · Answer 1 · answered Sep 07 '18 at 12:58

The currently-accepted answer is rather misleading, and a lot of what it says doesn't really address the question.

Only the packed representation[1] is used for transfer between repos, so you can generally assume that you'll receive a (reasonably) minimal representation of the information you requested.

To say that it "only downloads the commits" is misleading for several reasons. Mostly it promotes the misconception that commits, themselves, are lists of changes - which they are not. Commits are snapshots of the project[2]. "Downloads the commits" means, roughly, "downloads everything".

Which is a nice segue to...

By default git clone downloads the entire history of all branches. You can give it options to tell it to download less, if you know you need less, but the default is to download everything so that you can later perform any source control operation (other than syncing changes with another repo) without any required connectivity. For details of the options, see the git clone docs (https://git-scm.com/docs/git-clone) - especially --single-branch, --depth, and --shallow-* options.
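
As a rough illustration of the options mentioned above, here is a minimal Python sketch that only builds `git clone` command lines; it does not run git. The helper name `clone_command`, the branch name `main`, and the example URL are assumptions made up for this example. The flags themselves (`--single-branch`, `--branch`, `--depth`) are the ones described in the git clone docs.

```python
def clone_command(url, branch=None, depth=None, single_branch=False):
    """Build an argv list for `git clone` (hypothetical helper, not part of git)."""
    argv = ["git", "clone"]
    if single_branch:
        # Fetch only the history leading to one branch tip.
        argv.append("--single-branch")
    if branch is not None:
        # Choose which branch to check out (and, with --single-branch, to fetch).
        argv += ["--branch", branch]
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be a positive integer")
        # Truncate history to the given number of commits.
        argv.append(f"--depth={depth}")
    argv.append(url)
    return argv


# Placeholder URL, assumed for illustration only.
URL = "https://example.com/project.git"

full = clone_command(URL)                                     # default: everything
one_branch = clone_command(URL, branch="main", single_branch=True)
shallow = clone_command(URL, depth=1)                         # newest commit only

for argv in (full, one_branch, shallow):
    print(" ".join(argv))
```

Note that, per the git clone docs, `--depth` implies `--single-branch` unless `--no-single-branch` is also given, so the shallow variant already restricts the download to one branch.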

None of this really has much of anything to do with the DAG representation. That's really only important when thinking about how to navigate the objects in git, and in fact it's a mistake to think of deltas as following the DAG, since you'd generally get it backwards.

[1] There are two formats in which git stores the objects that make up a project history. As new material is committed, it is stored in loose objects - complete copies of each version of each file - but even then, git never stores the same content twice. So if a file is unchanged across 10 commits, then one copy of that file is stored. Also, even in loose form the data is compressed.

Later, objects can be switched to a "packed" representation. Among the optimizations done when packing is to find similar objects, and represent the older of the two as a delta from the newer.
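
The "never stores the same content twice" point in this footnote follows from objects being named by a hash of their content. The sketch below is a simplified model in Python, not git's actual storage code: it assumes the default SHA-1 object format, and the `store` helper and the in-memory `objects` dict are stand-ins invented for illustration.

```python
import hashlib
import zlib


def blob_id(content: bytes) -> str:
    """Object ID of a file's content, assuming git's SHA-1 object format."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def store(objects: dict, content: bytes) -> str:
    """Add content to a toy object store keyed by object ID (hypothetical helper)."""
    oid = blob_id(content)
    if oid not in objects:
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        # Loose objects are zlib-compressed; identical content maps to the same key.
        objects[oid] = zlib.compress(header + content)
    return oid


objects = {}
readme = b"hello\n" * 100

# "Commit" the same unchanged file ten times.
ids = [store(objects, readme) for _ in range(10)]

print(len(set(ids)), "distinct id,", len(objects), "stored object")
print(len(readme), "bytes raw,", len(objects[ids[0]]), "bytes compressed")
```

Because the key depends only on the content, an unchanged file contributes one object no matter how many commits or branches refer to it.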

[2] Some commands, like rebase, operate on the patch between a commit and its parent, and the documentation (like much git documentation) is a bit squishy with the terminology around those commands. So unfortunately it's easy for the misconception that a commit is a list of changes to spread.

But even though some of the objects making up an older commit might be internally represented as deltas from other objects from newer commits, conceptually the commit is a snapshot. If you tell git, e.g. with --depth options, to download only certain commits, you'll still get the entire snapshot - not just the patches relative to previous commits. Any subset of a repo that contains partial deltas without enough information to rebuild the snapshot (i.e. the commit) would be considered corrupt.

LeGEC · Answer 2 · 2018-09-07T12:14:56.127

git clone will download enough content to allow you to reach any commit in the history of any branch.

If you want to download only the top commit of each branch, or only a portion of the history of each branch (say, the last 10 commits), look at the following options:

--depth=x
--shallow-since=date
--shallow-exclude=revision

These options can also be passed to git fetch or git pull.
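
To make the shape of these options concrete, here is a small Python sketch that assembles `git fetch` command lines with them; it only builds argument lists and does not invoke git. The helper `shallow_options`, the remote name `origin`, and the example date and tag are assumptions for illustration.

```python
def shallow_options(depth=None, since=None, exclude=None):
    """Return the history-limiting options as a list (hypothetical helper)."""
    opts = []
    if depth is not None:
        opts.append(f"--depth={depth}")              # last N commits of each fetched branch
    if since is not None:
        opts.append(f"--shallow-since={since}")      # commits newer than a date
    if exclude is not None:
        opts.append(f"--shallow-exclude={exclude}")  # stop at a given branch or tag
    return opts


# Remote name, date and tag below are made-up example values.
fetch_last_10 = ["git", "fetch", *shallow_options(depth=10), "origin"]
fetch_recent = ["git", "fetch", *shallow_options(since="2018-01-01"), "origin"]
fetch_after_tag = ["git", "fetch", *shallow_options(exclude="v1.0"), "origin"]

for argv in (fetch_last_10, fetch_recent, fetch_after_tag):
    print(" ".join(argv))
```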

Note, however, that git is geared towards transmitting all of the content efficiently:

if a file is not modified (between two commits) it is only downloaded once
all downloaded content is compressed (using zlib)
it has a bunch of features to detect that 2 files are very similar and download only the diff (instead of twice the whole content)
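
The last point, sending only the difference between two similar files, can be illustrated with a toy delta in Python. This is a sketch of the general copy/insert idea only: it uses `difflib` from the standard library to find matching regions, which is an assumption of this example and not how git searches for or encodes deltas. The names `make_delta` and `apply_delta` are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Describe target as copies from base plus literal inserts (toy format)."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # reuse bytes already in base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # bytes that must be transmitted
        # "delete" needs no instruction: those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the full target from base and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


# Two versions of a file that differ in a single line.
base = b"".join(b"line %d\n" % i for i in range(200))
target = base.replace(b"line 100\n", b"line one hundred\n")

ops = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in ops if op[0] == "insert")

print(len(target), "bytes in the new version,", literal_bytes, "literal bytes in the delta")
```

Applying the delta to the base reproduces the complete new version, which matches the point that the receiver always ends up with full snapshots even when similar content is transferred compactly.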

clamentjohn · Accepted Answer · 2018-09-07T11:58:45.310

Is GIT smart enough to only download the changes between the branches or does it download all files repeatedly for all branches?

The best thing about git (compared with the tools that came before it) is that it uses a DAG to keep track of changes. That is, when you git pull or git fetch, it only downloads the diffs.
So to answer your question: it only downloads the 3 commits you've made, and then builds a local DAG for you. Have a look here for a quick overview of git basics.

And also, does it download all data from all branches at the beginning, or does it wait until I switch branches?

When you do a git clone, you download the whole repository; this is done to build the DAG. Later on, it only downloads what you ask it to, using git fetch and git pull.

git fetch downloads the changes from the remote and records them under the remote-tracking refs in .git/refs/remotes/<remote>/, so it does not bring them directly into your working directory. (Have a read on git basics if you don't know what a working directory is.)
git pull does git fetch and git merge in one command.

Refer to this other SO question on fetch vs pull.

A few reading materials:
Git looks like a tree more than a graph (SO question)
A simple intro to git explaining how and why we use a DAG (link)
