
After reading the documentation, I still don't really understand what the differences are between --shared and --reference <repo>. They seem so similar.

  1. What are the differences between the --shared and --reference <repo> options?

  2. Can they be used to save drive space when making multiple local clones of another local clone?

  3. Can each local clone have a different branch checked-out?

Note: I'm aware that I can use multiple shallow clones with truncated history by using git clone --depth <depth>, but each clone still has to duplicate at least some history in order to do that, so I was thinking that maybe it's not the most optimal way to save drive space (though it is better than nothing).

Background

Sometimes I like to have more than one checkout of my working copy in a repository, so I create multiple clones, where each clone has its own checkout.

However, I don't really need the whole history with each clone, just the most up-to-date versions of my branches, so I could possibly save a lot of drive space by having each clone use the tag, commit, tree, and blob objects from the original local clone (for example, via symlinks or something).

git clone documentation

I checked the git clone documentation to see if there's anything I can use.

--shared

I saw that there's a --shared option:

When the repository to clone is on the local machine, instead of using hard links, automatically setup .git/objects/info/alternates to share the objects with the source repository. The resulting repository starts out without any object of its own.

This looks like it might be useful for helping me to save drive space with multiple clones that have different checkouts, since each clone shares objects with the original local clone.

--reference <repository>

Then I also saw the --reference <repository> option:

If the reference repository is on the local machine, automatically setup .git/objects/info/alternates to obtain objects from the reference repository. Using an already existing repository as an alternate will require fewer objects to be copied from the repository being cloned, reducing network and local storage costs.

NOTE: see the NOTE for the --shared option.

This says that it will reduce local storage costs, so this might be useful as well.

  • @user3348022 cool, I saw [this from Google search](http://lists-archives.com/git/505518-clarify-git-clone-local-shared-reference.html), but I couldn't figure out how to navigate the archaic interface in order to find the first couple of replies. If you want to summarize the relevant parts of that and add it as answer, you might earn some EPIC upvotes and repz! `:D` –  Apr 25 '14 at 23:01
  • @user3348022 also, that email you linked still doesn't clarify enough to me about why I would want to use `--shared` vs `--reference`. Is the only difference that when you use `--shared`, the origin is the local repo being cloned, while with `--reference`, the origin is the remote repo being cloned? –  Apr 25 '14 at 23:28
  • @John - I'm a little confused. The answers you linked to say that --reference implies --shared. But DoubleWord's answer says that --shared does not copy the objects, whereas --reference does copy the objects. That hardly seems like an "implies" relationship. That also makes --reference seem much less dangerous than --shared. Can you clarify at all? – Sean May 08 '14 at 17:26
  • The 'lists-archives.com' link led me to clickbait, but this narkive link was still good: https://git.vger.kernel.narkive.com/TxZNFARz/clarify-clone-local-shared-reference. I recommend reading it, as there are some interesting interactions when using these options. – qneill Mar 25 '22 at 16:20

3 Answers


Both options update .git/objects/info/alternates to point to the source repository, which can be dangerous; hence the warning note on both options in the documentation.

The --shared option does not copy the objects into the clone. This is the main difference.

The --reference option takes an additional repository parameter. Using --reference still copies the objects into the destination during the clone; however, objects that are already available in the reference repository are taken from there rather than from the source. Passing the path of a repository on a faster or local device to --reference can therefore reduce network time and I/O on the source repository.

See for yourself

Create a --shared clone and a --reference clone. Count the objects in each using git count-objects -v. You'll notice the shared clone has no objects, and the reference clone has the same number of objects as the source. Further, notice the size difference of each in your file system. If you were to move the source, and test git log in both shared and reference repositories, the log is unavailable in the shared clone, but works fine in the reference clone.
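The experiment above can be sketched as a short shell script. This is only a minimal sketch, not something from the original answer: the directory names (src, shared-clone, reference-clone) and the demo identity are made up for illustration, and it clones from a local path. What the --reference clone reports may differ when the source is reached over a network transport, so no expected numbers are shown for it.

```shell
set -e
work=$(mktemp -d)   # scratch directory; every path below is hypothetical

# Build a tiny source repository with a single commit.
git init -q "$work/src"
echo "hello" > "$work/src/file.txt"
git -C "$work/src" add file.txt
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# One clone per option, both taken from the same local source.
git clone -q --shared "$work/src" "$work/shared-clone"
git clone -q --reference "$work/src" "$work/src" "$work/reference-clone"

# Compare the object counts and the alternates file of each clone.
for clone in shared-clone reference-clone; do
    echo "== $clone =="
    git -C "$work/$clone" count-objects -v
    cat "$work/$clone/.git/objects/info/alternates"
done
```

Comparing `du -sh` on the two `.git` directories shows the on-disk difference, if any, on your file system.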

RjOllos
DoubleWord
  • So what's the difference between using `--reference` and using a normal `git clone` – Yahya Uddin Sep 12 '16 at 03:08
  • @YahyaUddin: the difference occurs when cloning a *non-local* repository. Without `--reference` this downloads all the objects over the network. With `--reference` this gets the object IDs over the network, then checks to see if those object IDs are available locally. If they *are* available locally it skips the download step. If they are *not* available locally it downloads as usual. The end result is that `git clone --reference` is *faster* (how much faster depends on how up-to-date the local reference is). (But see also `--dissociate`.) – torek Sep 12 '16 at 12:37
  • "--shared" can really only be used when cloning a local repo ; whereas "--reference" is mostly useful when cloning a distant over-the-network repo for which we already have some objects available in a local repo. The usecases are thus different, and I think this answer is misleading – YoungFrog May 25 '22 at 05:33

The link in the comments to your question is really a clearer answer: --reference implies --shared. The point of --reference is to optimise network I/O during the initial clone of a remote repository.

Contrary to the answer above, I find that the --shared and --reference repositories -- from the same source -- have the same size and both have zero objects. Of course, if you use --reference for some other repository which is based off a common source, the size and objects will reflect the difference between the repositories. Note that in both cases we are not saving space in the work tree, only the .git/objects.

There is some nuance to maintaining this setup going forward - read the thread for more details. Essentially it sounds like the two should be treated as public repositories, with care around history re-writing in the presence of repacking/pruning/garbage collection.

The workflow around maintaining an optimal disk-space usage after the initial clone seems to be:

  1. pull source
  2. repack source
  3. pull secondary
  4. git gc in secondary

Probably best to read the discussion in that thread though.

You can add an alternate to an existing repository by putting the absolute path to the source's objects directory into secondary/.git/objects/info/alternates and running git gc (many people use git repack -a -d -l, which is done by git gc).

You can remove an alternate by running git repack -a -d (no -l) in the secondary and then removing the line from the alternates file. As described in the thread, it is possible to have more than one alternate.
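The two procedures above (adding and then removing an alternate) might look like this in practice. It is a rough sketch under stated assumptions: the repositories source and secondary are throwaway examples created on the spot, and secondary starts out as a full, independent copy made with --no-hardlinks.

```shell
set -e
work=$(mktemp -d)

# Throwaway source repository plus an independent full clone of it.
git init -q "$work/source"
echo hello > "$work/source/f.txt"
git -C "$work/source" add f.txt
git -C "$work/source" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"
git clone -q --no-hardlinks "$work/source" "$work/secondary"

alt="$work/secondary/.git/objects/info/alternates"
src_objects="$work/source/.git/objects"   # absolute path, as the text requires

# Add the alternate, then let git gc drop what it can borrow instead.
echo "$src_objects" >> "$alt"
git -C "$work/secondary" gc -q
git -C "$work/secondary" count-objects -v

# Remove the alternate: first copy every needed object back locally
# (no -l), and only then delete the line from the alternates file.
git -C "$work/secondary" repack -a -d -q
grep -vxF "$src_objects" "$alt" > "$alt.tmp" || true
mv "$alt.tmp" "$alt"

# The secondary should now be self-contained again.
git -C "$work/secondary" fsck
```

The order in the removal half matters: deleting the line before repacking would leave the secondary pointing at objects it can no longer find.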

I've not used this much myself, so I don't know how error-prone it is to manage.

Sam Brightman

The link in the comments to your question is now dead.

https://www.oreilly.com/library/view/git-pocket-guide/9781449327507/ch06.html has some great information on the subject. Here is some of what is there:

first, we make a bare clone of the remote repository, to be shared locally as a reference repository (hence named “refrep”):
$ git clone --bare http://foo/bar.git refrep

Then, we clone the remote again, but this time giving refrep as a reference:
$ git clone --reference refrep http://foo/bar.git

The key difference between this and the --shared option is that you are still tracking the remote repository, not the refrep clone. When you pull, you still contact http://foo/, but you don’t need to wait for it to send any objects that are already stored locally in refrep; when you push, you are updating the branches and other refs of the foo repository directly.

Of course, as soon as you and others start pushing new commits, the reference repository will become out of date, and you’ll start to lose some of the benefit. Periodically, you can run git fetch --all in refrep to pull in any new objects. A single reference repository can be a cache for the objects of any number of others; just add them as remotes in the reference:

$ git remote add zeus http://olympus/zeus.git
$ git fetch zeus
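To see the "key difference" from the excerpt without network access, here is a small sketch in which a local bare repository (remote.git) stands in for http://foo/bar.git. All directory names other than refrep are assumptions made for this demo, and the --shared clone is included only for contrast.

```shell
set -e
work=$(mktemp -d)

# A local bare repository plays the role of the remote.
git init -q "$work/seed"
echo hello > "$work/seed/f.txt"
git -C "$work/seed" add f.txt
git -C "$work/seed" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"
git clone -q --bare "$work/seed" "$work/remote.git"

# The reference repository: a bare clone of the "remote".
git clone -q --bare "$work/remote.git" "$work/refrep"

# Clone the "remote" with refrep as a reference, and refrep itself with --shared.
git clone -q --reference "$work/refrep" "$work/remote.git" "$work/bar"
git clone -q --shared "$work/refrep" "$work/bar-shared"

# Which repository does each clone track as origin?
git -C "$work/bar" remote get-url origin          # the stand-in remote
git -C "$work/bar-shared" remote get-url origin   # the refrep clone

# Where does the --reference clone borrow objects from?
cat "$work/bar/.git/objects/info/alternates"
```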

Paul Van Camp