1

Assuming the convention "origin", "master".
My local repository has a "master" branch.

Is "origin/master" a mirror or a reference?

Are there two types of remote repository in git?
When I do a git fetch, the "origin/master" is updated.
When I do a git pull, the "origin/master" and the local "master" are updated.

I read that git fetch updates the remote-tracking branches under refs/remotes//
But it's unclear to me.

Actually, I have two remote repositories for my local repo?
The first one is for example in GitHub, and the other one is my "origin"?

Is "origin" only a reference or is a real repository (a mirror)?

When I do a git fetch, I'm downloading the latest changes from the remote repository to my local repository, but not into "master". So where do they go?

In this picture, there are 3 repositories (Local, Remote/Origin and Remote)
Is it true?
enter image description here

danilo
  • 3
    The linked question—[What is "origin" in Git](https://stackoverflow.com/q/9529497/1256452)—covers most of this, but it's worth adding that Git terminology here is terrible and confusing. `origin` is a *remote*, which is a short name for a URL, but once you have the short name in hand, Git starts using it for a few more things too. Meanwhile Git has things it calls *branch names* and very different things that it calls *remote-tracking branch names*. I think you may be better off by starting with the concept of *commits* here. Fetch and push deal in commits, in part by using names. – torek Jan 01 '18 at 17:58
  • Then I have 2 copies of the changes. One in my GitHub, and another in my "origin/master"? – danilo Jan 01 '18 at 18:03
  • Sort of. Commits don't hold *changes*, commits hold *snapshots*. The actual name of any individual commit is its raw hash ID, which you'll see when you run `git log`. Each commit stores the ID of a previous, or *parent* commit. Any given *repository* holds a set of commits. With `git fetch` or `git push`, you connect your Git (your repo) to some other Git (some other repo). They converse, figure out which commits you have that they don't (`git push`) or which they have that you don't (`git fetch`), and then copy those commits across. [continued] – torek Jan 01 '18 at 18:06
  • Once you both have the same commits-by-raw-hash-IDs, you either ask them to set one of *their* names, such as `master`, to point to one specific commit hash ID; or they have you set one of *your* remote-tracking names, such as `origin/master`, to point to one specific commit hash ID. So the two repositories (yours, and the copy on GitHub) eventually both have all the commits. Your names—your branch names, and your remote-tracking names—remember one commit hash ID each. Their names also remember one hash ID each. [cont'd] – torek Jan 01 '18 at 18:08
  • When you run, e.g., `git log` to *view* commits, Git starts from one specific commit hash ID and shows you that commit. That commit stores another commit's hash ID—its *parent* —so `git log` then goes on to show you *that* commit, by its hash ID. That commit has another parent, which Git log goes on to show you, and so on. In effect, Git works *backwards* from these named IDs to all the previous commits. – torek Jan 01 '18 at 18:09
  • To make this all fit into your head (to make sense), it helps to get a big whiteboard or sheet of paper and start drawing on it. Draw commits as circles with arrows coming out of them. The arrows point *backwards*, to parent commits. Add a new commit by drawing a new circle and pointing backwards to the commit you had checked out when you ran `git commit`. Branch names (regular branches, not the remote-tracking things) always point to the *last* commit you added to that branch: Git manages this by writing the new hash ID into the branch name, every time you successfully `git commit`. – torek Jan 01 '18 at 18:15
  • 1
    The remote-tracking names also point to the *last* commit added, but not to one you made with `git commit`. Instead, they point to the last commit that was on the *other* Git, in its branch, when you ran `git fetch`. So `origin/master` remembers what *their* Git had in *their* `master` when you ran `git fetch`. (This is the part that makes one's head ache :-) ) – torek Jan 01 '18 at 18:17
  • but is `origin/master` a mirror? or only a reference? – danilo Jan 01 '18 at 18:21
  • Great idea, I added a picture. Thanks very much for your help, I'm trying to absorb all the concepts that you clarified. – danilo Jan 01 '18 at 18:36
  • I think that finally I understand, after `git fetch` then `origin/master` has all "new versions" of files downloaded from GitHub? – danilo Jan 01 '18 at 18:43
  • 1
    `origin/master` (its full name is actually `refs/remotes/origin/master`) simply holds the hash ID of the commit that your Git is remembering from the `master` on the other Git (in this case, at GitHub). That *commit* has what they think of as their latest version. If you run `git fetch` again and, for some reason, they used `git reset` to *remove* the last commit (to back up to its parent), your Git will already have the parent *commit*, so your Git will just change your `origin/master` to record the parent commit's hash. – torek Jan 01 '18 at 19:07
  • I updated the picture to better reflect how git works – danilo Jan 01 '18 at 20:52
  • That image is really misleading and incorrect. – evolutionxbox Jan 01 '18 at 21:05
  • @evolutionxbox, Is this new image correct? – danilo Jan 06 '18 at 17:31

2 Answers

2

Assuming the convention "origin", "master". My local repository is my "master".

Not really.

origin is the shorthand-name of your local view of the remote repo.
The name of your local repository (a.k.a. .git directory) is . (dot).
master is the name of a branch that may exist in your local repo, in the remote, or in both. Since it's the default name of the first branch, it usually exists everywhere, but that's not a must.

Usually, when you clone a remote repo, origin is automatically set to point to that source repository. However, you may change that name to anything, and you may also add new names pointing at different repositories. That's often called "working with multiple remotes". If you are not interested in that and you work with the defaults, then you have a single remote, its friendly name is origin, and it's the default target for fetch/pull/etc. operations.
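
As a rough illustration of the paragraph above, here is a minimal sketch that uses two throwaway bare repositories in a temp directory as stand-ins for real remotes. The directory names (first.git, second.git, work) and the remote names upstream and backup are made up for the example.

```shell
set -eu
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT

# Two bare repositories stand in for two different remote servers (hypothetical names).
git init -q --bare "$tmp/first.git"
git init -q --bare "$tmp/second.git"

# Cloning sets up a remote named 'origin' automatically.
# (stderr is silenced only because git warns that the cloned repo is empty)
git clone -q "$tmp/first.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
default_url=$(git remote get-url origin)

# 'origin' is only a name: it can be renamed, and more remotes can be added.
git remote rename origin upstream
git remote add backup "$tmp/second.git"
remotes=$(git remote | sort | xargs)

echo "origin pointed at: $default_url"
echo "remotes now:       $remotes"
```

After this runs, the clone knows two remotes and neither of them is called origin, which is why origin is best read as a default nickname rather than anything special.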

Are there two types of remote repository in git?

Yes, but not in the sense of what you meant. By "type" in "remote repository type" I may mean a "type of access" (http/s, ssh, ..), or a "type of repository" (bare, non-bare). That gives many combinations of remote repos, but it's mostly irrelevant and totally not what you thought. In terms of what you thought about, there are no 'repository types' except for the one local repo (.git dir) and remotes (any other repo, be it on-company-server, on-github, on-hdd-in-different-directory, etc).

I know too, that git fetch updates the remote-tracking branches under refs/remotes// but it's unclear to me.

Remote-tracking branches are the local view of the remote repository that I mentioned earlier.

Your local repo has branches, e.g. 'master'.
The remote repo called 'origin' that sits on GitHub can have its own 'master', which may be at a different version than yours. Your local repo may have a special automatic branch called origin/master that remembers the last-seen state of 'master' on the remote GitHub repo.

When I do a git fetch, the "origin" is updated.

"updating origin" is highly imprecise and may even be understood as sending commits to remote repository, and that's 'push'...

When you do fetch, git contacts a repository (not necessarily remote!) and reads state of branch(es) and updates your local branches with that information.

For example:

# contact 'origin'
# read state of 'master' on `origin` *)
# update nothing on your local branches
git fetch origin master

# contact 'origin'
# read state of 'master' on `origin` *)
# update your local 'blaster' to match current remote master
git fetch origin master:blaster

# contact '.' (your THIS LOCAL repo)
# read state of 'blaster' on `.`
# update your local 'master' to match current local blaster
git fetch . blaster:master

*) whenever any non-local repo is contacted and some new state is learned, git may remember that new state in form of a tracking branch like origin/master. Note that's not the same as origin master or master. Usually, master is ambiguous or implicitly local, origin master means "master on the remote side" (with imminent connecting there to check the most recent state), and origin/master means last-known-state of the "master on the remote side" (without connecting, just using what we already knew from previous checking).
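
To make the fetch behaviour concrete, here is a minimal sketch in which a bare repository in a temp directory stands in for the remote. The directory names (remote.git, other, mine) and the demo identity are assumptions made only for this example.

```shell
set -eu
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository in a temp directory stands in for the remote (e.g. GitHub).
git init -q --bare "$tmp/remote.git"
git -C "$tmp/remote.git" symbolic-ref HEAD refs/heads/master

# A hypothetical collaborator publishes the first commit on master.
git init -q "$tmp/other"
cd "$tmp/other"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/remote.git"
git commit -q --allow-empty -m "first"
git push -q origin master

# We clone, and afterwards the collaborator pushes a second commit.
git clone -q "$tmp/remote.git" "$tmp/mine"
git commit -q --allow-empty -m "second"
git push -q origin master
theirs=$(git rev-parse master)

# Back in our clone: fetch, then compare local master with origin/master.
cd "$tmp/mine"
before=$(git rev-parse master)
git fetch -q origin
after=$(git rev-parse master)
tracking=$(git rev-parse origin/master)

# The explicit refspec form also writes a local branch of our choosing.
git fetch -q origin master:blaster
blaster=$(git rev-parse blaster)

echo "local master moved:       $([ "$before" = "$after" ] && echo no || echo yes)"
echo "origin/master up to date: $([ "$tracking" = "$theirs" ] && echo yes || echo no)"
```

The plain fetch leaves the local master where it was and only moves origin/master, while the master:blaster form additionally creates a local branch at the fetched commit.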

When I do a git push, the "origin" and the "master" are updated.

Similarly to previous statement, that's somewhat imprecise.

When you do a push, the state of the branches is read from your local repo and sent to the target remote repository, and the branches there are updated with that information. In very basic terms, it can be considered the direct opposite of fetch. Fetch reads from something and updates local. Push reads local and updates the something.

The only thing updated as a result of a push is the remote side you pointed it at. Well, that and the tracking branches that track that remote side, so that your repo remembers it was updated.

For example:

# read state of 'master' on local
# contact 'origin'
# create/update the-target-branch**) on 'origin' to match local master
git push origin master

# read state of 'master' on local
# contact 'origin'
# create/update branch called 'blaster' on `origin` to match local master
# update tracking branch 'origin/blaster' to remember its new state
git push origin master:blaster

# read empty state
# contact 'origin'
# update branch called 'blaster' on `origin` to match .. empty
git push origin :blaster
# ^ this actually DELETES the branch 'blaster' from remote origin
# ^ and also deletes the local tracking branch 'origin/blaster' if it existed
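
The push examples above can be checked with a small throwaway setup. This is only a sketch: the bare repository in a temp directory plays the role of origin, and the paths and demo identity are invented for the example.

```shell
set -eu
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Bare repository standing in for the remote, plus a local working repo.
git init -q --bare "$tmp/remote.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/remote.git"
git commit -q --allow-empty -m "first"
local_master=$(git rev-parse master)

# Push local master to a branch called 'blaster' on origin.
git push -q origin master:blaster
remote_blaster=$(git -C "$tmp/remote.git" rev-parse refs/heads/blaster)
tracking=$(git rev-parse refs/remotes/origin/blaster)

# Push "nothing" to 'blaster': this deletes it on origin.
git push -q origin :blaster
if git -C "$tmp/remote.git" rev-parse -q --verify refs/heads/blaster >/dev/null
then on_remote=yes; else on_remote=no; fi
if git rev-parse -q --verify refs/remotes/origin/blaster >/dev/null
then still_tracked=yes; else still_tracked=no; fi

echo "blaster still on remote: $on_remote"
echo "origin/blaster still tracked locally: $still_tracked"
```

Both the branch on the remote and the local tracking ref follow the push, first to the pushed commit and then to deletion.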

Actually, I have two remote repositories? The first one is for example in GitHub, and the other one is my "origin"?

No, you don't have three repos (third would be your local one). Unless "working with multiple remotes", you have two repos:

  • local, on hdd, that one with .git directory
  • remote, on GitHub/etc, pointed by URL/path/etc, aliased as origin for brevity

Is "origin" only a reference or is a real repository (a mirror)?

Basically, just a reference, so you don't need to write URL all the time.

But then, there are the tracking branches.

Since tracking branches remember the last-seen state of the remote repository, then.. it is also a mirror, or rather, a partial selective mirror. The tracking branches form that mirror. A tracking branch called origin/master remembers the last-seen state of master on origin. Depending on when you did your last fetch/pull/push, it may be a state from 5 minutes ago, or 5 days ago. The actual current master on origin may look different now because other people could have modified it. But your local tracking branch origin/master remembers some exact old state of a single branch of the remote repo.

A single tracking branch remembers a last-seen state of single branch of remote repo. If remote repo has 100 branches and you just have single tracking branch for origin/master, then you certainly don't have a mirror for the whole remote repo. Just a mirror for last seen master branch from it.

Since you can work with multiple remotes (= multiple URLs = origin1, origin2, archive, houseoffice, etc), managing tracking branches for all of them would quickly turn into a mess. Therefore, tracking branches are grouped by the remote they point to. origin/master is the name of a tracking branch, but the origin/ part is an important prefix meaning that this tracking branch refers to the remote repo known by the shorthand origin.

So, in fact, origin is a name, and it serves at least two purposes:

  • it's a shorthand for full URL that indicates the target remote for various operations (git fetch origin master)
  • it's a grouping name for tracking branches (origin/master, origin/develop)

and if any tracking branches do exist in your repo, they together form a partial mirror of the remote.
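
Both roles of the name can be seen side by side in a scratch setup. The sketch below assumes a bare repository in a temp directory acting as the remote, with hypothetical branches master and develop; none of the paths come from the question.

```shell
set -eu
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Seed a bare "remote" with two branches, master and develop.
git init -q --bare "$tmp/remote.git"
git -C "$tmp/remote.git" symbolic-ref HEAD refs/heads/master
git init -q "$tmp/seed"
cd "$tmp/seed"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/remote.git"
git commit -q --allow-empty -m "first"
git branch develop
git push -q origin master develop

# Clone it and look at what 'origin' means inside the clone.
git clone -q "$tmp/remote.git" "$tmp/work"
cd "$tmp/work"

# Role 1: a shorthand for the URL.
url=$(git remote get-url origin)

# Role 2: a grouping prefix for the tracking branches (origin/HEAD filtered out).
remote_refs=$(git for-each-ref --format='%(refname)' refs/remotes/origin | grep -v '/HEAD$' | sort | xargs)
local_refs=$(git for-each-ref --format='%(refname)' refs/heads | xargs)

echo "url:      $url"
echo "tracking: $remote_refs"
echo "local:    $local_refs"
```

Everything listed lives inside the single .git directory of the clone: one local branch under refs/heads and two tracking branches under refs/remotes/origin.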


EDIT: well, it took me so long to write it, that you added some more questions in the meantime.

When I do a git fetch, I'm downloading latest changes from remote repository to my local repository, but not for "master", then for where?

To the tracking branches. I didn't say that explicitly assuming it's obvious, but well, since on your local side you have only one repo and all local knowledge is stored inside it, then state of your local branches, and the state of tracking branches, both are stored inside your local repo. But that's not having "two repositories". It is a single repository that knows/remembers both your branches and last-seen-state-of-some-of-remote-branches.

In this picture, there are 3 repositories (Local, Remote/Origin and Remote) Is it true?

No it's not. It's a simplification of idea of tracking branches and it tries to present them as having a separate repository called "remote/origin". It's probably because that image was complicated by adding "working directory" and "staging" things to show.. uhm.. let's say "the data flow" caused by commands.

A repo is a repo. It does not contain repos. ***) Aside from bare/non-bare repos (which you don't learn they exist until you set up a "git server"), and aside from low-level repo version/layout (that you almost never touch except for some times when a breaking change in Git happens), then there are no special types of repos.

There are only repositories, multiple of them, sitting at different locations.

Depending on where you do sit, then one of them you can call local, and all other are remote. Under normal conditions, that's it. That's the only "type of repository".

So, you have only two repos: local and remote (unless you work with multiple remotes, where you have 1+N: local, remote1, remote2, remote3, ..).

If so, then the remote repo is just the same as your local repo.
If your local repo can have a working directory, so can the remote repo.
If your local repo can have a staging area, so can the remote repo.
If your local repo can have tracking branches to remotes, so can the remote repo.
If your local repo can work with multiple remotes, so can the remote repo.
and so on..

Of course, when you work on your local repo, all others are remotes, and you can't easily access e.g. the staging area on the remote origin, or a tracking branch foobar/barbaz sitting on remote origin. But these may exist there. Maybe with some a-bit-lower-level commands you can actually access them from your local.. maybe you can write something like git fetch origin refs/remotes/foobar/develop:develop-on-foobar-via-origin. I don't really remember.
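
For what it's worth, the fetch command guessed at above does appear to work when the remote exposes that ref. The sketch below fakes the situation: a bare repository in a temp directory plays origin, and a ref refs/remotes/foobar/develop is created in it by hand to simulate origin having its own tracking branch. All names here are assumptions made for the example.

```shell
set -eu
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Bare repository standing in for 'origin', plus a local working repo.
git init -q --bare "$tmp/remote.git"
git init -q "$tmp/work"
cd "$tmp/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/remote.git"
git commit -q --allow-empty -m "first"
git push -q origin master
sha=$(git rev-parse master)

# Simulate a tracking branch that lives inside the remote repository,
# as if 'origin' itself had fetched 'develop' from a remote named 'foobar'.
git -C "$tmp/remote.git" update-ref refs/remotes/foobar/develop "$sha"

# Fetch that ref by its full name into a local branch.
git fetch -q origin refs/remotes/foobar/develop:develop-on-foobar-via-origin
fetched=$(git rev-parse refs/heads/develop-on-foobar-via-origin)

echo "fetched $fetched"
```

This relies on the remote advertising refs outside refs/heads, which a plain local repository does by default; a hosted server may be configured to hide them.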

Anyways, since the remote repo is a fully-functional repo just like your local one, saying that "tracking branches like origin/* form a second repository" is nonsense. A "repository" is the whole thing together: workingdir + staging + stash + localbranches + trackingbranches + config + reflog + ..., and the "partial local image of remote repo" is NOT a repository. It's just some state that your local repo remembers, among many other states and other things it remembers.

No part of a repo is 'a repository on its own', but of course many of them form or contain some more-or-less complete view of a state of your/their code, e.g.:

  • working dir contains some state of your code
  • each local branch contains*) some state of your code
  • each tracking branch contains*) some state of their code
  • each reflog entry contains*) some state of some code
  • each stash entry contains*) some state of your code
  • etc

but none of them can really be called a 'repository'.

*) actually, they don't "contain", only "remember" or "point to".. but that's another story about how git stores things

***) there are things called submodules/subrepos. They are pretty intuitive at the basic level, but let's leave that for now, seriously: configuring and automating fetches/pushes may get complicated, and if fetch/pull/push/etc is not obvious to you yet, then running these operations on multiple nested repos at once is beyond what I can explain right now. Speaking of submodules, there are also things called subtrees, totally different beasts than submodules. They also allow something like "having more than one repository in a repository", but dear, these are really complex to start with. Stay away from subtrees until you feel "advanced" or "expert" in Git.

quetzalcoatl
  • Thanks a lot, now I know how it works, can you give me a positive vote in my question? I think my question is very useful, but someone give me a negative vote. – danilo Jan 01 '18 at 20:47
  • I updated the picture to better reflect how git works – danilo Jan 01 '18 at 20:53
  • @Danilo: Sure thing. I have no idea why someone put that -1. That's probably from some earlier versions of the question text. Don't remove the question. I'm sure that over time it will surface with positives. I think that improving the title a bit could help. I think that most of the people will take "2 types of repositories" as asking about bare/non-bare repos and the question is about something else. I can't speak for beginners though. Something like `is 'origin' a copy of GitHub repository` or `is 'remote' a mirror of external repository` would be a better title, but only time can show. – quetzalcoatl Jan 01 '18 at 21:46
1

"master" is the name of branch. "origin" is the name of remote repository.

When you do a git push, you're sending your local master (or another) branch to your remote repository (e.g. GitHub). When you do a git fetch, you're downloading the latest changes from the remote repository (the default remote repo is called "origin") to your local repository, but without merging these changes into your local "master" branch.
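
A minimal sketch of that last point, assuming a bare repository in a temp directory as the remote and a hypothetical second clone that pushes a new commit: after git fetch the local master is still behind, and an explicit merge of origin/master is what brings it up to date.

```shell
set -eu
tmp=$(mktemp -d); trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Bare "remote" seeded with one commit by a hypothetical collaborator.
git init -q --bare "$tmp/remote.git"
git -C "$tmp/remote.git" symbolic-ref HEAD refs/heads/master
git init -q "$tmp/other"
cd "$tmp/other"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$tmp/remote.git"
git commit -q --allow-empty -m "first"
git push -q origin master

# We clone; then the collaborator pushes another commit.
git clone -q "$tmp/remote.git" "$tmp/mine"
git commit -q --allow-empty -m "second"
git push -q origin master
theirs=$(git rev-parse master)

cd "$tmp/mine"
git fetch -q origin
after_fetch=$(git rev-parse master)     # still the old commit

# Merging the tracking branch is the second half of what a pull would do.
git merge -q --ff-only origin/master
after_merge=$(git rev-parse master)     # now matches the remote

echo "after fetch: $after_fetch"
echo "after merge: $after_merge"
```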