
Hi, I'm pretty new to Git, so please bear with me. Say I'm currently on a branch:

remotes/origin/dev-1.1

And my colleague created some new files and updated some old ones on a different branch:

remotes/origin/dev-1.2

I want to update my local files so that I can work with the new files and updated ones. Should I be using git checkout remotes/origin/dev-1.2 to update my local directory? Or are there additional steps I have to take to download the new and updated files?

code_learner93

1 Answer


Say I'm currently on ... remotes/origin/dev-1.1

You aren't. This part of Git is pretty confusing, but it's also important.

... And my colleague created some new files and updated some old ones on ... remotes/origin/dev-1.2

Technically, your colleague simply created some new commits, which you're finding through the name remotes/origin/dev-1.2.

The tricky part here is that it's wrong to say that some particular files are "on" any particular branch. Where files are, in Git, is inside commits. The commits themselves are on branches, but—this is why this is confusing—each commit can be on any number of branches, anywhere from zero branches, to every branch. Moreover, the very word branch is ambiguous. (See also What exactly do we mean by "branch"?)

To help make this more concrete, say we have some commit whose ID is a123456.[1] Commit a123456 holds some set of files. The particular versions of those files are frozen into a123456 for all time. As long as you have that commit in your repository,[2] you can have Git reach into it and pull out either the entire set of those files, or any one file from that commit, as it's stored in that commit.

But: commit a123456 might be on feature right now, and then later, it might be on both feature and dev, and in the future, it might be on both dev and master / main. At this point you might have deleted the name feature, now that the feature is merged. Commit a123456 is still there but with the name feature gone, it's not on feature, which doesn't even exist. It's now only on dev and main, or whatever. Create a new branch and now it's probably on all three branches.

If you rename a branch, that branch still contains all the commits it contained before, but it has a different name now. Instead of feature, maybe it's now feat-123 so that you can work on feat-124 too. The key to all of this is to think of the branch names as ways to find commits. It's the commit hash IDs—the a123456 style random-looking numbers—that are the "true names" of each commit.
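To see that a commit can be on several branches at once, and that renaming a branch does not touch the commit, here is a minimal sketch you could run in a scratch directory. The temporary repository, the user name and email, and the file name are all made up for the illustration; the branch names are the ones used above.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the initial branch "main"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first commit"
commit=$(git rev-parse HEAD)               # the commit's "true name"

# Two more names that find the same commit.
git branch feature
git branch dev

# Ask Git which branches contain this one commit.
on_branches=$(git branch --format='%(refname:short)' --contains "$commit" | sort | tr '\n' ' ')
echo "commit $commit is on: $on_branches"

# Renaming the branch changes the name only, not the commit it finds.
git branch -m feature feat-123
after_rename=$(git rev-parse feat-123)
```

The hash ID stored in `commit` never changes; only the set of names that lead to it does.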


[1] Actual Git commit hash IDs are much longer and are random-looking. They're not at all random, really, but they look random. Each commit has a unique number, expressed in hexadecimal like this, that we can shorten by referring to it by some unique prefix of its full ID. As a repository fills up with more and more stuff in the database, the length of the prefix needed to get something unique gets longer, to the point where clones of the Linux kernel sometimes need 10 to 12 characters from the front of a hash ID.

[2] Git is also built to only ever add commits to the repository. There are some ways to get rid of bad commits, should that be necessary, but they're tricky. Commits are almost viral in the way they spread from Git repository to Git repository, and if you've let your Git have Git-sex with another Git repository and given it commits, those commits may come back to you later, "re-infecting" your repository, even if you've gotten rid of them on purpose. You don't normally need to worry about this: if you have a hash ID, you probably got it from your Git, and it's the hash ID of something in your Git, and you probably want to keep it.


Branch names vs remote-tracking names

In Git, a branch name is a way to find commits. But branch names are not the only way. You can of course use the raw hash ID. To do that, most people will use cut-and-paste with their mouse or whatever: they're too hard to type in correctly. But that, too, is not the only way. Git has lots of different kinds of names. Each kind of name has a different purpose. Your branch names are for you to use, to find commits that you, for whatever reason, think are significant.

The name remotes/origin/dev-1.1 is what I call a remote-tracking name.[3] A remote-tracking name is a name in your own Git repository that corresponds to some branch name in some other Git repository. If branch names find commits (and they do, in a way we'll see below), then, when you get new commits from some other Git repository, your own Git repository is going to need a way to find these new commits. Your Git could write these names into your branch names, but this would create a problem:

  • Suppose you made a new commit that Bob (or whoever) doesn't have. You find this commit with your branch name feature.

  • Suppose Bob made a new commit that you don't have. Bob finds this commit with his branch name feature.

  • You connect your Git to Bob's Git, and get his commit. If your Git sticks his commit number into your name feature, your Git loses your new commit. (It's still in there—at least for a while—but it's become hard to find.)

So your Git doesn't do that. Instead, you tell your Git about Bob's Git repository: you say to your Git, "to get commits from Bob's Git, reach out to URL _____" (fill in the blank here with the right URL), and you give this URL a short name such as bob. When your Git does this and gets his new commit from his branch feature, your Git creates or updates your remote-tracking name bob/feature.

This means that your remote-tracking names are not branch names. They're just as good as branch names for some purposes, and Git documentation uses the phrase remote-tracking branch name, and many humans refer to these as branches. But they're not branch names. We'll see in a moment why this distinction really matters.


[3] Git calls this a remote-tracking branch name. The word branch here, in this four-word phrase, serves mainly to clutter up the phrase and mislead people into shortening the phrase to remote branch, which is way too easy to mix up with the notion of a branch name on some remote repository. So I've taken to calling it a remote-tracking name, which isn't great, but at least doesn't add yet another form of "branch".


Commits form chains

What's missing from the above picture, and is absolutely key to making sense of Git, is the notion of chaining commits together. Git strings commits together, one after another—or more precisely, one before another. But it doesn't do this by names: it does this using the commits themselves.

Each commit holds a snapshot of every file (that's its main data, more or less), but each commit also holds information about the commit itself: who made it, when, and why, for instance, with the why part being the log message.[4] In this same metadata, Git stores something just for Git's use: each commit stores the raw hash ID of some earlier commit, or commits. This makes commits link, in a backwards-looking fashion, to what Git calls their parents.

When we have a simple series of commits all in a row, we can draw this. Let's let a single uppercase letter stand in for each real hash ID, and make a small picture like this:

... <-F <-G <-H

Here, H stands in for the hash ID of the last commit in this series. It contains a snapshot of files, but it also has metadata: who made commit H, when, and so on. In that metadata, commit H holds the hash ID of earlier commit G.

Commit G, of course, has a snapshot and metadata. The metadata for commit G holds the hash ID of earlier commit F.

Git can take any two snapshots, such as G and H, and compare them. When files are identical in both commits, Git will say nothing about those files. When two files are different in the two commits, Git can produce a recipe by which the earlier commit's copy of that file can be changed to match the later commit's copy. This is how Git can show us commits—which are snapshots—as changes, even though commits don't hold changes. We just take a pair of adjacent commits and ask Git to compare (diff) them.
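Both ideas, the parent link stored in a commit's metadata and the comparison of two adjacent snapshots, can be checked in a scratch repository. This is only a sketch: the temporary directory, the identity, and the file names same.txt and changed.txt are invented for the example, and the commit messages G and H merely mirror the letters in the drawing.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Commit G: two files.
echo "one" > same.txt
echo "old" > changed.txt
git add same.txt changed.txt
git commit -q -m "G"

# Commit H: only changed.txt differs from G.
echo "new" > changed.txt
git commit -q -a -m "H"

g=$(git rev-parse HEAD^)                   # hash ID of G, found via H
h=$(git rev-parse HEAD)

# The raw commit object for H records G's hash ID as its parent.
parent_in_metadata=$(git cat-file -p "$h" | sed -n 's/^parent //p')

# Comparing the two snapshots: identical files are not mentioned.
changed=$(git --no-pager diff --name-only "$g" "$h")
git --no-pager diff "$g" "$h"              # the "recipe" turning G's copy into H's
```

Running `git cat-file -p` on the commit by itself also shows the tree, author, and committer lines that make up the rest of the metadata.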


[4] This is why it's a good idea to write a good log message: anyone can come along later and see what you did, but they won't necessarily know why you did it. Describing a bug, including how to reproduce the bug, is quite valuable when examining code that's supposed to fix that bug. Describing a new feature is valuable when examining code that's supposed to add that feature. The code expresses how but not necessarily why.


Names find the last commit in a chain

Now we can see how branch and other names work. To work backwards through the chain of commits that ends at commit H, Git will need the actual hash ID of commit H. But hash IDs are big and ugly and not good for humans. We like names that mean something to us, like main and dev and feature. We could create a file (outside the repository perhaps) and put our hash IDs in there, but it makes more sense to have Git do it for us:

...--F--G--H   <-- main

That's a branch name. Once we create a second branch name, also pointing to commit H, we'll need a way to remember which name we're using:

...--G--H   <-- dev, main (HEAD)

If we run git checkout dev or git switch dev, we're telling Git that we want the files from commit H out—they're already out, because they were out for main—but we want to be "on" branch dev, like this:

...--G--H   <-- dev (HEAD), main

Here's why this matters: When we make a new commit, Git will:

  • gather up source files for a snapshot;
  • gather up metadata (name, email, date-and-time, etc);
  • add the current commit's hash ID as the parent of the new commit;
  • write all this out to make the new commit, getting a new unique hash ID that we'll call I here; and
  • last and perhaps sneakiest, write the new commit's hash ID into the current branch name.

The effect of that is that we now have:

...--G--H   <-- main
         \
          I   <-- dev (HEAD)

That is, the name dev now selects commit I, not commit H. Commit H is the parent of commit I, so that I points back to H. Commits up through H are still on both branches, but now commit I is only on branch dev.
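The way a new commit drags the current branch name forward can be reproduced with a few commands. Treat this as a sketch in a throwaway repository: the directory and identity are placeholders, and empty commits labeled G, H, and I stand in for the drawing's commits.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
h_before=$(git rev-parse main)             # main and (soon) dev both find H

git branch dev                             # second name for commit H
git checkout -q dev                        # attach HEAD to dev; same commit

git commit -q --allow-empty -m "I"         # Git writes I's hash ID into dev

main_after=$(git rev-parse main)           # still H
dev_parent=$(git rev-parse dev^)           # I's parent, which is H
current=$(git symbolic-ref --short HEAD)   # the name HEAD is attached to
echo "on $current; main=$main_after dev=$(git rev-parse dev)"
```

Only the name HEAD was attached to has moved; main still finds commit H.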

(Later, we can have Git move the name main in such a way as to have commit I be on main, if it turns out that this is a good idea. If not, we can completely drop the name dev and with it, the way to find commit I. If we haven't sent commit I anywhere else, no other Git has commit I and it won't come back. So this is how and why commits aren't necessarily permanent. But as long as commit I exists, it will still have those files in it, along with that metadata, because no part of any commit can ever be changed—not even by Git itself.)

The special feature of HEAD only works with branch names

We finally get to the reason for all of the words above. When you run:

git checkout dev

or:

git switch dev

Git will find or create a branch name (or else fail; the method by which it creates new branch names is somewhat fancy and we haven't described this yet). When you use git branch or git checkout -b or git switch -c to create a new branch name, you must pick some existing commit, and Git will create the new branch name pointing to that commit. In all of these cases, what you get really is a branch name, and git checkout and git switch can attach the special name HEAD to that branch name, as shown above.

With other forms of name, though, including remote-tracking names like origin/dev-1.1, Git will refuse to attach the special name HEAD to that other name. What this means is that you won't be "on" any branch at all. The git checkout command will put you into detached HEAD mode. It will do this without asking whether you understand all the consequences of this mode. The new (since Git 2.23) git switch command is better about this: it requires that you use the --detach flag to indicate that you understand that you're going into detached HEAD mode.

In detached HEAD mode, you still have a commit checked out. It's just that you're not on any branch. That's all there is to it, really, but that has a big consequence. Let's take our example above:

...--G--H   <-- main
         \
          I   <-- dev (HEAD)

and now go into detached HEAD mode by using:

git checkout --detach dev

(which is an explicit way to say to git checkout that you want the same commit, but want to enter detached HEAD mode). The result looks like this:

...--G--H   <-- main
         \
          I   <-- HEAD, dev

That is, the name HEAD finds commit I directly, without going through the name dev. If we now make a new commit J, we get:

...--G--H   <-- main
         \
          I   <-- dev
           \
            J   <-- HEAD

The new commit is made, but HEAD is the only way to find it.
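Here is a small sketch of that situation, again in a scratch repository with a made-up identity and empty commits named after the letters in the drawing. It uses git symbolic-ref to ask whether HEAD is attached to a branch name, and git rev-parse to turn names into hash IDs.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "H"
git checkout -q -b dev
git commit -q --allow-empty -m "I"

git checkout -q --detach dev               # same commit, but HEAD is now detached

# symbolic-ref fails when HEAD is not attached to any branch name.
if git symbolic-ref -q HEAD >/dev/null; then attached=yes; else attached=no; fi

git commit -q --allow-empty -m "J"         # new commit, recorded only in HEAD

dev_tip=$(git rev-parse dev)               # dev did not move: still commit I
head_parent=$(git rev-parse HEAD^)         # J's parent is I
echo "attached=$attached dev=$dev_tip HEAD=$(git rev-parse HEAD)"
```

The name dev still translates to commit I, while HEAD translates to the new commit J.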

Git really needs the hash ID. Whenever you use a name like dev or HEAD, Git has to translate that name to a hash ID. Use git rev-parse to see how this works, e.g., run:

git rev-parse HEAD

for instance.

Well, now suppose we've made commit J, then we decide we want to go look at main for a moment. We run git checkout main to do that, and get this:

...--G--H   <-- main (HEAD)
         \
          I   <-- dev
           \
            J   ???

The git checkout command has gone and re-attached our HEAD, to the name main this time, to get Git to extract commit H for us to look at and maybe work on. Now we can create and switch to a new branch name feature:

git checkout -b feature

and make another new commit K:

          K   <-- feature (HEAD)
         /
...--G--H   <-- main
         \
          I   <-- dev
           \
            J   ???

That's great, but now we want to go get something from commit J. But ... how are we going to find it? If we wrote down the hash ID, or it's still in scrollback on the screen, that could rescue us. But mostly, this is just a bad idea. The only time to work in a detached HEAD like this is:

  • when you're looking at a historical commit, for whatever reason (including git checkout of a tag, which does that); or
  • when you're in the middle of a git rebase, because rebase uses detached HEAD mode internally. Once you finish up the rebase, Git will re-attach HEAD—or if you decide the rebase is a bad idea, you can use git rebase --abort and Git will put everything back and re-attach HEAD, but either way, you're back into "attached HEAD" mode (Git doesn't call it that: it just says "on a branch", but that's the obvious opposite of detached HEAD mode).
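If you do end up with a commit made in detached HEAD mode, like commit J above, and you saved its hash ID somewhere, you can give it a branch name after the fact. In this sketch a shell variable stands in for "writing down the hash ID"; the repository, identity, and the branch name rescue-j are hypothetical.

```shell
set -e
repo=$(mktemp -d)                          # throwaway repository
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git commit -q --allow-empty -m "H"
git checkout -q --detach                   # detach HEAD at the current commit
git commit -q --allow-empty -m "J"
j=$(git rev-parse HEAD)                    # "write down" J's hash ID

git checkout -q main                       # re-attach HEAD; no name finds J now

git branch rescue-j "$j"                   # create a branch name pointing at J
rescued=$(git rev-parse rescue-j)
```

Without the saved hash ID, the last step has nothing to point the new name at, which is why creating the branch name before leaving detached HEAD mode is the safer habit.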

Remote-tracking names and creating your own branches

It's time now to note how the remote-tracking names work. Suppose you've cloned some repository, using git clone. That other repository has a short name in your own Git repository. The standard short name for this is origin (though you can pick some other one if you like).

That other repository has branches. It may have only a main or master, but chances are it has more branches than that. Your Git has copied their Git repository's commits. These commits are literally exactly the same in both repositories: they have the same "real name" hash ID, and each one points backwards to the same previous hash ID and hence forms the same chains. But their Git's branch names have become your Git's remote-tracking names. So instead of:

...--G--H--I   <-- main
         \
          J--K   <-- dev

what you have, initially, in your own repository is:

...--G--H--I   <-- origin/main
         \
          J--K   <-- origin/dev

(Aside: the full name of a branch name like main is refs/heads/main. The full name of origin/main is refs/remotes/origin/main. Git lets you take refs/ off, or even refs/heads/ and refs/remotes/. So that's why we get by with simple branch names and simple origin/* remote-tracking names. If you run git branch -r, Git will list your remote-tracking names as origin/whatever, but if you use git branch -a, it lists them as remotes/origin/whatever. It's not consistent, but both work, so it's not that big a deal.)
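You can ask Git to spell out the full names. The sketch below builds a tiny "upstream" repository on the local disk and clones it, so that origin refers to that directory; the directory names and identity are invented for the example.

```shell
set -e
work=$(mktemp -d)                          # scratch area for both repositories
cd "$work"

# A stand-in for the other Git repository.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"
cd ..

git clone -q upstream mine                 # "origin" is the upstream directory
cd mine

full_local=$(git rev-parse --symbolic-full-name main)
full_remote=$(git rev-parse --symbolic-full-name origin/main)
echo "$full_local"
echo "$full_remote"

git branch -r                              # lists origin/... names
git branch -a                              # lists the same names as remotes/origin/...
```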

As soon as git clone has finished cloning their Git repository, however, your own Git goes and creates some branch name.[5] The name your git clone creates is the one you tell it to, with your -b option. If you don't give your git clone a -b option, your Git asks their Git what name they recommend, and creates that one. So one way or another, you get an initial branch name. Your Git then checks out this branch name:

...--G--H--I   <-- main (HEAD), origin/main
         \
          J--K   <-- origin/dev

(assuming you used -b main, or they recommended main). You now have commit I out and are on your own branch named main.

If you now run git checkout dev or git switch dev, your Git looks at your branch names for one spelled dev (more precisely, for refs/heads/dev). This does not exist so the obvious thing to do would be to produce an error message and quit. But instead, your Git will now root through your remote-tracking names. If one of them is close enough to dev—if it came from someone else's dev, in other words—your Git will create your dev, using that remote-tracking name to pick the commit:

...--G--H--I   <-- main (HEAD), origin/main
         \
          J--K   <-- dev, origin/dev

Having created the branch name, your Git can now switch to that commit—in this case, commit K—and attach your HEAD:

...--G--H--I   <-- main, origin/main
         \
          J--K   <-- dev (HEAD), origin/dev

This is very similar to git checkout origin/dev except that now you have a new branch name and an attached HEAD. If you want to do any actual work, this is generally the way to go.
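The name-creating behavior is easy to watch in a scratch clone. As before, this is a sketch under assumptions: the "upstream" repository is a local directory made up for the example, with empty commits on main and dev.

```shell
set -e
work=$(mktemp -d)                          # scratch area for both repositories
cd "$work"

# A stand-in for the other Git repository, with branches main and dev.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "I"
git checkout -q -b dev
git commit -q --allow-empty -m "K"
git checkout -q main
cd ..

git clone -q upstream mine
cd mine

before=$(git branch --list dev)            # empty: there is no local dev yet

git checkout -q dev                        # Git creates dev from origin/dev

current=$(git symbolic-ref --short HEAD)
upstream_of_dev=$(git rev-parse --abbrev-ref 'dev@{upstream}')
dev_tip=$(git rev-parse dev)
origin_dev_tip=$(git rev-parse origin/dev)
echo "on $current, created from $upstream_of_dev"
```

The new dev and origin/dev find the same commit, but only dev is a branch name that HEAD can attach to.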


[5] There is an exception when you tell your Git to clone by tag name. In this case, your Git does a detached-HEAD checkout of the tag, and creates no branch name at all.


Finally, we get back to your question

I want to update my local files so that I can work with the new files and updated ones. Should I be using git checkout remotes/origin/dev-1.2 to update my local directory?

I don't know what work, if any, you might have done so far in detached HEAD mode. If you have made some new commits in this mode, you should start by creating a branch name to make your Git remember the hash ID of the last commit in your new chain. For instance, let's say you started with:

          I   <-- origin/dev-1.1
         /
...--G--H   <-- main (HEAD), origin/main
         \
          J--K   <-- origin/dev-1.2

You then ran git checkout origin/dev-1.1, which got you into detached-HEAD mode:

          I   <--  HEAD, origin/dev-1.1
         /
...--G--H   <-- main, origin/main
         \
          J--K   <-- origin/dev-1.2

Then you made a new commit or two:

            L   <-- HEAD
           /
          I   <--  origin/dev-1.1
         /
...--G--H   <-- main, origin/main
         \
          J--K   <-- origin/dev-1.2

Here, L stands in for some big ugly hash ID. To avoid having to save this somewhere, we'd now like to have Git save it. Pick a name—any name you like (within limits), but something that will mean something to you, such as experiment-number-one or whatever, and run git checkout -b name with that name. Let's say you call it exp-1 (for experiment number one). Your git checkout -b exp-1 gives you this as a drawing:

            L   <-- exp-1 (HEAD)
           /
          I   <--  origin/dev-1.1
         /
...--G--H   <-- main, origin/main
         \
          J--K   <-- origin/dev-1.2

which means you now have your own branch name by which you can easily find commit L.

Now, to work with/on the files that your colleague created, you can run:

git checkout dev-1.2

Since there isn't a dev-1.2, your own Git will look through your remote-tracking names and spot origin/dev-1.2. This is the only match for dev-1.2, so your Git will create your own dev-1.2 right now, and then check it out:

            L   <-- exp-1
           /
          I   <--  origin/dev-1.1
         /
...--G--H   <-- main, origin/main
         \
          J--K   <-- dev-1.2 (HEAD), origin/dev-1.2

Note how commit L remains easy to find via your branch name.
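Put together, the whole sequence might look like the following sketch. It fakes origin with a local directory, uses empty commits labeled after the drawings, and assumes placeholder identities; in your real repository only the commands in the second half would apply, and git fetch is the step that actually downloads your colleague's new commits.

```shell
set -e
work=$(mktemp -d)                          # scratch area for both repositories
cd "$work"

# Stand-in for origin: main at H, dev-1.1 at I, dev-1.2 at K.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git config user.name "Colleague"           # placeholder identity
git config user.email "colleague@example.com"
git commit -q --allow-empty -m "H"
git checkout -q -b dev-1.1
git commit -q --allow-empty -m "I"
git checkout -q -b dev-1.2 main
git commit -q --allow-empty -m "J"
git commit -q --allow-empty -m "K"
git checkout -q main
cd ..

git clone -q upstream mine
cd mine
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git fetch -q origin                        # get new commits, update origin/* names
git checkout -q origin/dev-1.1             # detached HEAD at commit I
git commit -q --allow-empty -m "L"         # work done in detached HEAD mode
l=$(git rev-parse HEAD)

git checkout -q -b exp-1                   # branch name that remembers commit L
git checkout -q dev-1.2                    # creates dev-1.2 from origin/dev-1.2

exp_tip=$(git rev-parse exp-1)
current=$(git symbolic-ref --short HEAD)
mine_tip=$(git rev-parse dev-1.2)
theirs_tip=$(git rev-parse origin/dev-1.2)
```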

If you didn't create any commit L, you don't need to create a new name: the name origin/dev-1.1 still suffices to find commit I. But if that commit is important to you, you might still want to put a branch name on it, to make it quick to find. Note that each time you have your Git connect to the Git over at origin, your Git can collect new commits from them, and will then update your own origin/* names to find the new commits you got from them.[6]

Once you have all this set up, ekkom's answer about using git cherry-pick is a perfectly reasonable way to copy your hypothetical commit L to a new, slightly different commit—let's call it L' to remark on how it's similar to L—that would look like this:

            L   <-- exp-1
           /
          I   <--  origin/dev-1.1
         /
...--G--H   <-- main, origin/main
         \
          J--K   <-- origin/dev-1.2
              \
               L'  <-- dev-1.2 (HEAD)

Note how your dev-1.2 will now be one commit "ahead of" their dev-1.2 aka your origin/dev-1.2.
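As a rough illustration of the cherry-pick step, the sketch below rebuilds the same layout with real file changes (cherry-picking needs something to copy) and then counts how far dev-1.2 is ahead of origin/dev-1.2. The local "upstream" directory, the identities, and the file names new-file.txt and l.txt are all hypothetical.

```shell
set -e
work=$(mktemp -d)                          # scratch area for both repositories
cd "$work"

# Stand-in for origin: main at H, dev-1.1 at I, dev-1.2 at K.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git config user.name "Colleague"           # placeholder identity
git config user.email "colleague@example.com"
git commit -q --allow-empty -m "H"
git checkout -q -b dev-1.1
git commit -q --allow-empty -m "I"
git checkout -q -b dev-1.2 main
echo "colleague's work" > new-file.txt
git add new-file.txt
git commit -q -m "K"
git checkout -q main
cd ..

git clone -q upstream mine
cd mine
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

git checkout -q origin/dev-1.1             # detached HEAD at commit I
echo "my work" > l.txt
git add l.txt
git commit -q -m "L"
git checkout -q -b exp-1                   # exp-1 remembers commit L
git checkout -q dev-1.2                    # created from origin/dev-1.2

git cherry-pick exp-1 >/dev/null           # copy L onto dev-1.2 as L'

ahead=$(git rev-list --count origin/dev-1.2..dev-1.2)
l_orig=$(git rev-parse exp-1)              # L
l_copy=$(git rev-parse dev-1.2)            # L', a different commit
if [ -f l.txt ] && [ -f new-file.txt ]; then has_both=yes; else has_both=no; fi
echo "dev-1.2 is $ahead commit(s) ahead of origin/dev-1.2"
```

L and L' have different hash IDs because they have different parents, even though they make the same change.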

Experiment with git log --all --graph --decorate --oneline. This has Git draw its version of the graphs I have been drawing here. Git draws the commits vertically, with the newest ones at the top, instead of horizontally with the newest ones at the right. Getting really nice drawings of the commits is sometimes vital, and git log --graph does not always cut it here. For much more about this topic, see Pretty git branch graphs.


[6] If they've explicitly removed a commit from one of their branches, your Git will do a "forced update" of the corresponding remote-tracking name, backing it off that commit too. If they completely deleted a branch name, your Git won't delete your own remote-tracking name unless you set fetch.prune to true or use git fetch --prune or git remote update --prune. These are all decisions you must eventually make: I like to have fetch.prune set to true in my global Git config, so that my Git deletes remote-tracking names when they delete their branch names.

All of this is just a long-winded (though detailed with reasons why) way of saying that you should remember that your Git will take their name updates here, so if you value some particular commit for some particular reason, consider making up your own name to remember it, rather than relying on theirs. But if you haven't looked at their commits, how much value do you think they have? If you use git checkout's name-creating trick to create your own local branch names, your Git automatically creates your own name. Your branch names are yours, to deal with as you see fit. This is mostly a non-problem.
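The pruning behavior can be sketched as follows. The local "upstream" directory, the identity, and the branch name temp are assumptions made for the example; the first fetch passes fetch.prune=false explicitly so that the result does not depend on whatever your global configuration says.

```shell
set -e
work=$(mktemp -d)                          # scratch area for both repositories
cd "$work"

# Stand-in for origin, with an extra branch named temp.
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
git commit -q --allow-empty -m "H"
git branch temp
cd ..

git clone -q upstream mine                 # your Git now has origin/temp
git -C upstream branch -q -D temp          # they delete their branch name
cd mine

# A plain fetch, with pruning forced off, leaves your origin/temp in place.
git -c fetch.prune=false fetch -q origin
if git rev-parse -q --verify refs/remotes/origin/temp >/dev/null
then after_plain=kept; else after_plain=gone; fi

# Fetching with --prune deletes the stale remote-tracking name.
git fetch -q --prune origin
if git rev-parse -q --verify refs/remotes/origin/temp >/dev/null
then after_prune=kept; else after_prune=gone; fi

echo "plain fetch: $after_plain; fetch --prune: $after_prune"
```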

torek