
I have a small project that I have been working on from two different computers. I git push to GitHub (the remote is, surprisingly enough, called origin) regularly, but sometimes I have been working for a week on one computer before I get back to the other. And when I do get back, I just want a complete update. All old branches pruned away, all new branches pulled, etc.

I could completely delete the local project, then git clone the origin repo. That feels dirty, but it's not a large project, so it takes seconds, and is done with basically two commands.

Is there a similarly quick and easy way to do this within git itself?

I saw this thread, and a few similar ones, but all the answers seem to either use a script, or work on a branch-by-branch basis, which is a bit more tedious than what I thought should be possible.

Arthur
  • "feels dirty"? Why so? It sounds very clean to me. – Romain Valeri Jan 23 '20 at 10:17
  • @RomainValeri Well, if the project were bigger, or had large resources like models and textures that I didn't want to delete just to redownload, I might want to avoid that option. So if `git` has a built-in alternative, that would probably be more scalable. And while I might not personally need scalability right now, I'd say there ought to be such an option. – Arthur Jan 23 '20 at 10:19

3 Answers


The thing to remember here is that "branches"—more specifically, branch names—don't really mean anything to Git. They mean things to you, but as far as Git is concerned, each name is just a method of finding one specific commit. What Git really cares about are the commits in the repository.

Each commit has its own unique big ugly hash ID. You see these IDs in git log output, for instance. The true magic of Git is that every Git repository everywhere will agree that, if one specific commit in your repository (or one in theirs) has some hash ID H, no other commit anywhere can ever have that hash ID.[1] So when two Git repositories "meet up", by git fetch or git push, they need only compare hash IDs. Your GitHub repository, over on GitHub, has some commit with some hash ID H listed under their name master. Your Git calls up their Git. Their Git says: My name master corresponds to hash ID H. Your Git checks: Do I have H? If your Git has H already, it's done. If not, your Git asks their Git to send commit H.

Your own Git has your own names. One of them might be master. It has some hash ID. It doesn't matter what hash ID your master has here, the only thing that matters to your Git for the fetch operation is: Do I have commit H at all? Your Git can always look up all of its internal Git objects directly by raw hash ID. You either do have H already, or you don't. Since no Git anywhere can ever use hash ID H for anything but this commit, all your Git has to do is check that one thing.

If you don't have H after all, your Git will have their Git send commit H. Now, another thing about every commit is: each commit records some set of parent commit hash IDs. The parent or parents of a commit are the commit (or for a merge, two or more commits) that come "just before" this commit. That is, given a long string of commits, made one at a time, each one stores the previous commit's hash ID, in a backwards-pointing chain:

... <-F <-G <-H

So if you're going to get H, their Git will now offer you G. Your Git asks itself: Do I have G already? If not, your Git says OK, give me that too. If you do have G, your Git says No thanks, I have that one already. This repeats for every commit they have that you don't. Eventually, their Git has a list of all the commits it must send, and knows which commits you already have that this list will extend.

At this point, their Git packages up the desired subset of their commits—and associated snapshots and so on—knowing which commits you have because your Git told them I already have that one. Their Git can compress all the files and such, everything that's inside the commits, using this information. So you end up getting a much smaller data-set over the network than if they just sent you everything.


[1] Technically, two different commits can have the same hash ID, but only if they never "meet". That is, if you connect your Git to some other repository, and it has some commit with hash ID H and your Git has a different H, both Gits will believe this is the same commit and neither will send it to the other. As long as your Git and their Git never meet and try to exchange commits, this causes no problems. In practice, this kind of hash collision doesn't become even remotely likely until you have more than about 10^17 objects. At that point, it gets to a similar chance as having your computer's storage system just randomly fail on you, which is similarly disastrous. It can be a problem if someone carefully engineers a hash collision. For details, see How does the newly found SHA-1 collision affect Git?


Git uses names to find last commits

We drew a simple chain of commits above, ending at commit H:

...--F--G--H

(where the letters stand in for the real hash IDs, which look completely random, but are actually completely deterministic). Given the ID of the last commit H, we just have Git look up H. Inside H, Git finds hash ID G, which lets it look up G. Inside G, Git finds hash ID F, which lets it look up F, and so on. This lets Git go from the last commit all the way backwards to the very first commit ever.
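The backwards walk can be tried by hand. This is a minimal sketch, assuming a POSIX shell and an installed git; the throwaway repository, the placeholder identity, and the commit messages F, G, H are made up for illustration. It starts at the last commit and repeatedly asks Git for the first parent until there is none.

```shell
#!/bin/sh
set -eu
# Placeholder identity for the throwaway commits (assumption, not from the answer).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
for msg in F G H; do
    git commit -q --allow-empty -m "$msg"
done

# Start at the last commit and follow the stored parent hash IDs backwards.
commit=$(git rev-parse HEAD)
walked=""
while :; do
    walked="$walked$(git log -1 --format=%s "$commit")"
    # "<hash>^" names the first parent; at the root commit the lookup fails.
    commit=$(git rev-parse -q --verify "$commit^") || break
done
echo "visited, newest first: $walked"
```

In everyday use git log does this walk for you; the loop only makes the one-parent-at-a-time lookup visible.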

This even works in the presence of branching commit structures:

          I--J
         /
...--G--H
         \
          K--L

There are now two last commits. J is the last commit in one structure, and L is the last commit in the other. The two structures—should we call them branches?—meet as they travel back to H, and then they share commits all the way back to the beginning of time (presumably commit A).

In a real repository we might have thousands, or millions, of commits. There's a big old jumble of hash IDs. How will you, or Git, quickly find the last commit? You can—and in maintenance commands, Git does—list out every commit, and figure out which ones are "last". This takes a while: multiple seconds, or in really big repositories, sometimes minutes. It's clearly way too slow. Plus, who wants to work with hash IDs? Humans certainly don't.

So, Git offers us the ability to use a name to remember one (1) hash ID. We can pick the name branch1 to remember hash ID J, and the name branch2 to remember hash ID L:

          I--J   <-- branch1
         /
...--G--H
         \
          K--L   <-- branch2

We can, if we want, use the name master to remember hash ID H:

          I--J   <-- branch1
         /
...--G--H   <-- master
         \
          K--L   <-- branch2

It doesn't matter that there are commits after H. H is the last commit in master. That's the entire definition of a branch. That's it: a branch name is just a pointer, in Git; it's just a way to hold one hash ID, and by definition, whatever hash ID the name holds, that's the last commit in that branch.[2]

So branch branch1 ends at commit J, and automatically includes every commit that you can get to by starting at J and working backwards. Branch branch2 ends at commit L and includes all commits before L, with Git again working backwards. Git always works backwards. If it needs to work forwards for some reason, it does that by first working backwards and remembering hash IDs as it goes, and then going forwards through the remembered list. And, commits can be, and very often are, on more than one branch.
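To see that a branch name holds nothing more than one hash ID, here is a small sketch under the same kind of assumptions (POSIX shell, git installed, a throwaway repository with invented commit messages). It creates branch1 at an existing commit, then commits on it and checks which names moved.

```shell
#!/bin/sh
set -eu
# Placeholder identity for the throwaway commits (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m G
git commit -q --allow-empty -m H
h=$(git rev-parse HEAD)

# Creating a branch only records one hash ID under a new name.
git branch branch1 "$h"
before=$(git rev-parse branch1)

# Committing on branch1 moves that name, and only that name.
git checkout -q branch1
git commit -q --allow-empty -m I
after=$(git rev-parse branch1)
parent_of_after=$(git rev-parse "branch1^")
master_now=$(git rev-parse master)

echo "branch1 was $before, is now $after; master is still $master_now"
```

The new commit's parent is the old tip, and master never moved, which matches the drawings: the name branch1 simply slid forward to the new last commit.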

When your Git gets new-to-you commits from their Git, your Git needs to set up some name(s) to remember the hash IDs that their Git had in their branch names. But their Git tells you this stuff right at the start of the git fetch. You run git fetch origin, and the Git over at origin—at the URL listed under the name origin—says: My master holds H, my develop holds L, .... Your Git just got this whole list.

Then, as the fetch runs, your Git selects any commits they have that you don't, and has their Git send them over. This adds new commits to your repository, without removing any commits—it literally, physically can't remove any commits as they only send you new (new-to-you) stuff. When all of this is done, you definitely have those commits.

So now your Git takes all of their branch names, and renames those names. Your Git turns their master into your origin/master. Your Git turns their develop into your origin/develop. This goes on for all of their branch names. These are Git's remote-tracking names because they remember ("track") the branch names and hash IDs that your Git saw in their Git, under your remote name origin.

So, let's say you have this before you run a new git fetch:

...--G--H   <-- master, origin/master

You and they both have only one branch, named master, and both of these names identify commit H. Then you run git fetch. Their master now points to new commit J, and they have a branch name develop that points to commit L:

          I--J   <-- origin/master
         /
...--G--H   <-- master
         \
          K--L   <-- origin/develop
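The picture above can be reproduced locally. This is a sketch only: the directory theirs stands in for the repository on GitHub, mine is a clone of it, and the identity and commit messages are placeholders. After the fetch, the remote-tracking names have moved, while the local master has not, and no local develop exists.

```shell
#!/bin/sh
set -eu
# Placeholder identity for the throwaway commits (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# "theirs" plays the role of the GitHub repository; "mine" is our clone.
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
git -C theirs commit -q --allow-empty -m H
git clone -q theirs mine
old=$(git -C mine rev-parse master)

# They add I and J to master, and start develop (K, L) from H.
git -C theirs commit -q --allow-empty -m I
git -C theirs commit -q --allow-empty -m J
git -C theirs checkout -q -b develop "$old"
git -C theirs commit -q --allow-empty -m K
git -C theirs commit -q --allow-empty -m L

git -C mine fetch -q origin

mine_master=$(git -C mine rev-parse master)
tracking_master=$(git -C mine rev-parse origin/master)
tracking_develop=$(git -C mine rev-parse origin/develop)
their_master=$(git -C theirs rev-parse master)
their_develop=$(git -C theirs rev-parse develop)

# The fetch did not create a local branch named develop.
if git -C mine rev-parse -q --verify refs/heads/develop >/dev/null; then
    has_local_develop=yes
else
    has_local_develop=no
fi
echo "master=$mine_master origin/master=$tracking_master local develop: $has_local_develop"
```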

You do not have to do anything, but if you want, you can have your Git move your name master to point to commit J. There are many ways to do that, but often, the easiest is to git checkout master first if necessary.[3] This attaches the special name HEAD to the name master, so that Git knows which branch name to use for operations that write new hash IDs into the current branch:

          I--J   <-- origin/master
         /
...--G--H   <-- master (HEAD)
         \
          K--L   <-- origin/develop

The git checkout operation also arranges your index (aka staging area) and work-tree (aka working tree) to let you view and/or work with the commit identified by the branch name. That is, H is now your current commit, and master is your current branch. We won't go into detail on the index and work-tree here, but they're pretty important: they are where you build up your next commit, and how you work on files, which Git stores inside commits in a special, read-only, Git-only format.

Anyway, now that you're in this particular situation, you can tell Git to do a fast-forward not-really-a-merge "merge" operation to have your master catch up with their origin/master:

git merge --ff-only origin/master

This takes your current branch name—master, from the checkout we just did if necessary—and does the fast-forward operation if it can. If it can't, it doesn't do a merge, it just says that it can't fast-forward, and quits. Since here, it can fast-forward instead of merging, it does that:

          I--J   <-- master (HEAD), origin/master
         /
...--G--H
         \
          K--L   <-- origin/develop

You now have commit J out, and can see (and work with) its files in your work-tree. Your name master now identifies the same commit as their name origin/master and you still have all the commits that they have, and your Git still has, for its remote-tracking names, the names of their branches.
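Both outcomes of git merge --ff-only can be sketched in a pair of throwaway repositories. The names theirs and mine, the identity, and the commit messages are invented for the demonstration. The first merge is a pure fast-forward; the second is attempted after the two sides have diverged, and is refused rather than turned into a real merge.

```shell
#!/bin/sh
set -eu
# Placeholder identity for the throwaway commits (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
git -C theirs commit -q --allow-empty -m H
git clone -q theirs mine

# They move ahead; we fetch and fast-forward our master.
git -C theirs commit -q --allow-empty -m I
git -C theirs commit -q --allow-empty -m J
git -C mine fetch -q origin
git -C mine merge -q --ff-only origin/master
after_ff=$(git -C mine rev-parse master)
target=$(git -C mine rev-parse origin/master)

# Now let the histories diverge: one new commit on each side.
git -C mine commit -q --allow-empty -m mine-only
git -C theirs commit -q --allow-empty -m theirs-only
git -C mine fetch -q origin
if git -C mine merge -q --ff-only origin/master 2>/dev/null; then
    diverged=merged
else
    diverged=refused
fi
echo "fast-forwarded to $after_ff; diverged case: $diverged"
```

The refusal is the useful part when all you want is to catch up: a branch that has local work on it is left exactly as it was.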


[2] To add a commit to a branch, you do this with your Git:

  1. Select that branch name and its last commit, e.g., git checkout branch1.
  2. Do the stuff needed to tell Git to make a new commit.
  3. Your Git writes out a new commit, which gets a new unique hash ID. The parent of this new commit is the commit you selected in step 1. Then your Git just writes the hash ID created here, in step 3, into the name you selected in step 1.

Now the branch name identifies the last commit in the branch, as it did before you made the new commit. The new commit is the last commit in the branch!

Pictorially:

...--G--H   <-- branch (HEAD)

becomes:

...--G--H   <-- branch (HEAD)
         \
          I

for a moment, but then Git immediately writes I's hash ID into the name branch. Git knows that branch is the right name, because the special name HEAD is attached to the name branch. So now we have:

...--G--H
         \
          I   <-- branch (HEAD)

and there's no reason not to just draw them all in a straight line again.

[3] In Git 2.23 and later, you can use git switch instead of git checkout. The reason to do this is that git checkout, as a command, has too many different jobs it can do. So in Git 2.23, it's been split into two separate commands: git switch does half of its jobs, and git restore does the other half. If you have an older Git, or are used to the old way of doing things, the old git checkout command still works the same as always.


Pruning

Note that if they delete a branch name, your Git still retains your memory of their name. That is, suppose they decide commits K-L are worthless and just trash their develop name entirely. You have this in your repository:

...--G--H--I--J   <-- master (HEAD), origin/master
         \
          K--L   <-- origin/develop

and you run git fetch and have your Git call up the Git at origin. They list the fact that their master identifies commit J, and that's it for their branches. Your Git says ah, I have commit J already and they send you no commits and the two Gits disconnect. Your Git would update your origin/master, changing it from pointing to J to point to J, which doesn't change it, so nothing really happens here. And then your Git is done, and your origin/develop still remembers commit L, even though they don't have a develop any more.

If you don't want this—if you want to get rid of your origin/develop—you simply tell your Git to prune when it is fetching. Since your Git gets the full list of all of their branches, your Git can see that they don't have a develop any more. So your Git will now delete your origin/develop, leaving you with:

...--G--H--I--J   <-- master (HEAD), origin/master
         \
          K--L   [abandoned]

To do this pruning, run git fetch --prune. To make all git fetch operations prune automatically when they can, configure fetch.prune to true:

git config --global fetch.prune true

for instance.
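The difference between a plain fetch and a pruning fetch can be checked in a throwaway setup. As before, theirs and mine are made-up directory names and the identity is a placeholder; the helper function exists is also just a local convenience for this sketch.

```shell
#!/bin/sh
set -eu
# Placeholder identity for the throwaway commits (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
git -C theirs commit -q --allow-empty -m H
git -C theirs branch develop
git clone -q theirs mine          # mine now has origin/master and origin/develop

# They delete their develop.
git -C theirs branch -q -D develop

# Hypothetical helper: does this ref exist in "mine"?
exists() {
    if git -C mine rev-parse -q --verify "$1" >/dev/null; then echo yes; else echo no; fi
}

git -C mine fetch -q origin
after_plain_fetch=$(exists refs/remotes/origin/develop)

git -C mine fetch -q --prune origin
after_prune=$(exists refs/remotes/origin/develop)

echo "origin/develop after plain fetch: $after_plain_fetch, after --prune: $after_prune"
```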

Note that the commits are still there, at least for a little while. With no name to find them, your Git will eventually remove them.[4] The removal process for abandoned commits is actually done by a maintenance command, git gc, that you can run, but it takes a long time: multiple seconds, or even minutes. Git runs it for you automatically, in the background, when that seems to Git to be a likely-profitable venture, so there is hardly ever any reason to run it yourself.


[4] When you abandon your own commits, your Git tends to remember their hash IDs for at least another 30 days in one or more reflog entries. That keeps the abandoned commits alive for at least 30 days, in case you want them back. In this case, though, there are no reflog entries any more, so this "eventual" is as soon as the next git gc actually runs. It's hard to predict when that will be, though.


This all leads to the last rule: don't bother with branch names until you want one

Look back at our diagram where they had two branch names, and we had one:

...--G--H--I--J   <-- master (HEAD), origin/master
         \
          K--L   <-- origin/develop

We don't need our own branch name develop here. We only need one if we want to add commits to the end. We can make one:

...--G--H--I--J   <-- master, origin/master
         \
          K--L   <-- develop (HEAD), origin/develop

and then make new commits:

...--G--H--I--J   <-- master, origin/master
         \
          K--L   <-- origin/develop
              \
               M--N   <-- develop (HEAD)

Now we need to send our new commits to them, for which we use git push. This works a lot like git fetch: we offer them commits we have that they don't, by hash ID. But it ends differently. Having sent them our commits M-N, we then ask them to set their branch name develop to point to commit N. If they accept, we update our own origin/develop:

...--G--H--I--J   <-- master, origin/master
         \
          K--L
              \
               M--N   <-- develop (HEAD), origin/develop

Commit L no longer has any name pointing to it, so we can straighten out the kink in the drawing. But we can also switch back to our name master and delete our develop:

...--G--H--I--J   <-- master (HEAD), origin/master
         \
          K--L--M--N   <-- origin/develop

The commits are all still there. We find them using the name origin/develop. There's no reason to find them using the name develop any more. So as soon as we're done with it, we just stop using it and delete it. Then if they add more commits and we git fetch, the only name we have updates automatically:

...--G--H--I--J   <-- master (HEAD), origin/master
         \
          K--L--M--N--O   <-- origin/develop

If we find we need to add on more commits, we git checkout develop again to create our name develop from our origin/develop:

...--G--H--I--J   <-- master, origin/master
         \
          K--L--M--N--O   <-- develop (HEAD), origin/develop

and we're ready to add new commits, and then git push, as usual.

We only need our own name if we're going to add new commits. Otherwise, their names—our remote-tracking names—suffice. We just use those and we're done.
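The create, push, delete cycle might look like this in practice. It is a sketch under assumptions: theirs is an ordinary (non-bare) repository standing in for GitHub, which works here only because develop is not the branch checked out there; mine is the clone; identity and messages are placeholders.

```shell
#!/bin/sh
set -eu
# Placeholder identity for the throwaway commits (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
git -C theirs commit -q --allow-empty -m J
git -C theirs checkout -q -b develop
git -C theirs commit -q --allow-empty -m L
git -C theirs checkout -q master
git clone -q theirs mine
cd mine

# Only origin/develop exists so far. Asking for "develop" creates the
# local name from origin/develop and sets origin/develop as its upstream.
git checkout -q develop
upstream=$(git rev-parse --abbrev-ref 'develop@{upstream}')

# Add a commit and send it; a successful push also updates origin/develop.
git commit -q --allow-empty -m M
git push -q origin develop
pushed=$(git rev-parse develop)

# Done with the name: go back to master and delete develop.
git checkout -q master
git branch -q -d develop
tracking=$(git rev-parse origin/develop)
if git rev-parse -q --verify refs/heads/develop >/dev/null; then
    local_develop=present
else
    local_develop=gone
fi
echo "upstream=$upstream origin/develop=$tracking local develop: $local_develop"
```

Deleting the local name loses nothing: origin/develop still finds commit M, and a later git checkout develop recreates the name from it.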

We can even look at their commits using Git's detached HEAD mode. Suppose we've pushed O and deleted our develop so that we have:

...--G--H--I--J   <-- master (HEAD), origin/master
         \
          K--L--M--N--O   <-- origin/develop

Now they add a new commit P. We use git fetch to get it:

...--G--H--I--J   <-- master (HEAD), origin/master
         \
          K--L--M--N--O--P   <-- origin/develop

We can git checkout origin/develop now. Since origin/develop is not a branch name—it's a remote-tracking name—our Git will use its detached HEAD mode. In this mode, the special name HEAD just holds the raw hash ID of the commit we're browsing:

...--G--H--I--J   <-- master, origin/master
         \
          K--L--M--N--O--P   <-- HEAD, origin/develop

If we make a new commit Q here, the name HEAD advances to point to our new commit:

...--G--H--I--J   <-- master, origin/master
         \
          K--L--M--N--O--P   <-- origin/develop
                          \
                           Q   <-- HEAD

and now we really should make a branch name to remember the hash ID of Q, because if we switch away from this commit (back to P or J, say), we'll forget the hash ID. Who can remember those things? Well, Git can remember them. We just need to create a name. That's what branch names are for: to remember the last commit. If Q is to be the last commit, we make a new name for it. We can call it whatever we want:

git checkout -b feature

and now we have:

...--G--H--I--J   <-- master, origin/master
         \
          K--L--M--N--O--P   <-- origin/develop
                          \
                           Q   <-- feature (HEAD)

The git checkout -b operation creates the name we choose, and attaches HEAD to the name. The commit we chose was the commit we were using: the one that HEAD used to point to directly. Now HEAD is attached to the new name, feature, and the name—the branch name—points to the commit.

Usually, you create the name pointing to P first, then commit to make Q. But if you forget, this is how you recover: git status says detached HEAD and you say to yourself, oops, I should create a branch name now. You run git checkout -b, or in Git 2.23 and later, git switch -c, to do that.
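The recovery described above, sketched in a throwaway clone. The directory names, identity, and commit messages are placeholders, and feature is the same example name used earlier. git symbolic-ref is used here only as a way to ask whether HEAD is attached to a branch name.

```shell
#!/bin/sh
set -eu
# Placeholder identity for the throwaway commits (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
git -C theirs commit -q --allow-empty -m J
git -C theirs checkout -q -b develop
git -C theirs commit -q --allow-empty -m P
git -C theirs checkout -q master
git clone -q theirs mine
cd mine

# Checking out a remote-tracking name detaches HEAD.
git checkout -q origin/develop
if git symbolic-ref -q HEAD >/dev/null; then
    state=attached
else
    state=detached
fi

# A commit made here is found only through HEAD.
git commit -q --allow-empty -m Q
q=$(git rev-parse HEAD)

# Give Q a name before switching away.
git checkout -q -b feature
now_on=$(git symbolic-ref --short HEAD)
feature_tip=$(git rev-parse feature)
echo "was $state; now on $now_on at $feature_tip"
```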

Conclusion

Your branch names are there to remember last-commit hash IDs. Create them when you want that. Otherwise, don't bother with names. Use the prune option to snip away dead origin/* names.

Your Git kind of wants to use at least one name, so you can let it do that: let it use master, for instance. Then do a fast-forward after git fetch. If you never actually do your own work in a repository, you'll just stick with master and let git merge --ff-only origin/master take you to their update.

Or, you can even use detached-HEAD mode: git checkout origin/master, then delete the name master. You don't actually need it. The detached HEAD name, plus the remote-tracking name, will serve. After git fetch updates your origin/master, you can just git checkout origin/master again, to move the detached HEAD. This might surprise some Git users, so if you do use this approach, and someone else wants to take over this Git repository, you can warn them—but your Git repository is for you, not for them.

torek
  • This was really thorough and educational read. And if I understand you correctly, you're saying that what I am asking for is not something one would really want to do in a conventional git workflow. Which I guess is fair enough. I'm still getting used to working with git, and I don't know yet what all the good habits are. – Arthur Jan 23 '20 at 23:15
  • Yes. The TL;DR of this is "don't create branch names needlessly" but there's a *lot* hidden behind that adverb, "needlessly". – torek Jan 23 '20 at 23:47

So, if I understand your question correctly, you can use:

git fetch [remote]

to fetch changes from the remote and update your remote-tracking names (such as origin/master), without touching your local branches; or

git fetch --prune [remote]

to do the same and also remove remote-tracking refs for branches that were deleted from the remote repository.

Also look into:

git pull [remote]

to fetch changes from the remote and merge the current branch's upstream into the current branch.

YamiOmar88
  • This is the branch-by-branch approach I mentioned, isn't it? I would have to manually checkout every single branch, then pull that, then checkout the next, then pull that, and so on, right? Isn't there a way to do them all in one fell swoop? _That_ is what I'm asking about. I guess `fetch` with and without `--prune` can be run separately without too much fuss. But I am looking for an approach with a constant (and low) number of commands needed to run, not something that is O(number of branches). Like "Deleting the whole project, and cloning" but within git itself so I don't need to redownload. – Arthur Jan 23 '20 at 10:25

What you are describing is the way I work on all my projects. [Okay, that's not entirely true, so read on.]

In this scenario, I am not really "collaborating" with myself. Only one computer is in charge at any given time, often (as you say) for days at a time; then I switch back to the other computer. The underlying reality is usually that I'm traveling. Before I travel, I switch "mastery" to the laptop; when I get home, I switch "mastery" back to the desktop computer.

In this arrangement, I use github solely as an intermediary; the repo there is private (before github permitted free private repos, I used bitbucket for this purpose). Well, not solely; it is also very nice to have an offsite remote in case I or my computer is "hit by a bus".

So I would say: yes, do what you're describing.

Now, as for the implied question in

I just want a complete update. All old branches pruned away, all new branches pulled, etc.

...The way to push all branches to the remote in one line is simply git push --all, but as for pulling, no, there isn't exactly a one-line version of that (at least, not for what I suspect you mean by that). Even making a new clone is not a one-line version of that. When you do a clone or a git pull --all, you do get the entire repository, including all the remote-tracking branches; but local branches corresponding to remote branches are not automatically created. That is why there are Stack Overflow questions and answers like this one:

How to fetch all Git branches

...and this one:

Can "git pull --all" update all my local branches?

So if you're happy doing the kind of thing recommended in the answers to those questions, there's your O(1) update.
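For a rough idea of what such a script can look like, here is one possible sketch; it is not taken from those answers. It assumes a POSIX shell and git, builds a throwaway pair of repositories (theirs and mine are invented names, the identity is a placeholder), then does one pruning fetch and fast-forwards every local branch that has an upstream. The current branch is updated with git merge --ff-only; the others are updated with a fetch from the local repository (.), which only accepts fast-forwards.

```shell
#!/bin/sh
set -eu
# Placeholder identity for the throwaway commits (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# --- throwaway setup: two branches on "their" side, both also local in "mine" ---
work=$(mktemp -d)
cd "$work"
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/master
git -C theirs commit -q --allow-empty -m H
git -C theirs branch develop
git clone -q theirs mine
cd mine
git checkout -q develop      # creates local develop with upstream origin/develop
git checkout -q master

# "They" (the other computer) advance both branches while we are away.
git -C ../theirs commit -q --allow-empty -m J
git -C ../theirs checkout -q develop
git -C ../theirs commit -q --allow-empty -m L

# --- the update itself ---
git fetch -q --prune origin
current=$(git symbolic-ref -q HEAD || true)
git for-each-ref --format='%(refname) %(upstream)' refs/heads |
while read -r ref upstream; do
    [ -n "$upstream" ] || continue          # no upstream configured: skip
    if [ "$ref" = "$current" ]; then
        # The checked-out branch also needs its work-tree updated.
        git merge -q --ff-only "$upstream" || echo "not fast-forwarded: $ref"
    else
        # Moves the branch name only if that is a fast-forward.
        git fetch -q . "$upstream:$ref" || echo "not fast-forwarded: $ref"
    fi
done

git for-each-ref --format='%(refname:short) %(objectname:short)' refs/heads refs/remotes
```

Branches with local-only commits, or whose upstream was pruned, are reported and left alone, so the loop never discards work. It is still a script rather than a single built-in command, which is the point made above.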


Footnote: Recall that I said "that's not entirely true". I have another way of working. In this other way, I sync the work tree folder between the two computers, using the Finder or rsync (I use Macs) as an intermediary. I still use GitHub as an offsite backup, but I transfer mastery from one computer to the other just by doing the sync. I could in fact use a Finder-copy at transfer time, but most of the time I use sync software instead. There is no difficulty about this because the git repo is just a bunch of folders and files like any other: it syncs/copies from one computer to another just fine. And this way, you do get all the local branches, because the whole local git is just copied from one computer to another.

matt