I am working with two repos: originalRepo and copyCatRepo.

originalRepo is a repo managed by a third party which has a branch that I'd like to 'copy'. I want my copyCatRepo to have a master branch which tracks all changes of originalRepo/branch_eight so that if originalRepo/branch_eight receives any updates while I am working in my copyCatRepo/master, I can update my copyCatRepo/master to have those new changes.

How do I set up a branch in my copyCatRepo to track all changes in a different repo's branch? All I am trying to accomplish is add a patch to an external repo but I need to be aware of changes.

Isaac Perez

2 Answers

Run git config pull.rebase true, and set up a remote that pulls from the origin and pushes to your GitHub repository.
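One way to read this suggestion is to give a single remote separate fetch and push URLs. The sketch below assumes that reading; the local directories stand in for the third-party repository and for your GitHub repository, and all paths and the demo identity are made up for illustration.

```shell
# Hypothetical stand-ins: originalRepo plays the third party's repository,
# copyCatRepo.git plays your own (GitHub) repository.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git -c init.defaultBranch=master init -q "$tmp/originalRepo"
git -C "$tmp/originalRepo" commit -q --allow-empty -m "upstream work"
git -c init.defaultBranch=master init -q --bare "$tmp/copyCatRepo.git"

git clone -q "$tmp/originalRepo" "$tmp/work"
cd "$tmp/work"

# Keep fetching (and so pulling) from the original, but push to your own copy.
git remote set-url --push origin "$tmp/copyCatRepo.git"
# Make a plain "git pull" rebase your commits instead of merging.
git config pull.rebase true

push_url=$(git remote get-url --push origin)
rebase_setting=$(git config --get pull.rebase)
echo "push goes to: $push_url"
echo "pull.rebase:  $rebase_setting"
```

With real repositories you would substitute the two URLs for the temporary paths.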

jthill

... I want my copyCatRepo to have a master branch which tracks all changes of originalRepo/branch_eight ...

You'll probably want to create multiple remotes. A remote is primarily a name by which Git remembers the URL of some other Git repository. The first standard name for a remote is origin, and that's a good one. The second standard name is upstream, and that one is confusing, because a branch also has an upstream, and these are different things.

You'll want to set the upstream of your master branch to something unusual (for example, the remote-tracking name for the other repository's branch_eight), and otherwise use git fetch, git merge, and/or git rebase. To run fetch followed immediately by either merge or rebase, you can use git pull as in jthill's answer.

I'll go into more detail here after the long part below.

Long

This answer is pretty simple, but does not really make any sense until you realize that Git isn't about branches. Git is, instead, all about commits. Branch names like branch_eight and master are useful, but their use is that they find commits. It's the commits themselves that actually matter.

With that in mind, let's take a look at what a repository really is: it's a collection of commits (and other internal Git objects that commits need), plus a collection of names—branch names, tag names, and other such names. We'll concentrate on just two things about a commit here, as we look at this.

Each commit has a unique number. This number is expressed as a big ugly string of letters and digits, such as 3cf59784d42c4152a0b3de7bb7a75d0071e5f878. These things look random (but aren't), and are unpredictable. Each new commit has to get a new, totally-unique hash ID, which is why they have to be so big and ugly.

Most of the bulk of most Git repositories consists of a big database: a simple key-value store with hash IDs as keys, and commits and other internal Git objects as the values. Git needs to know a commit's key—the hash ID—to find a commit in the database. So commits have these unique hash IDs. But, even though each one of these is the "true name" of each commit, they're completely unsuited for use by mere humans. So each Git repository contains a secondary database, which is another key-value store where the keys are names like branch names, and the value stored under any given key is a single commit hash ID.
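The two databases can be poked at directly. This is a small sketch in a throwaway repository (the directory and the demo identity are placeholders): git rev-parse looks a name up in the names database, and git cat-file uses the resulting hash ID as the key into the objects database.

```shell
# Throwaway repository; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q .
git commit -q --allow-empty -m "first commit"

# Names database: find the hash ID stored under the branch name.
hash=$(git rev-parse refs/heads/master)
echo "master -> $hash"

# Objects database: use the hash ID as the key to find the object.
kind=$(git cat-file -t "$hash")
echo "object type: $kind"
```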

This one hash ID is all we need, because every commit stores, inside itself, the hash ID(s) of some previous commit(s). Most commits store exactly one previous-commit hash ID, which Git calls the parent of a commit.

Whenever you make a new commit, you start with some existing commit. (Obviously there's a bootstrap problem here, getting the first commit in; we'll ignore that.) Then you make the new commit, and its parent is the commit you started with. So the commits themselves form a simple backwards chain. A branch name just holds the hash ID of the last commit in the chain, like this:

... <-F <-G <-H   <--master

Here, the name holds the hash ID of the last commit, which has some big ugly hash ID but we'll just call it commit H. Commit H itself holds a snapshot and metadata (which we didn't cover properly) and includes the actual hash ID of earlier commit G. Commit G holds another snapshot and metadata, including the hash ID of still-earlier commit F, and so on.

We say that the commits point to each other—always backwards, because that's just how Git is—and that the branch name points to the last commit. To add a new commit to a repository, we create the new commit, pointing backwards to what was the last commit in the chain, then move the name forwards to point to the new last commit:

... <-F <-G <-H <-I   <--master

That's all it takes to add a commit to a repository: add the new commit (the existing commits never change; in fact, they cannot change) and update some name or names to find the new "last commit".
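Here is a short sketch of that in a scratch repository (directory and identity are placeholders). The commit messages H and I only echo the letters used in the drawings; the real names are the hash IDs.

```shell
# Throwaway repository; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q .
git commit -q --allow-empty -m "H"
old_tip=$(git rev-parse master)

# Add one commit: the name master now finds the new commit.
git commit -q --allow-empty -m "I"
new_tip=$(git rev-parse master)

# The new commit stores the old tip's hash ID as its parent.
parent=$(git rev-parse 'master^')
git cat-file -p master | grep '^parent'

# The old commit is still there, unchanged, under the same hash ID.
git cat-file -t "$old_tip"
```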

Branching happens when you build stuff like this:

             I--J   <-- br1
            /
...--F--G--H   <-- master
            \
             K--L   <-- br2

To get here, we added a br1 name also pointing to H and then added two commits, one at a time, advancing the name br1. Then we added a br2 name also pointing to H and then added two commits there. The commits up through H are on all three branches.
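The same shape can be built in a scratch repository. In this sketch (placeholder directory, identity, and commit messages), git merge-base finds the commit where the two branches come back together, which is the commit master still points to.

```shell
# Throwaway repository; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git -c init.defaultBranch=master init -q .
git commit -q --allow-empty -m "H"

# Two new names, both pointing to the same commit as master.
git branch br1
git branch br2

git checkout -q br1
git commit -q --allow-empty -m "br1: first"
git commit -q --allow-empty -m "br1: second"

git checkout -q br2
git commit -q --allow-empty -m "br2: first"
git commit -q --allow-empty -m "br2: second"

fork=$(git merge-base br1 br2)      # the shared commit
only_br1=$(git rev-list --count master..br1)
only_br2=$(git rev-list --count master..br2)
git log --oneline --graph --all
```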

(Not shown here: how HEAD works, by attaching to just one branch name. That's your current branch, and then the commit that the branch name points-to is your current commit.)

Branch names are not the only kinds of names

Besides branch names, each repository can store other kinds of names. All the names go into the names-database. When necessary, Git can distinguish the kind of name by using the full name: a branch B has a full name refs/heads/B. The other two kinds of names that you'll deal with regularly are tag names, which have full names starting with refs/tags/, and remote-tracking names, which have full names starting with refs/remotes/.

The remote-tracking names are your own Git repository's way of remembering some other Git repository's branch names. If they have a branch B, and you use the name origin to reach their Git repository, your Git will store their branch name B using your remote-tracking name origin/B.

Just like branch names, all of these names just point to one commit. (Tag names have a complicated way of doing this, called annotated tags, while remote-tracking names are simple like branch names, because they're literally just copied from branch names. You don't need to worry about the extra complexity of annotated tags here.)
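To see the full names, you can ask Git to spell them out. The following sketch uses a local directory as the "other" repository; the paths, the tag name v0, and the identity are invented for the example.

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "first"
git clone -q "$tmp/theirs" "$tmp/mine"
cd "$tmp/mine"
git tag v0      # a lightweight tag, just to have a tag name around

branch_full=$(git rev-parse --symbolic-full-name master)
tag_full=$(git rev-parse --symbolic-full-name v0)
remote_full=$(git rev-parse --symbolic-full-name origin/master)

# Every name in the names database, with the commit it points to.
git for-each-ref --format='%(refname) %(objectname:short)'
```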

The key difference between a branch name and one of these other names, besides of course the prefix part of the full name, is that you can get "on a branch". When you do, and make new commits, Git will automatically update your branch name to point to your new commit. But when you work with a remote-tracking name, the way to use its commit gets you into what Git calls detached HEAD mode. That's because Git won't let you get "on" a remote-tracking name.
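A quick way to observe this, sketched with scratch repositories (paths and identity are placeholders): git symbolic-ref succeeds only while HEAD is attached to a branch name.

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "first"
git clone -q "$tmp/theirs" "$tmp/mine"
cd "$tmp/mine"

# Checking out a remote-tracking name detaches HEAD.
git checkout -q origin/master
if git symbolic-ref -q HEAD >/dev/null; then
    on_remote_name=attached
else
    on_remote_name=detached
fi

# Checking out a branch name attaches HEAD to that branch again.
git checkout -q master
on_branch=$(git symbolic-ref --short HEAD)
echo "origin/master: $on_remote_name, then back on: $on_branch"
```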

People don't like to work in detached HEAD mode (for good reason: it's not a good idea in general, it's for special cases like being in the middle of an unfinished rebase). So in general, when we're working with one or more remotes—other Git repositories as found by names like origin and upstream and whatever names you want to make up for them—we humans like to create corresponding branch names.

Clones and the standard naming conventions

The usual first remote name, as I mentioned earlier, is origin. You don't even have to set this one up yourself: just running git clone does it for you.

The git clone command is essentially just a convenience command. It runs five or six other commands for you, with the first one being a non-Git command and the rest being Git commands:

  1. mkdir, or whatever your OS uses to create a new empty directory (folder). The remaining commands run in this folder, although when they're done, you have to move to it manually (for OS-specific reasons).

    As a special case, you can tell git clone to build the clone in an existing empty folder, in which case it skips the mkdir step but still runs the Git commands in that folder. Unless it's . you generally have to chdir or cd into it afterwards.

  2. git init, which creates a new empty Git repository. This new repository has no commits and no branches.

  3. git remote add origin url: this adds your first remote, named origin. You can pick a different name with a command line option, but if you don't, you get the standard name. Git saves the URL under the name, and arranges for their branch names to become your origin/* remote-tracking names.

  4. git config, if and as needed: you can get extra configuration set before step 5, if necessary, using command line options.

  5. git fetch origin: this has your Git call up the other Git, using the stored URL. Their Git shows your Git all of their branches and other names, and your Git uses these to copy all their commits. Your Git does not create branch names though: instead, your Git turns their branch names into your remote-tracking names.

    When this step finishes, you have a repository that has all of their commits and none of their branches. Of course, the remote-tracking names are just as good as branch names, in terms of finding commits. But it's important to realize that all the names you have here are remote-tracking names.

  6. git checkout (or, in Git since 2.23, git switch—which does the same thing here). This creates one branch name. The name it creates is the one you told git clone to use, with your -b option. If you didn't specify a -b option, your Git asks the other Git which branch name they recommend, and uses that name.

It's this last step—step 6—of git clone that makes your first and only branch name, in your new clone. It's super-common to use the name master, although it's becoming more common (because of GitHub) to use the name main. The important thing here is that the name is a branch name that the other Git was using, that is now a remote-tracking name in your own repository. Your Git creates a new branch name for your repository, using the same spelling as their Git uses for their branch name. But your Git is using your remote-tracking name, set up in step 5, to do this.

You end up with the same name for your branch as for their branch, but your Git has gone through all this crazy shuffling, from branch names to remote-tracking names, back to a (single) branch name, to get there. You need to be aware of this, because you're going to have more than one remote.
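The steps can be run by hand to watch the shuffle happen. This is only a rough imitation of git clone, not what it does internally; the source repository is a local directory made up for the example, and step 4 is skipped because no extra configuration is needed.

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "first"

mkdir "$tmp/manual-clone"                 # step 1
cd "$tmp/manual-clone"
git init -q .                             # step 2
git remote add origin "$tmp/theirs"       # step 3
git fetch -q origin                       # step 5

# After the fetch: their commits are here, but you have no branch names yet.
branches_after_fetch=$(git for-each-ref --format='%(refname)' refs/heads)

git checkout -q --track origin/master     # step 6: create your one branch

current=$(git symbolic-ref --short HEAD)
upstream=$(git for-each-ref --format='%(upstream:short)' refs/heads/master)
echo "on $current, upstream $upstream"
```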

More about git fetch

Our description above just said that git clone runs git fetch as step 5. It's the git fetch step that:

  • figures out which commits they have, that you don't, that you need;
  • gets their branch names and turns them into remote-tracking names;
  • gets their commits and adds them to your repository; and
  • creates and/or updates your remote-tracking names based on their branch name hash IDs.

For that first git clone, you had no commits, so this first bullet point is easy: you need all of their commits.[1] But you'll run git fetch again later, and/or to other Git repositories. When you do, the set of commits that you need is often much smaller or even totally empty.

Remember that, earlier, I said that each commit gets a totally-unique hash ID. This hash ID is always the same in every Git repository. That means your Git can tell whether some other Git's commit is the same commit you already have, or not, just by looking at the hash ID. So when you run git fetch origin later, if they haven't added any new commits, your Git can very quickly find this out, and not bother getting any new commits.

Note that they can add new branch names without adding new commits. For instance, they can make two names point to the same commit. This is entirely normal: it is how new features get written. We start by making a new branch name that points to some existing "last commit". So your git fetch might create or update remote-tracking names without getting any new commits.

The fetch command's job, in other words, is two parts: get any new commits (commits they have, that I don't, that I want); and update my remote-tracking names (so that I remember which commits their branch names pick).
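Both halves of that job show up in a small experiment. In this sketch the other repository is a local directory, and the branch name topic and the identity are invented; note that your own master does not move.

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "first"
git clone -q "$tmp/theirs" "$tmp/mine"
cd "$tmp/mine"

before=$(git rev-parse origin/master)

# They add one commit, plus a new branch name that needs no extra commits.
git -C "$tmp/theirs" commit -q --allow-empty -m "second"
git -C "$tmp/theirs" branch topic

git fetch -q origin

after=$(git rev-parse origin/master)    # remote-tracking name updated
topic=$(git rev-parse origin/topic)     # remote-tracking name created
mine=$(git rev-parse master)            # your own branch is untouched
echo "origin/master: $before -> $after"
```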


[1] There are some exceptions here, e.g., for shallow and/or single-branch clones, and there are a lot of details I'm glossing over on purpose to keep the description simple. The reality is much more complicated.


Adding more remotes

To add a new remote to your existing Git repository, you will run git remote add name url. The name you pick here is arbitrary and is up to you. The use of upstream is kind of standard, but as I mentioned above, it's annoying in one way. Every Git branch in your own repository can have one upstream setting. You can set the upstream of a branch to whatever you want, whenever you want. This has little if anything to do with a remote named upstream, whether you have one or not.

I find it confusing to talk about the upstream of master and the remote named upstream at the same time. So I'm not a big fan of this as a standard second name, myself. But this part is up to you. Here, I'm going to call this second remote xyzzy, after the magic word in Colossal Cave Adventure:

git remote add xyzzy <url>

Having done this, you then need to run git fetch to it:

git fetch xyzzy

This will call up another Git, at the specified URL, as before. You'll get, from that Git, any commits they have, that you don't, that you need—this might be as few as no commits at all—and then your Git will create (or update, but this is the first time, so "create") remote-tracking names for each of their branch names. If they have a branch named master, you will get an xyzzy/master.
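As a concrete sketch, with local directories playing both repositories (originalRepo and copyCatRepo.git are stand-in paths, and the identity is a placeholder), the second remote's branches arrive as xyzzy/* names:

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git -c init.defaultBranch=master init -q "$tmp/originalRepo"
git -C "$tmp/originalRepo" commit -q --allow-empty -m "initial"
git -C "$tmp/originalRepo" branch branch_eight

# Your copy, and a working clone of it (this remote is named origin).
git clone -q --bare "$tmp/originalRepo" "$tmp/copyCatRepo.git"
git clone -q "$tmp/copyCatRepo.git" "$tmp/work"
cd "$tmp/work"

# Add the third party's repository as a second remote and fetch from it.
git remote add xyzzy "$tmp/originalRepo"
git fetch -q xyzzy

tracked=$(git rev-parse refs/remotes/xyzzy/branch_eight)
git branch -r
```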

This breaks the special trick that clone used (though clone got there in time)

Note that at this point, you will probably have:

  • master: your own branch, matching origin/master at the moment;
  • origin/master: the remote-tracking name in your Git from origin's master;
  • xyzzy/master: the remote-tracking name in your Git from xyzzy's master;
  • xyzzy/branch_eight: the remote-tracking name in your Git from xyzzy's branch_eight.

The key issue here is that there are now two remote-tracking names that are from master, one being origin/master and the other being xyzzy/master. The special trick that git clone used in its step 6, to create your master from origin/master, stops working at this point.

That's not a problem for your existing master, which already exists. It's a problem for new names. Suppose that both origin and xyzzy have a branch named feature, and you decide that now, you would like to create a new feature based on one of those two. You run:

git checkout feature

and your Git complains. The problem is that it doesn't know whether to use origin/feature or xyzzy/feature. There are two names that could match.
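The complaint is easy to reproduce. This sketch sets up two local stand-in repositories that both have a feature branch (all paths and the identity are placeholders) and assumes checkout.defaultRemote is not configured; the exact error text varies between Git versions, so only the exit status is checked.

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git -c init.defaultBranch=master init -q "$tmp/first"
git -C "$tmp/first" commit -q --allow-empty -m "initial"
git -C "$tmp/first" branch feature
git clone -q --bare "$tmp/first" "$tmp/second.git"

git clone -q "$tmp/first" "$tmp/work"     # origin
cd "$tmp/work"
git remote add xyzzy "$tmp/second.git"    # second remote
git fetch -q xyzzy

# Both origin/feature and xyzzy/feature exist, so Git cannot guess.
if git checkout -q feature 2>/dev/null; then
    outcome=created
else
    outcome=refused
fi
echo "git checkout feature: $outcome"
```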

This is just a minor headache. You can still use an automatic trick:

git checkout --track origin/feature

will create feature based on origin/feature. You give it the remote-tracking name, and Git figures out which part is the branch part (feature) on its own and creates your feature branch. Or, you can use:

git checkout -b feature origin/feature

which does exactly the same thing.[2] The disadvantage is that you have to type the word feature twice. The advantage is that you can use a different branch name, e.g., git checkout -b feat37 origin/feature.

Both of these have, as a Git built-in thing, the side effect of creating the new branch with its upstream set to the remote-tracking name, in this case origin/feature.
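You can check the upstream setting after creating a branch either way. This sketch assumes the default branch.autoSetupMerge behavior and uses a scratch repository with a feature branch; it uses the -b option for the explicit-name form, and feat37 is just an example name.

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "initial"
git -C "$tmp/theirs" branch feature
git clone -q "$tmp/theirs" "$tmp/mine"
cd "$tmp/mine"

# Let Git work out the branch name from the remote-tracking name.
git checkout -q --track origin/feature
# Or name the new branch yourself.
git checkout -q -b feat37 origin/feature

up_feature=$(git for-each-ref --format='%(upstream:short)' refs/heads/feature)
up_feat37=$(git for-each-ref --format='%(upstream:short)' refs/heads/feat37)
echo "feature -> $up_feature"
echo "feat37  -> $up_feat37"
```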

This is also the case with your existing master. Its upstream is set to your own origin/master. Read on for what this all means, but before we get to that point, let's note one last problem.

Suppose you'd like to have your own feature corresponding to origin/feature and your own feature corresponding to xyzzy/feature too. (Let's assume these are two different features, and you are going to work on both of them.) You can't call both of these feature in your own Git repository. You'll be forced to make up a different local branch name for at least one of these. This isn't particularly harmful in and of itself, but it makes for a bunch of small headaches later. Just be aware of this for now, as the problem may never actually come up.


[2] Well, it does the same thing by default. There are configuration settings you can change that make these behave slightly differently.


What good is an upstream anyway?

Rather than answer the whole thing here, I'll just link to this: Why do I have to "git push --set-upstream origin <branch>"?

The big pluses here are the ability to run git rebase, git merge, git fetch, and the convenience git pull commands without typing anything else into the command line. It also makes git push slightly more convenient, but for your particular use case, it might actually make git push slightly less convenient.

You will need to think about this (and/or experiment with setting the upstream, or unsetting it entirely if you like) and see what works for you as there's no universal best way here.
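For experimenting, these are the commands that change and remove the setting. The sketch below uses a scratch clone whose origin has both master and branch_eight; the paths, the helper function name, and the identity are made up.

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git -c init.defaultBranch=master init -q "$tmp/theirs"
git -C "$tmp/theirs" commit -q --allow-empty -m "initial"
git -C "$tmp/theirs" branch branch_eight
git clone -q "$tmp/theirs" "$tmp/mine"
cd "$tmp/mine"

# Hypothetical helper: print master's upstream (empty if there is none).
show_upstream() {
    git for-each-ref --format='%(upstream:short)' refs/heads/master
}

from_clone=$(show_upstream)               # what git clone set up

git branch --set-upstream-to=origin/branch_eight master >/dev/null
after_set=$(show_upstream)

git branch --unset-upstream master
after_unset=$(show_upstream)              # now empty

echo "clone: $from_clone, set: $after_set, unset: '$after_unset'"
```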

Your own use case

Given your setup, I'd recommend:

  • making sure your Git is at least 2.15, if not newer, if you're going to use git worktree;
  • cloning one of the two repositories and adding the other as another remote;
  • using git worktree add to create extra work-trees for rebasing branches other than the current branch, if you find you need to do that.

Set or unset upstreams of various branches in whatever way works for you. Remember to use git fetch to get updates.
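Putting the pieces together, here is one possible workflow, sketched end to end with local directories in place of the two real repositories. The paths, file names, and identity are all invented, and setting master's upstream to xyzzy/branch_eight is just one of the choices described above. Because the upstream is not on origin, the push has to name its target explicitly.

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
orig="$tmp/originalRepo"; copy="$tmp/copyCatRepo.git"; work="$tmp/work"

# The third party's repository, with some work on branch_eight.
git -c init.defaultBranch=master init -q "$orig"
git -C "$orig" commit -q --allow-empty -m "initial"
git -C "$orig" checkout -q -b branch_eight
echo one > "$orig/eight.txt"
git -C "$orig" add eight.txt
git -C "$orig" commit -q -m "branch_eight work"

# Your copy: its master starts out as a copy of their branch_eight.
git -c init.defaultBranch=master init -q --bare "$copy"
git -C "$orig" push -q "$copy" branch_eight:master

git clone -q "$copy" "$work"
cd "$work"
git remote add xyzzy "$orig"
git fetch -q xyzzy
git branch --set-upstream-to=xyzzy/branch_eight master >/dev/null

# Your patch.
echo patch > patch.txt
git add patch.txt
git commit -q -m "my patch"

# Meanwhile, the third party updates branch_eight.
echo two >> "$orig/eight.txt"
git -C "$orig" commit -q -am "more branch_eight work"

# Get their new commit, replay your patch on top, publish to your copy.
git fetch -q xyzzy
git rebase -q                 # no arguments: rebases onto the upstream
git push -q origin master
```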

Consider configuring fetch.prune to true:

git config --global fetch.prune true

so that if some remote deletes some branch B, your git fetch to that remote, via name R, deletes R/B from your own remote-tracking name set. Without this, "stale" remote-tracking names build up: git fetch sees that they have a branch named tmp and creates, say, origin/tmp in your Git, then they delete their temporary branch tmp and your git fetch finds no tmp so does not create or update your origin/tmp. This leaves your old (stale) origin/tmp behind! The more remotes you have, the more likely this kind of stale-remote-tracking-branch clutter becomes.
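The stale-name problem and the effect of pruning can be seen in a scratch setup. To avoid touching your real configuration, this sketch sets fetch.prune in the test repository only rather than with --global; the paths and identity are placeholders.

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

dir=$(mktemp -d)
git -c init.defaultBranch=master init -q "$dir/theirs"
git -C "$dir/theirs" commit -q --allow-empty -m "initial"
git -C "$dir/theirs" branch tmp
git clone -q "$dir/theirs" "$dir/mine"
cd "$dir/mine"

# They delete their temporary branch.
git -C "$dir/theirs" branch -q -D tmp

# A plain fetch leaves your origin/tmp behind.
git fetch -q origin
if git rev-parse -q --verify refs/remotes/origin/tmp >/dev/null; then
    without_prune=stale
else
    without_prune=gone
fi

# With pruning enabled, the next fetch removes it.
git config fetch.prune true
git fetch -q origin
if git rev-parse -q --verify refs/remotes/origin/tmp >/dev/null; then
    with_prune=stale
else
    with_prune=gone
fi
echo "without prune: $without_prune, with prune: $with_prune"
```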

Last, remember that Git repositories share commits (by their big ugly hash IDs), but each repository has its own branch names. The fetch operation gets commits and updates remote-tracking names; but a push operation sends commits, then asks their Git to set its branch name, which generally requires that your new commits add on to their existing commits.
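The "add on to their existing commits" requirement is what makes a push get rejected. In this sketch two hypothetical clones, alice and bob, share one bare repository (all names and paths are invented); the second push is refused until its commit is rebuilt on top of the first.

```shell
# Throwaway repositories; the identity below is a placeholder.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git -c init.defaultBranch=master init -q "$tmp/seed"
git -C "$tmp/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$tmp/seed" "$tmp/hub.git"
git clone -q "$tmp/hub.git" "$tmp/alice"
git clone -q "$tmp/hub.git" "$tmp/bob"

# alice's commit adds on to the hub's master, so the push is accepted.
echo a > "$tmp/alice/a.txt"
git -C "$tmp/alice" add a.txt
git -C "$tmp/alice" commit -q -m "alice"
git -C "$tmp/alice" push -q origin master

# bob's commit does not build on alice's, so the hub refuses to move master.
echo b > "$tmp/bob/b.txt"
git -C "$tmp/bob" add b.txt
git -C "$tmp/bob" commit -q -m "bob"
if git -C "$tmp/bob" push -q origin master 2>/dev/null; then
    first=accepted
else
    first=rejected
fi

# After fetch plus rebase, bob's commit adds on to theirs.
git -C "$tmp/bob" fetch -q origin
git -C "$tmp/bob" rebase -q origin/master
if git -C "$tmp/bob" push -q origin master 2>/dev/null; then
    second=accepted
else
    second=rejected
fi
echo "first push: $first, second push: $second"
```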

torek