First, relax a bit: `git branch --set-upstream-to=target branch` just sets the upstream of the branch named `branch` to `target`. The upstream of a branch doesn't actually change anything about the branch. What it does is change some reporting, and enable some shortcuts.
If you acted on the reports, that matters. If not, it does not.
If you used the shortcuts, that matters. If not, it does not.
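Here is what "some reporting" means in practice. With an upstream set, `git status` compares your branch with its upstream and says how many commits you are ahead and behind. Below is a minimal sketch of that count under toy assumptions: commits are single letters, and a hypothetical `parents` dictionary maps each commit to its parents. It is not Git's implementation. On the command line, `git rev-list --left-right --count B...origin/B` reports similar numbers.

```python
def reachable(parents, tip):
    """Return every commit reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def ahead_behind(parents, branch_tip, upstream_tip):
    """Count commits only on the branch (ahead) and only on the upstream (behind)."""
    mine = reachable(parents, branch_tip)
    theirs = reachable(parents, upstream_tip)
    return len(mine - theirs), len(theirs - mine)

# Hypothetical graph: branch B's tip is D, its upstream origin/B is at E; both grew from C.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"]}
print(ahead_behind(parents, "D", "E"))  # (1, 1): ahead by 1, behind by 1
```

Without an upstream there is simply nothing to compare against, so this report is skipped. The branch and its commits are the same either way.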
To remove the upstream of branch `B`:
git branch --unset-upstream B
To change the upstream of branch `B` to `origin/B`:
git branch --set-upstream-to=origin/B B
> I thought this would sync my branch with master; however, I somehow put all the changes that I had in my branch onto the master branch.
Again, this depends on any other commands you ran. The `--set-upstream-to` option of `git branch` didn't do anything other than set the upstream. For a discussion of what an upstream is about (what it's good for), see my answer to Why do I have to "git push --set-upstream origin <branch>"?
> So my questions are how can I revert all the files on the master to the state they were in a previous commit? As well as how do I properly sync so that I can get updated changes from master into my branch?
Answering this is hard, because I think you're starting with some wrong assumptions. Those assumptions are very easy to make, because Git's system of branching is, well, downright weird. If you've never used any other version control system before, you might not find it that odd, but if you have, Git can come as a big shock.
People like to think of branches as something real, something solid: useful, maybe immobile like a huge boulder, maybe sheltering like a building, and in any case, solid and reliable. In other version control systems, they tend to be that way. In Git, they're not: they're lightweight, ephemeral, fluid, and, in many ways, almost useless. They mostly do one thing for us, and they do that pretty well, but they don't do most of the other things we expect branches to do. But here I'm talking about branch names, like `master` and `develop` and so on. The word "branch" in Git is actually ambiguous: it doesn't always mean branch name. The other thing it means is pretty solid.
(If you've never used any other version control systems, none of the above means very much, because you won't have prior expectations.)
Git is all about commits
Forget about branches for a bit here. Git isn't so much about branches. We use them, sure—they do something useful—but Git is really all about commits. The commit is the basic unit in Git, and is the thing you need to know the most about. So let's look at what a commit is and does.
A Git commit holds data—a snapshot of all of your files—and metadata: information about the commit. That's what's in a commit: data and metadata; a snapshot of files, and some information about the snapshot, such as who made it and when. Commits don't hold changes. They just hold snapshots, plus that metadata.
Once made, no commit can ever be changed. Not one part of any commit can ever change. All of its files are frozen for all time. This has a bunch of useful properties. For instance, some people object to the fact that Git makes a new snapshot of every file for every commit. Won't this make the repository get terribly fat, terribly fast? Well, if Git did it the dumb way, it would; but Git doesn't, so it doesn't.
Let's say you have a hundred files. You made a commit yesterday, with all 100 files in it. You only changed two, and made a new commit just now. 98 of the files in the new commit match those in the previous commit. Well, we just said that every commit is totally frozen—so your new commit can just share all 98 of the unchanged files. It only really needs to snapshot the two different files.
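Here is a small sketch of that sharing. It assumes a toy store (the class `ToyStore` and its methods are made up) that keys each file's content by a hash, as Git does with its objects. Two 100-file snapshots that differ in two files end up costing 102 stored objects, not 200.

```python
import hashlib

class ToyStore:
    """Hypothetical content-addressed store: identical content is stored only once."""
    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        # Toy key: plain SHA-1 of the bytes (real Git hashes a small header plus the content).
        key = hashlib.sha1(data).hexdigest()
        self.objects.setdefault(key, data)  # storing the same content again is a no-op
        return key

    def snapshot(self, files: dict) -> dict:
        """Freeze a {path: bytes} mapping into {path: hash}."""
        return {path: self.put(data) for path, data in files.items()}

store = ToyStore()
yesterday = {f"file{i}.txt": f"contents {i}\n".encode() for i in range(100)}
snap1 = store.snapshot(yesterday)

today = dict(yesterday)
today["file0.txt"] = b"changed\n"
today["file1.txt"] = b"also changed\n"
snap2 = store.snapshot(today)

print(len(store.objects))                   # 102: 100 originals plus 2 new versions
print(sum(snap1[p] == snap2[p] for p in snap1))  # 98 files shared by both snapshots
```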
The frozen files in each commit are further compressed, because they're in a special, read-only, Git-only form. They're literally unchangeable in this form: it's not just you that can't change them. Neither can Git itself. This is great for archival, and it means that one answer to one of your questions is trivial:
how can I revert all the files ... to the state they were in a previous commit
They're already in that state, in that commit. All you need to do is use that previous commit. But before we get to that, let's finish up dealing with what commits have in them.
Every commit gets a unique number. A Git commit has a hash ID, and once you make a commit, that hash ID is forever reserved to mean that commit. In a sense, it was reserved even before you made the commit. However, the commit's hash ID is constructed by doing a cryptographic hash over all of the commit's data-and-metadata, and the metadata includes the exact second at which you create the commit, so we'd have to know in advance when you were going to create it, as well as everything else you were going to put into it, to predict its hash ID.
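To see why the timestamp matters, here is a sketch that lays out a commit-like object roughly the way Git lays out a commit (tree, parent, author, committer, message). It hashes that object with SHA-1, as classic SHA-1 repositories do. The tree and parent IDs and the author are made-up placeholders. The point is that a one-second difference yields a completely different hash ID, while identical input yields the identical ID.

```python
import hashlib

def commit_id(tree, parent, who, timestamp, message):
    """Hash a commit-like object; the layout is modeled on Git's commit objects."""
    body = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {who} {timestamp} +0000\n"
        f"committer {who} {timestamp} +0000\n"
        f"\n{message}\n"
    ).encode()
    header = f"commit {len(body)}\0".encode()  # object type, byte length, NUL
    return hashlib.sha1(header + body).hexdigest()

tree = "a" * 40    # hypothetical tree (snapshot) ID
parent = "b" * 40  # hypothetical parent commit ID
who = "A U Thor <author@example.com>"

h1 = commit_id(tree, parent, who, 1700000000, "fix typo")
h2 = commit_id(tree, parent, who, 1700000001, "fix typo")
print(h1 == h2)  # False: one second later means a different hash ID
print(h1 == commit_id(tree, parent, who, 1700000000, "fix typo"))  # True
```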
Every Git everywhere agrees that that commit gets that hash ID. This means if you connect two Gits to each other, they can just look at each other's hash IDs: if your Git has their hash ID H1, your Git has their commit H1. If they have your hash ID H2, they have your commit H2. Wherever you have a hash ID they don't, or vice versa, you have a commit they don't, or vice versa.[1] This makes exchanging commits pretty efficient: your Git knows which commits they have, and vice versa, just by looking at some hash IDs.
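In toy form, working out which commits to send is a set difference on hash IDs. The short IDs below are invented. Git's real fetch and push protocols negotiate this more cleverly than by swapping complete lists, so this only illustrates the idea.

```python
mine = {"a1f3", "9c2e", "77d0"}            # hypothetical (shortened) hash IDs
theirs = {"a1f3", "9c2e", "e5b8", "0c41"}

need_from_them = theirs - mine  # commits we lack
they_need = mine - theirs       # commits they lack
print(sorted(need_from_them))   # ['0c41', 'e5b8']
print(sorted(they_need))        # ['77d0']
```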
The last, but super-important, thing to know here is that every commit stores the hash ID (or multiple hash IDs) of its immediate ancestor: its parent (or parents, in the case of merge commits). Note that this linkage goes only one way, from child to parent. That's because when the child is "born"—when we make a new commit—we know what parent(s) we want to use. But when we make a commit, we don't yet know what hash IDs it will have as children. And, every part of every commit is completely, totally, 100% read-only, so we can't add the child hash IDs later.
[1] Git uses hash IDs for all four of its internal object types, not just commits, so this is a little bit off, technically. The comparison is really by hash ID, not specifically by commit hash ID: you have the object if, and only if, you have its hash ID.
So, Git stores commits, which:
- are snapshots of all your files, frozen forever,
- plus some metadata, such as who made the commit, when, why, etc.,
- and each one has a unique "number" (hash ID); and
- each one points back to its immediate parent, by storing the parent's hash ID.
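The list above maps naturally onto an immutable record. Below is a minimal Python sketch built on a hypothetical `Commit` type. Real Git stores a tree object ID rather than the files inline. A frozen dataclass refuses modification, and the child, not the parent, holds the link.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple

@dataclass(frozen=True)
class Commit:
    """Hypothetical toy commit: snapshot + metadata + parent hash IDs."""
    files: Tuple[Tuple[str, str], ...]  # frozen (path, content) pairs
    author: str
    message: str
    parents: Tuple[str, ...]            # hash IDs of the immediate parent(s)

first = Commit(files=(("README", "hello\n"),), author="me",
               message="first", parents=())
# The child stores the parent's ID; the parent never learns about the child.
second = Commit(files=(("README", "hello, world\n"),), author="me",
                message="second", parents=("id-of-first",))  # made-up ID

try:
    first.message = "rewritten"
    changed = True
except FrozenInstanceError:
    changed = False
print(changed)  # False: once made, a commit can't be changed
```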
These pointing-back links mean we can draw commits. Let's start with a tiny repository with just three commits in it:
A <-B <-C
Commit C is the last commit, i.e., the one we made most recently. It holds the hash ID of earlier commit B, so B is C's parent: C points to B. Meanwhile B holds the hash ID of earlier commit A: B points to A. Commit A was our very first commit. There is no earlier commit to point to, so it just doesn't.
Here, we're using single uppercase letters to stand in for commits. But actual hash IDs are big and ugly and impossible for humans to work with. They have to be big, so that you can get a unique one, different from every other commit hash ID, ever, every time you make a new commit. So how will we remember that commit C is the last commit?
This is where branch names come into the picture. In Git, a branch name like `master` just holds the actual hash ID of the last commit:
A--B--C <-- master
(I've stopped drawing the internal arrows between commits as arrows, because it gets too hard. Just remember, the arrows all point backwards; Git works backwards.)
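Here is a minimal model of this drawing, using plain dictionaries as a stand-in for Git's storage (the names `parents`, `branches` and `log` are illustrative only). The branch name yields one hash ID, and every earlier commit is found by walking parent links backwards.

```python
# Toy model of the drawing above: each commit records only its parents' IDs.
parents = {"A": [], "B": ["A"], "C": ["B"]}
branches = {"master": "C"}  # a branch name holds one hash ID: the last commit

def log(tip):
    """Walk backwards from tip via first parents, like a bare-bones git log."""
    out, c = [], tip
    while c is not None:
        out.append(c)
        c = parents[c][0] if parents[c] else None
    return out

print(log(branches["master"]))  # ['C', 'B', 'A']
```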
If we have more than one branch name, each one just holds one hash ID. Multiple names can hold the same hash ID:
A--B--C <-- master, develop
Now we need a way to know which branch name we'd like Git to use, so we attach the special name `HEAD`, in all uppercase like this, to one of the branch names (just one at a time):
A--B--C <-- master (HEAD), develop
Here, we're using commit C, and we're on `master`. If we now run `git checkout develop`, we switch to commit C (i.e., we don't switch commits at all), but we also switch to using the name `develop`, so that we're on `develop`:
A--B--C <-- master, develop (HEAD)
If we now make a new commit, in the usual way (which I won't describe here), we get a new commit with a new big ugly hash ID. Let's just call it D. Commit D gets a new snapshot of all of our files, even if that just re-uses most of them from C (and note that if we change a file back, we'll re-use the copy from an older commit, so maybe D re-uses most of C and re-uses A or B for the last few files). It gets its own metadata, including our name as the person who made it (the committer) and wrote it (the author),[2] and it stores commit C's hash ID as its parent. This is easy to do because the branch name `develop` has C's hash ID in it right now.
Having made commit D, we now have:
A--B--C <-- master, develop (HEAD)
       \
        D
and we're now at the last step of `git commit`, which is: it just writes whatever hash ID it just got, for the new commit, into the current branch name, i.e., the one to which `HEAD` is attached. This causes the branch name to move:
A--B--C <-- master
       \
        D <-- develop (HEAD)
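The steps just described can be sketched in the same toy dictionary model. The names are illustrative, and the new commit's ID is supplied by hand here, whereas Git computes it by hashing. The new commit's parent comes from the branch that `HEAD` names, and the final step moves that branch name.

```python
parents = {"A": [], "B": ["A"], "C": ["B"]}
branches = {"master": "C", "develop": "C"}
HEAD = "develop"  # HEAD is attached to develop

def commit(new_id):
    """Toy git commit; real Git computes new_id by hashing, here it is passed in."""
    branch = HEAD
    parents[new_id] = [branches[branch]]  # new commit points back to the current tip (C)
    branches[branch] = new_id             # last step: write the new ID into the branch name
    return new_id

commit("D")
print(branches)      # {'master': 'C', 'develop': 'D'}
print(parents["D"])  # ['C']
```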
If we now run `git checkout master` and make a new commit, we'll get a new commit E which will point back to C, and the name `master` will move:
        E <-- master (HEAD)
       /
A--B--C
       \
        D <-- develop
Note that we can, if we like, draw commit D or E on the same line as the first three commits. The important thing is that we connect D back to C, and E back to C, and C to B to A.
The graph and the commits are real, and reasonably solid (the graph is flexible as long as we bend it without breaking any commits off), but the branch names are mere labels. We can, at any time, tell Git: take the label `develop` off commit D. We can make it point to any commit that we have. We can even delete it entirely, though if we do that, we'll have a hell of a time finding commit D: its hash ID looks totally random, after all.
If we can't find D, Git will eventually remove it. (Git has some safety measures to find temporarily-abandoned commits, usually for at least 30 days, in case you goof up.) So commits aren't necessarily forever. But once made, they can't be changed. As long as you still have D and can find its hash ID, you have it, including all of its snapshot files.
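Roughly, "can't find" means "not reachable from any name". Here is a toy sketch of that test. It ignores the reflogs, tags, other refs and grace periods that the real `git gc` also takes into account.

```python
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"]}
branches = {"master": "E", "develop": "D"}

def reachable_from(tips):
    """Every commit found by walking parent links back from the given tips."""
    seen, stack = set(), list(tips)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

del branches["develop"]  # delete the name; commit D itself is untouched
unreachable = set(parents) - reachable_from(branches.values())
print(unreachable)  # {'D'}: eventually eligible for garbage collection
```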
Note that we can find commits A-B-C in two ways at this point: we can start from `master` and find E and use that to find C and then B and then A; or we can start from `develop`, find D, and use that to find C and then B and then A. This means that commits A-B-C are on both branches. (This makes Git very different from most version control systems.) If we delete the name `develop`, and can't find D, we can still find A-B-C, and now they're only on one branch. Or, we can add a new name:
        E <-- master (HEAD)
       /
A--B--C <-- three
       \
        D <-- develop
and now commits A-B-C are on three branches, including the one named `three`. Commit C is the last commit of `three`, and is part of (but not the last of) both `master` and `develop`.
This is what I mean when I say that branches don't really mean very much. What they do is allow us to find the last commit, and from there, we find all the earlier commits. If we have another way to find those commits, the only thing the branch name is doing is remembering that some commit, like C, is the tip of that branch.
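The question "which branches is this commit on?" is also just reachability. Below is a toy version of what `git branch --contains` answers, using the graph from the last drawing (the function names are hypothetical).

```python
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["C"]}
branches = {"master": "E", "three": "C", "develop": "D"}

def ancestors(tip):
    """All commits reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def branches_containing(commit):
    """Names whose tip can reach the given commit."""
    return sorted(name for name, tip in branches.items() if commit in ancestors(tip))

print(branches_containing("B"))  # ['develop', 'master', 'three']
print(branches_containing("D"))  # ['develop']
```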
[2] This separation between author and committer allows Git to work with email, where someone just emails you patches that you apply. You're then the committer and the other person is the author. The Linux folks used Git like this in the early days of Git, and sometimes still do.
How you use all of this
Checking out a branch name means selecting that branch's tip commit as the current commit, and selecting that branch as the current branch. Git does this by:
- attaching the special name `HEAD` to the branch name; and
- extracting the frozen, read-only files from the chosen commit into an area where you can see and work on them.
So, if you have:
...--F--J--K <-- master
      \
       G--H--I <-- branchname (HEAD)
and you've accidentally done something to add new commits to `branchname` that you don't want, such as adding commit I, you can just force Git to point the name `branchname` back to existing commit H:
...--F--J--K <-- master
      \
       G--H <-- branchname (HEAD)
           \
            I [abandoned]
Commit I will stick around for a while (probably at least 30 days) but be invisible; you won't see it. Once it's been hanging around too long, the maintenance garbage collector, `git gc`, will eventually remove it for real, and it will really be gone.[3]
To do this, you have to tell Git to forcibly re-set the name `branchname` to point to commit H instead of commit I. There are multiple Git commands that can do this, but the main one you generally use for this is `git reset`. Depending on how you use it, this also wipes out any uncommitted work, and uncommitted work gets no 30-day safety net: once it's wiped out, it can't be recovered at all. So be very careful with `git reset --hard`:
git reset --hard <hash-of-H>
will do what we drew above, making commit H the current commit, and making the name `branchname` point to commit H. Commit I will still exist, but will be very hard to find again.[4]
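In the toy dictionary model, the `git reset --hard` above boils down to two steps. First, move the name that `HEAD` is attached to. Then overwrite the work area with the target commit's snapshot. This is a simplified sketch: the function `reset_hard` and the file contents are made up, and it ignores the index. Still, it shows why commit I survives while uncommitted work does not.

```python
# Toy repository for the drawing above (I is the unwanted commit).
parents = {"F": [], "G": ["F"], "H": ["G"], "I": ["H"]}
snapshots = {"F": {"a.txt": "1"}, "G": {"a.txt": "2"},
             "H": {"a.txt": "3"}, "I": {"a.txt": "oops"}}
branches = {"branchname": "I"}
HEAD = "branchname"
work_tree = {"a.txt": "oops, plus unsaved edits"}  # uncommitted work

def reset_hard(target):
    """Toy 'git reset --hard target': repoint the current branch, then replace the work tree."""
    global work_tree
    branches[HEAD] = target               # the name now points at H
    work_tree = dict(snapshots[target])   # uncommitted edits are discarded, unrecoverably

reset_hard("H")
print(branches["branchname"])  # H
print(work_tree)               # {'a.txt': '3'}
print("I" in parents)          # True: I still exists, but no branch name finds it
```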
[3] This assumes you didn't let some other Git copy commit I, by its hash ID, into their repository, and thus grab it and keep it under some name of their choice. If you did, they could later introduce commit I back into your repository.
[4] If you do decide you want it back, the usual way to find its hash ID is through Git's reflogs, which keep track of what hash IDs were in each ref. But that's for another StackOverflow question.
Getting changes into another branch
> As well as how do I properly sync so that I can get updated changes from master into my branch?
Remember, branches aren't all that important. It's commits that matter. And commits are snapshots. They're not changes! You can turn a commit into changes, though. Pick any two adjacent snapshots (parent and child) and ask Git: what's the difference between the parent snapshot and the child snapshot? That is:
...--G--H <-- master (HEAD)
If we have Git extract snapshots G and H into two temporary working areas, then compare each file in the two temporary areas, we'll see what changed. That's what `git log -p`, `git show`, and `git diff` do. The `git log -p` case takes each commit it's showing, as it goes backwards showing one commit at a time, and compares its parent to it, then goes on to show the parent, and so on. The `git show` command takes one commit to show, shows it by comparing its parent to it, and stops. With `git diff`, you can give it any two commit hash IDs; it extracts both commits and compares them.[5] It doesn't look at any commits in between: you just pick a left and right side, and compare. You can compare your very first commit to your latest, if you like.
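This is a file-level sketch of comparing two snapshots, in the spirit of `git diff --name-status` (A for added, D for deleted, M for modified). It assumes snapshots are plain `{path: content}` dictionaries. It skips rename detection and line-by-line output, both of which real Git does.

```python
def diff_snapshots(old, new):
    """Compare two {path: content} snapshots and list per-file changes."""
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            changes.append(("D", path))   # deleted
        elif path not in old:
            changes.append(("A", path))   # added
        elif old[path] != new[path]:
            changes.append(("M", path))   # modified
    return changes

G = {"README": "hi\n", "main.c": "int x;\n", "old.txt": "bye\n"}
H = {"README": "hi\n", "main.c": "int x = 1;\n", "new.txt": "new\n"}
print(diff_snapshots(G, H))  # [('M', 'main.c'), ('A', 'new.txt'), ('D', 'old.txt')]
```

Any two snapshots work, adjacent or not, which matches how `git diff` lets you pick both sides freely.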
Two very useful Git commands are:
- `git merge`, which finds three commits, does two `git diff`s, and then combines the changes; and
- `git cherry-pick`, which in effect copies a commit by comparing it to its parent to make a set of changes out of it, and then applying those changes to the current commit.[6]
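Here is a naive sketch of "copy a commit by turning it into changes", assuming `{path: content}` snapshots (the function names are hypothetical). It just overwrites. If our branch had also changed `b.txt`, our edit would be silently lost, which is one reason Git really implements cherry-pick as a three-way merge instead.

```python
def changes_of(parent, child):
    """Turn a snapshot pair into a change set: path -> new content (None = deleted)."""
    return {p: child.get(p) for p in parent.keys() | child.keys()
            if parent.get(p) != child.get(p)}

def naive_cherry_pick(current, picked_parent, picked):
    """Apply picked's changes (relative to its parent) to a copy of the current snapshot."""
    result = dict(current)
    for path, content in changes_of(picked_parent, picked).items():
        if content is None:
            result.pop(path, None)  # the picked commit deleted this file
        else:
            result[path] = content
    return result

parent = {"a.txt": "1", "b.txt": "x"}
picked = {"a.txt": "1", "b.txt": "y", "c.txt": "new"}  # changed b, added c
current = {"a.txt": "2", "b.txt": "x"}                  # our branch changed a
print(naive_cherry_pick(current, parent, picked))
# {'a.txt': '2', 'b.txt': 'y', 'c.txt': 'new'}
```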
When to use `git merge`, when to use `git cherry-pick`, and when and whether to use `git rebase` (which essentially runs a whole series of `git cherry-pick` commands for you) is another topic, and quite a big and often opinionated one. I won't go into these details here. Let's just show a real merge. This occurs when you have two branches (this is where Git's ambiguity with the word "branch" is a problem) that look like this:
          I--J <-- branch1, merge-me (HEAD)
         /
...--G--H
         \
          K--L <-- branch2
Note that our current branch name `merge-me` points to commit J, and `branch2` points to commit L. We can now run `git merge branch2`. Git will find commit J easily (that's our current commit) and find L easily, because the name `branch2` points to L. Then Git will find the best shared commit, which Git calls the merge base. In some cases it's not clear, but here, the best shared commit is obviously commit H (whatever its actual hash ID is): commit H is on both branches, and is the most recent such one, later than every other commit that's also on both branches.
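The "best shared commit" rule can be sketched like this: take the commits reachable from both tips, then keep only those that are not ancestors of another shared commit. This toy version uses the graph in the drawing above. It illustrates the rule and is not Git's algorithm; `git merge-base merge-me branch2` asks Git itself. When there are several equally good candidates (the "not clear" case), the sketch returns all of them.

```python
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],   # branch1 / merge-me
    "K": ["H"], "L": ["K"],   # branch2
}

def ancestors(c):
    """All commits reachable from c, including c itself."""
    seen, stack = set(), [c]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(parents[x])
    return seen

def merge_bases(a, b):
    """Shared commits that are not ancestors of some other shared commit."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}

print(merge_bases("J", "L"))  # {'H'}
```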
So `git merge` will now `git diff` commit H vs J, to see what we changed on our branch. It will also `git diff` commit H (the merge base, again) vs commit L, to see what they changed on their branch. Then Git will combine these changes. If we changed file F in snapshot H to a new version of F in J, Git takes our changes. If they also changed file F in snapshot H to a new version in L, Git takes their changes too. The merge code combines the changes, if it can, and applies the combined changes to file F from snapshot H.
This repeats for every changed file. For files where nobody changed anything (file F2 in H matches the one in J and the one in L), Git can take any of the three versions, because they all match. For files where only one side changed anything, Git can take a shortcut: it can just take ours, or just take theirs.
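The per-file shortcuts just described can be sketched like this, assuming `{path: content}` snapshots with `None` meaning the file is absent (the function names are hypothetical). Where both sides changed a file differently, real Git goes on to combine the changes line by line and only sometimes declares a conflict. This sketch merely flags such files.

```python
def merge_file(base, ours, theirs):
    """File-level decision; returns (merged_content, needs_content_merge)."""
    if ours == theirs:
        return ours, False    # unchanged everywhere, or both made the same change
    if ours == base:
        return theirs, False  # only they changed it: take theirs
    if theirs == base:
        return ours, False    # only we changed it: take ours
    return None, True         # both changed it differently: needs a line-level merge

def merge_snapshots(base, ours, theirs):
    result, needs_work = {}, []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        merged, hard = merge_file(base.get(path), ours.get(path), theirs.get(path))
        if hard:
            needs_work.append(path)
        elif merged is not None:  # None here means the file ends up deleted
            result[path] = merged
    return result, needs_work

H = {"F": "v1", "F2": "same", "F3": "f3"}
J = {"F": "ours", "F2": "same", "F3": "f3 edited"}
L = {"F": "theirs", "F2": "same", "F3": "f3"}
print(merge_snapshots(H, J, L))  # ({'F2': 'same', 'F3': 'f3 edited'}, ['F'])
```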
The two diffs are run with `--find-renames`, to look for renamed files, and the diffs automatically look for files that were added or deleted. The merge code combines these too, or at least does so as well as it can.
If our changes and their changes overlap, but aren't exactly the same, the merge code will declare a merge conflict. In this case, Git will leave all three input commit copies of the file around.[7] Git will also write its best effort at combining, with conflict markers where it had trouble, into your work-area. Your job then becomes to resolve these conflicts by producing the correct merge result. (You must then finish the merge yourself, e.g., by running `git merge --continue` or `git commit`.)
If Git doesn't detect any conflicts, though, Git will go on to make a new commit from the result. Remember, this is the result of combining all the changes and applying the combined changes to whatever was in the merge base (snapshot H, in our example):
          I--J <-- branch1
         /    \
...--G--H      M <-- merge-me (HEAD)
         \    /
          K--L <-- branch2
New commit M is a merge commit: it points back to J, as any commit would, but also to L, the other commit that got merged. (The fact that the merge used H as the merge base is not recorded anywhere. If you ask Git what the merge base was, Git will just have to figure it out again, the same way it figured it out before M existed.) Making the new commit M caused the current branch to advance to point to it, just as new commits always do. Commit M has a snapshot (not changes), just like every other commit. The only thing special about M is that it has two parents instead of just one. When Git works backwards through commits, one at a time, it must go from M to both J and L.[8]
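Here is a toy breadth-first walk over the merged graph that shows what "go from M to both J and L" means. The visiting order is a property of this sketch only. By default, `git log` shows commits in reverse chronological order, so don't expect the output to match exactly. Note that M records only its two parents; nothing in it mentions H.

```python
from collections import deque

parents = {
    "G": [], "H": ["G"], "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
    "M": ["J", "L"],  # the merge commit: two parents, nothing about H
}

def walk(tip):
    """Breadth-first walk over every commit reachable from tip."""
    order, seen, queue = [], set(), deque([tip])
    while queue:
        c = queue.popleft()
        if c in seen:
            continue
        seen.add(c)
        order.append(c)
        queue.extend(parents[c])
    return order

print(walk("M"))  # ['M', 'J', 'L', 'I', 'K', 'H', 'G']
```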
[5] Because of the internal storage format, Git doesn't actually have to extract anything until it gets to the point where two files really don't match. In general, it doesn't need a temporary work area; it just does all of this right in memory.
[6] Internally, Git actually implements `git cherry-pick` using the same three-way merge that `git merge` performs, but with the parent of the commit-to-copy as the merge base. Afterward, instead of making a merge commit (a commit with two parents), `git cherry-pick` makes an ordinary single-parent commit. This also explains why rebase is harder than merge: if you rebase a five-commit chain, you're actually doing five merges instead of just one.
[7] These are in Git's index, aka the staging area, which I didn't go into here.
[8] This creates some interesting problems, since Git is going one commit at a time. But again, these are for other postings.