What you really need is a good tutorial. I'm not sure which one(s) to recommend—Git is hard, and there are a lot of bad tutorials, many of which start out good and/or have good intentions, but eventually run into the hard parts. :-)
What you need to know at this point, though, is that branches—or more precisely, branch names—don't really mean much. Git is really all about commits. Until you make a new commit (or manipulate existing commits in some way), you have not done anything in Git itself.
One thing about commits, though, is that they are frozen for all time. You literally cannot change anything inside any commit. Each commit stores a complete snapshot of all of the files, in a special, read-only, Git-only, frozen format. This means they're great for archiving, but completely useless for doing any new work.
For this reason, Git gives you an area in which you can do work. This area is called (variously) the working tree, or the work tree, or the work-tree (I like the hyphenated term myself), or any number of other similar names. Here, files are just ordinary files. That means you can work with them—hence the term working tree. When you do work with them, Git mostly doesn't care: this area is for you, it's your work-tree. Git just fills it in from commits if/when necessary.
The index
Making a new commit in Git is tricky. Other version control systems are much simpler, because in these other systems, your working area is also your proposed next commit. This is not the case in Git! Git adds one more thing you must know about, even if you don't really want to. This thing is super-important, and exposed to you, even though you can't see it.
Git calls this thing the index, or the staging area, or sometimes (rarely these days) the cache. All three names can refer to the same thing. It gets used in different ways (and the "cache" term is now mostly meant and used for an internal data structure, which is why it's kind of rare now), but a pretty decent short description of the index is that it holds your proposed next commit. You can think of it as holding a copy of every file from the current commit.[1]
When you change a file in your work-tree, nothing happens to the copy in the index. It still matches the copy in the commit you chose. You have to run `git add` to copy the file from the work-tree to the index. Now the index copy no longer matches the committed copy, so you have proposed that the next commit be different from the current one.
Running `git commit` builds a new commit from whatever is in the index right now. So in Git, you work in the work-tree, then use `git add` to copy the updated files back into the index, then use `git commit` to make a new commit from the index. This is kind of a pain and is why other systems don't have an index: they don't make you update an in-between copy of all your files. But Git does, and it's best to get used to it and familiar with it. There are some tricks to try to hide it,[2] but they eventually fail: some things in Git can only be explained by pointing to the index.
Having made a new commit from the index, that new commit becomes the current commit. Now the current commit and the index match. This is also the normal situation right after a `git checkout`: the current commit and the index normally match. See below for an exception.
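The work-tree / index / commit cycle can be sketched as a toy model. This is only an illustration in Python, not how Git stores anything: the names `work_tree`, `index`, `commits`, `add`, and `commit` are made up for this sketch, and plain dictionaries stand in for Git's internal objects.

```python
# Toy model only: real Git stores blob and tree objects, not Python dicts.
work_tree = {"README": "v1\n"}   # ordinary, editable files
index = {}                        # the proposed next commit
commits = []                      # frozen snapshots, oldest first

def add(path):
    """Copy one file from the work-tree into the index (the role of `git add`)."""
    index[path] = work_tree[path]

def commit():
    """Snapshot whatever is in the index right now (the role of `git commit`)."""
    snapshot = dict(index)        # a copy, so later index changes cannot alter it
    commits.append(snapshot)
    return snapshot

add("README")
first = commit()

work_tree["README"] = "v2\n"            # edit the work-tree copy only
staged_before_add = index["README"]     # the index copy is unchanged
add("README")                           # now the index holds the edit
second = commit()

print(staged_before_add == "v1\n", second["README"] == "v2\n")
```

The point of the sketch is the middle step: editing `work_tree` changes nothing in `index` until `add` copies the file over.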
[1] Technically, the index holds a reference to an internal Git blob object. However, thinking of the index "copy" of a file as a true independent copy works fine for most purposes. It's only when you start getting into Git internals that you have to know about blob objects.
[2] For instance, you can use `git commit -a` instead of `git commit`. This just runs `git add -u` for you. The `add -u` step tells Git: For all files that are already in the index, check to see if they could stand to have a `git add` done on them. If so, do it now. The commit then uses the updated index. There are some extra complications here too, but they only show up if the commit step itself fails. Still, they can only be explained properly, when they do show up, by knowing about the index.
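As a rough illustration of what the `add -u` step does, here is a toy Python sketch. The function name `add_update` and the dictionary representation are assumptions made for this example; the behavior it imitates is that `git add -u` refreshes (or removes) entries that are already in the index and never adds brand-new files.

```python
def add_update(index, work_tree):
    """Toy version of `git add -u`: touch only paths already in the index."""
    for path in list(index):
        if path in work_tree:
            index[path] = work_tree[path]   # re-copy the tracked file
        else:
            del index[path]                 # tracked file was deleted in the work-tree
    return index

index = {"a.txt": "old\n", "gone.txt": "x\n"}
work_tree = {"a.txt": "new\n", "untracked.txt": "hello\n"}
add_update(index, work_tree)

print(sorted(index))
```

Here `untracked.txt` never makes it into the index, which is why `git commit -a` will not pick up files you have never added.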
Checking out another branch while you have uncommitted changes
When you `git checkout` some particular commit (as found by some particular branch name), Git will fill in your index and work-tree from that commit. This may update some files, in both the index and the work-tree, and leave others alone, if they're the same in both the old commit and the new one.
If you made some change to your index and/or work-tree and didn't commit, though, Git will try, if possible, to leave that modification in place. This is what you have been seeing. In this case, your current commit and index don't match. (What happens in the work-tree is even more complicated, in some cases. For way too much information about this, see Checkout another branch when there are uncommitted changes on the current branch.)
When you do make a new commit, the branch name changes in an interesting way
Every commit, in Git, has a unique hash ID. This hash ID is a big ugly string of letters and numbers. Technically, it's the hexadecimal representation of a SHA checksum (SHA-1 by default) of the contents of the commit; but the main thing about it is that every Git everywhere will agree that this commit gets this hash ID, and no other commit can have that hash ID. Every other commit has some other hash ID.
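To make the "checksum of the contents" idea concrete, here is a small Python sketch that assumes the default SHA-1 object format: Git hashes a short header (`<type> <size>` followed by a NUL byte) plus the object's bytes. The helper name `git_object_id` and the author details in the sample commit text are made up for the example.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way SHA-1 Git does: header, NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# The empty blob and the empty tree have well-known IDs.
empty_blob_id = git_object_id("blob", b"")
empty_tree_id = git_object_id("tree", b"")

# A commit is just text: tree, parent(s), author, committer, message.
# The identity lines below are placeholders, not real data.
commit_body = (
    b"tree " + empty_tree_id.encode("ascii") + b"\n"
    b"author A U Thor <author@example.com> 0 +0000\n"
    b"committer A U Thor <author@example.com> 0 +0000\n"
    b"\n"
    b"initial commit\n"
)
id_one = git_object_id("commit", commit_body)
id_two = git_object_id("commit", commit_body.replace(b"initial", b"Initial"))

print(empty_blob_id)
print(id_one != id_two)
```

Changing even one letter of the message produces a different ID, which is one way to see why an existing commit can never be altered in place: the altered content would simply be a different commit.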
The hash IDs look random, and are impossible for humans to remember. The computer can remember them for us. This is what branch names are really about.
Remember that we said above that all commits are frozen for all time. This is not true for branch names though; if it were, the names would be far less useful.
A branch name, in Git, just holds one commit's hash ID. That commit is, by definition, the last commit on the branch.
Every commit holds some set of previous commit hash IDs. Most commits hold exactly one hash ID. This one hash ID, inside this one commit (along with the snapshot of all files), is the parent commit of this commit.
Whenever one Git item (a branch name, or a commit) holds the hash ID of a Git commit, we say that the item points to the commit. So a branch name like `master` points to a commit. That commit points to its parent. Its parent points to another parent, and so on.
If we use uppercase letters to stand in for the big ugly hash IDs, we can draw all this out:
    ... <-F <-G <-H   <--master
The name `master` holds hash ID `H`. `H` is the last commit. Commit `H` points back to its immediate parent `G`, by containing the hash ID of commit `G`. Commit `G` therefore points back to its parent `F`, which points back yet again, and so on.
This all goes on, with these backwards-pointing arrows, until we reach the very first commit ever. It doesn't point any further back, because it can't. So that's where the action finally stops. Hence this drawing:
    A--B--C--D--E--F--G--H   <-- master
represents a Git repository with eight commits, each with its own unique hash ID, and one branch name, `master`.
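The backwards walk described above can be sketched in a few lines of Python. This is a toy model: the `parents` and `branches` dictionaries and the `history` function are invented for illustration, with single letters standing in for hash IDs as in the drawing.

```python
# Toy commit graph: each commit ID maps to the list of its parent IDs.
letters = "ABCDEFGH"
parents = {c: ([letters[i - 1]] if i > 0 else []) for i, c in enumerate(letters)}
branches = {"master": "H"}   # a branch name holds one commit ID

def history(branch):
    """Start at the commit the name points to and follow parent links back."""
    seen, order, todo = set(), [], [branches[branch]]
    while todo:
        commit = todo.pop()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        todo.extend(parents[commit])   # the root commit adds nothing, so the walk ends
    return order

print(history("master"))
```

The walk stops at `A` for the same reason the drawing does: `A` has no parent to point back to.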
We can add another branch name, also pointing to commit `H`, like this:
    git branch develop
    git checkout develop
Now we need to draw in a way to remember which branch name we're using. To do that, let's attach the special name `HEAD` to one of the two branch names:
    ...--F--G--H   <-- master, develop (HEAD)
Note that all eight commits are on both branches. (This is unusual: most version control systems don't work this way.)
Now let's make a new commit, in the usual way: change some file(s) in the work-tree, use `git add` to copy them into the index, and run `git commit`.
What Git will do now is package up the files that are in the index (they're already in the frozen format, ready to be committed) into a new commit, put our name and email address and so on into the new commit, and compute the new, unique, universal-across-all-Gits-everywhere hash ID for this new commit. We're the only Git with this commit, but our hash ID now means this commit, and none other, ever.[3] Let's call this commit `I`, though, for short. Git writes out commit `I` with commit `H` as its parent:
    ...--F--G--H   <-- master, develop (HEAD)
                \
                 I
The last step of `git commit` is the tricky part: Git now writes `I`'s hash ID into the name to which `HEAD` is attached. In this case that's `develop`:
    ...--F--G--H   <-- master
                \
                 I   <-- develop (HEAD)
and now `develop` points to commit `I`. The commits up through `H`, which were on `develop` before, are still there on `develop`. The name `develop` selects commit `I` specifically, though. Git can now start at `I` and work backwards to `H`, then `G`, then `F`, and so on; or it can start at `master` to find `H`, then work backwards to find `G`, then `F`, and so on.
This is what it means for commits to be on a branch. The branch name identifies the last commit. Git then uses the internal, backwards-pointing, connecting arrows from one commit to its parent(s) to find the previous commit(s), and just keeps doing that until it gets to a commit that does not go back any further.
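Here is the same sequence (create `develop`, attach `HEAD` to it, make commit `I`) as a toy Python model. The class `ToyRepo` and its method names are assumptions for this sketch only, and commit `F` is treated as the root just to keep the example small.

```python
class ToyRepo:
    """Toy model: commits are letters, branch names map to one commit each."""

    def __init__(self, parents, branches, head):
        self.parents = parents      # commit ID -> list of parent IDs
        self.branches = branches    # branch name -> last commit ID
        self.head = head            # the branch name HEAD is attached to

    def branch(self, name):
        # A new name pointing at the current commit; no commit is created.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        # Attach HEAD to another name (index and work-tree updates omitted).
        self.head = name

    def commit(self, new_id):
        # The new commit points back to the current one...
        self.parents[new_id] = [self.branches[self.head]]
        # ...and then the attached branch name moves to the new commit.
        self.branches[self.head] = new_id

repo = ToyRepo({"F": [], "G": ["F"], "H": ["G"]}, {"master": "H"}, "master")
repo.branch("develop")
repo.checkout("develop")
repo.commit("I")

print(repo.branches)
```

Only the name that `HEAD` is attached to moves; `master` still points to `H`, and nothing inside `H`, `G`, or `F` changed.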
Each commit stores a snapshot (a complete copy of all of the files that were in the index at the time the commit was made) plus this metadata: who made it and when; the parent hash ID(s) (two or more for a merge commit); and a log message, in which whoever made the commit should tell you why they made that commit.
Because each commit has a unique hash ID, and all Gits in the universe agree that that hash ID means that commit, you can connect two Gits together and they can just examine each other's hash IDs to see who has which commit(s). One Git can then give the other Git any commits that the one has, that the other wants and doesn't have. This uses a lot of CS graph theory and other tricks—such as delta encoding—to enable the sending Git to send a minimal amount of actual data to the receiving Git, so that even though every commit has a full snapshot of all files, the sender only sends changes to the receiver.
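The "who has which commits" step boils down to a set difference over the commit graph. The Python sketch below is a deliberately simplified assumption of how that might look: real Git negotiates with the other side over a wire protocol and then packs and compresses objects, none of which is modeled here. The function names `reachable` and `commits_to_send` are invented for the example.

```python
def reachable(parents, tips):
    """All commits found by walking parent links back from the given tips."""
    seen, todo = set(), list(tips)
    while todo:
        commit = todo.pop()
        if commit not in seen:
            seen.add(commit)
            todo.extend(parents[commit])
    return seen

def commits_to_send(sender_parents, sender_tips, receiver_has):
    """Commits the sender can reach that the receiver does not already have."""
    return reachable(sender_parents, sender_tips) - set(receiver_has)

# The sender has made commit I on top of H; the receiver stops at H.
sender_parents = {"F": [], "G": ["F"], "H": ["G"], "I": ["H"]}
receiver_has = {"F", "G", "H"}
missing = commits_to_send(sender_parents, ["I"], receiver_has)

print(missing)
```

Because hash IDs mean the same commit in every Git, comparing IDs is enough; neither side has to compare file contents to decide what is missing.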
[3] As you might imagine, this makes the hash ID computation the real source of magic in Git. It's a bit tricky but it really does all work in practice. There is a potential for hash ID collisions but it's never been a real problem yet. See also How does the newly found SHA-1 collision affect Git?
Summary:
- A repository is a collection of commits, and of some set of names.
- The commits are identified by hash IDs. Each holds a snapshot of files, plus metadata.
- Each branch name or other name holds the hash ID of one commit. That's the last commit in the chain.
- Each commit holds, in its metadata, the hash ID of some number of previous commits. At least one commit has no previous commit, because it was the first commit ever. Most of the rest have one: their one previous commit. Merge commits have two or more previous commits.
- The commits are frozen forever, but the branch names—which pick out a last commit—move over time. To add a new commit, you—or Git—make it so that it points back to a previous commit, then move some branch name.
- Transfers (`git fetch` and `git push`) involve connecting two Gits and having them figure out which commits they share, and which ones the sender is going to send. The receiver will eventually have to save the last hash ID somewhere, so that the receiver can find those commits again later, but we haven't covered how this works.
- Meanwhile, the index or staging area is where you build a new commit. You can't see what's in it (not directly and easily anyway), but `git status`, which we haven't covered here, will compare what's in it against the current commit and against your work-tree, and tell you what differs. Your work-tree or working tree is where you can see and work with your files. You have to copy them back into the index/staging-area in order to make new commits that hold the new snapshots of the updated files. Until you do, all you're doing is changing your working tree copies of the files.