TL;DR: you'll use branches in a way that suits you. They don't really mean anything, so you're free to use them however you want.
Long
You're mixing up several things here, although to be fair, (a) it's complicated and (b) GitHub do you no favors here because they add even more complexity atop this already-complicated thing.
Git is a Version Control System. Version control systems (VCSes) have a vast (and somewhat boring) history; Git's flavor of VCS is distributed, or a DVCS, in which each repository can be replicated as many times as you like. Some DVCSes have a master/slave or single-source-of-truth setup ("SSOT"): there's one "real" version and a bunch of copies. Git does not use this model: every version is its own king or sovereign entity and all the replicas are inferior clones. This makes life interesting, in that your clone on your computer thinks the GitHub clone is inferior, while the GitHub clone on their systems thinks your computer's clone is inferior. Most people like to simplify by designating one of the clones as the SSOT, and all others as mere copies that can be destroyed and re-created from the SSOT. You can of course do this with Git, as that's a subset of the Multiple Source of Truth setup.
GitHub is a web site that hosts Git repositories. Going to a github.com/user/repo.git site displays the repository and its README.md file.
GitHub Pages is a web site run by the same folks who run github.com. They have, over time, changed how this works; the current version uses github.io to access a github.com repository and display it. But what gets displayed is not quite the same as what gets displayed on github.com.
Still, you've asked about branches, which are a Git topic. GitHub will use branches, but the branches themselves are provided by Git itself. So let's look at what a branch is. It's ... less than you might want, perhaps; in fact, in Git, the word branch is so overused as to become meaningless (see also What exactly do we mean by "branch"?), and it helps to be more specific, e.g., to say branch name or tip commit instead of branch.
Git is about commits
The first key to using Git is to realize that Git is not about branches and is not about files. Instead, Git is all about commits. A commit holds files, and a branch name helps you (and Git) find a commit, but it's the commit itself that matters. This means you need to know, in at least some detail, exactly what a commit is and does for you.
Each commit, in Git:
- Is numbered. Every commit has a unique hash ID, expressed in hexadecimal. In a sense, this hash ID is the commit: once some hash ID has been assigned to some commit you've made, that hash ID means that commit, forever, and in every Git repository in the universe, even those that aren't clones of yours (see footnote 1). It cannot be re-used for any other commit. So two Git repositories, coming into contact with each other, can easily tell who has commits that the other lacks, just by comparing hash IDs.
- Stores two things:
  - A full snapshot of every file. These files are in a special, compressed (sometimes highly compressed), read-only, Git-only format: only Git can read them and literally nothing, not even Git itself, can overwrite them. This allows them to be shared across (and even within) commits, which means Git can de-duplicate identical file content.
    The de-duplication means that the repository doesn't grow enormously fat even though every commit has every file. In fact, if a new commit has exactly the same snapshot as a previous commit (this isn't common, but you can make it happen) the files take literally no space at all, via the de-duplication trick. (The commit itself takes a bit of space because of the next point.)
  - Some metadata: information such as your name and email address, and the date-and-time-stamp for when you made the commit. You get to provide a log message saying why you made the commit, too, which git log will show you later.
    Included in this metadata is something Git adds for its own purpose: every commit records a list of previous commit hash IDs, which Git calls the parent or parents of this commit. Most commits have just a single parent.
- Is completely read-only / unchangeable. (This is true of all Git's internal objects and is a key to making the hashing trick work. The hashing trick is also how Git does the file-content de-duplication.)
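To make the hashing and de-duplication ideas concrete, here is a toy sketch in Python. It is not Git's real storage code: the ToyObjectStore class, its method names, and the commit encoding are all invented for illustration. The one detail borrowed from real Git is how a blob is hashed in the default SHA-1 mode: a short "blob <size>" header, a NUL byte, then the file content.

```python
import hashlib


class ToyObjectStore:
    """A toy key-value object database: the key is the hash of the value."""

    def __init__(self):
        self.objects = {}  # hash ID (hex string) -> raw bytes

    def put(self, kind, payload):
        # Hash a "<kind> <size>\0" header plus the payload, the way Git
        # hashes blobs in its default SHA-1 mode.
        data = kind.encode() + b" " + str(len(payload)).encode() + b"\0" + payload
        hash_id = hashlib.sha1(data).hexdigest()
        # Identical content produces the identical key, so storing it a
        # second time adds nothing: this is the de-duplication.
        self.objects.setdefault(hash_id, data)
        return hash_id

    def commit(self, files, message, parents=()):
        # Simplified, made-up commit encoding (NOT Git's real format):
        # file names with their blob hash IDs, parent IDs, then the message.
        lines = [f"file {name} {self.put('blob', content)}"
                 for name, content in sorted(files.items())]
        lines += [f"parent {p}" for p in parents]
        lines += ["", message]
        return self.put("commit", "\n".join(lines).encode())


store = ToyObjectStore()
first = store.commit({"README.md": b"hello\n"}, "initial commit")
count_after_first = len(store.objects)   # one blob plus one commit

# Same snapshot again: the blob is shared, only a new commit object appears.
second = store.commit({"README.md": b"hello\n"}, "same snapshot", parents=[first])
count_after_second = len(store.objects)

print("distinct commit IDs:", first != second)
print("objects added by second commit:", count_after_second - count_after_first)
```

Because the key is computed from the content, nothing stored here can be changed in place without also changing its hash ID, which is the toy version of "completely read-only".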
The parent metadata, when put together, means that commits find earlier commits. Each commit points back to its immediate parent, which points back to the parent's parent (the commit's grandparent), and so on, like this:
... <-F <-G <-H
Here H stands in for the actual hash ID of the latest commit. Commit H, whatever its true hash ID is, stores both a full snapshot and some metadata. The metadata include the raw hash ID of earlier commit G.
Git can therefore extract both commits and compare the files in G and H. Whatever is the same (and is therefore de-duplicated) is usually uninteresting; for files that are different, though, Git can now compute a recipe for changing the old version (in G) into the new one (in H). This is a (single-file) diff. Diffs are how we usually view the files in a commit, as diffs from the previous commit. So we see commit H as its changes from its parent G, when we run git log -p. But H actually stores a full snapshot.
Having shown H, git log now moves back to commit G. This is, of course, a snapshot plus metadata, and the metadata allow Git to find earlier commit F. So Git can now show G as its log message and the diff with respect to commit F. This repeats for every commit, one parent/child pair at a time, until Git gets back to the beginning of time: a commit that has no parents. Git shows all files as new, in this root commit, but otherwise it's the same as any other commit. Then git log stops going backwards.
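The backwards walk can be sketched in a few lines of Python. This is a toy model, not how git log is implemented: the commits dictionary, the file contents, and the function names are made up, single letters stand in for hash IDs, and F is treated as the root commit just to keep the example short.

```python
# Hypothetical toy history: every commit stores a FULL snapshot plus metadata.
commits = {
    "F": {"parent": None, "message": "root commit",
          "files": {"a.txt": "1"}},
    "G": {"parent": "F", "message": "add b.txt",
          "files": {"a.txt": "1", "b.txt": "x"}},
    "H": {"parent": "G", "message": "change a.txt",
          "files": {"a.txt": "2", "b.txt": "x"}},
}


def changed_files(old, new):
    """Names of files added, removed, or modified between two snapshots."""
    return sorted(name for name in old.keys() | new.keys()
                  if old.get(name) != new.get(name))


def log(commits, start):
    """Yield (ID, message, changed files) from `start` back to the root."""
    current = start
    while current is not None:
        commit = commits[current]
        parent = commit["parent"]
        # A root commit has no parent, so every file shows up as new.
        parent_files = commits[parent]["files"] if parent else {}
        yield current, commit["message"], changed_files(parent_files, commit["files"])
        current = parent


history = list(log(commits, "H"))
for commit_id, message, changed in history:
    print(commit_id, message, changed)
```

The "diff" is computed on demand from two full snapshots; nothing in the stored commits is a delta.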
Footnote 1: This does not, and cannot, work forever, due to the pigeonhole principle. The huge size of the hash ID in Git is meant to make it work well enough, long enough, that we don't care about the eventual failure: it should ideally not happen until long after the universe ends.
How branch names help us find commits
To view commit H (or to extract it so that we can get some work done) Git needs to know H's hash ID. If we want commit G, we can give Git H's hash ID and say "and then go back one hop". If we want commit F, we can give Git H's hash ID and say "and then go back two hops". But we have to give Git H's hash ID.
One alternative here is to memorize the hash IDs. That's no fun and, humans being such a big source of errors, a very bad idea. But instead of having humans memorize hash IDs, why not have the computer do it? We could have a little file or database where we write down rows of names and hash IDs: master or main means a123456 or whatever H's hash ID is, for instance.
That is, in fact, what a branch name is: an entry in a small database. The commits themselves are entries in a bigger database. Both databases are simple key-value stores; the object database, with commits in it, uses hash IDs as keys, and the names database, with branch and tag and other such names in it, uses names. The values in the object database are the commits themselves (and the files and other internal objects), and the values in the names database are the hash IDs.
Curiously, though, a branch name stores only one hash ID. This is actually true for all entries in the names database: they all store just one hash ID. For a branch name, that hash ID is always the ID of a commit, and we simply define it as the latest commit. So if we have:
             I--J   <-- feature1
            /
...--F--G--H   <-- main
            \
             K--L   <-- feature2
then the name main selects commit H as the "latest" commit, while the name feature1 selects J as the latest and feature2 selects L as the latest. All three commits are in fact latest, even though two of them are obviously "later".
Commit J points back to commit I, which points back to H, which points back to G, and so on. Commit L points back to commit K, which points back to H and so on. What this means is that all commits up through and including H are on all three branches at the same time. Commits I-J are only on feature1, and commits K-L are only on feature2.
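Here is a small Python sketch of the two databases for the drawing above, under some loud assumptions: single letters stand in for hash IDs, every commit has exactly one parent (real Git also has merge commits with several), F is treated as the root, and the function names are made up. It shows that "which branches is this commit on?" is just a question of which tips can reach it.

```python
# Parent links from the object database (toy version).
parents = {"F": None, "G": "F", "H": "G",
           "I": "H", "J": "I",
           "K": "H", "L": "K"}

# The names database: each branch name stores exactly ONE hash ID.
branches = {"main": "H", "feature1": "J", "feature2": "L"}


def reachable(tip):
    """All commits found by starting at `tip` and following parent links."""
    seen = set()
    while tip is not None:
        seen.add(tip)
        tip = parents[tip]
    return seen


def branches_containing(commit_id):
    """Branch names whose tip commit can reach `commit_id`."""
    return sorted(name for name, tip in branches.items()
                  if commit_id in reachable(tip))


on_h = branches_containing("H")
on_i = branches_containing("I")
on_k = branches_containing("K")
print("H:", on_h)
print("I:", on_i)
print("K:", on_k)
```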
Making new commits
Branch names are allowed to—indeed, encouraged to—move, and will do so automatically as we make new commits. We pick a branch to be "on", like this:
             I--J   <-- feature1
            /
...--F--G--H   <-- feature2 (HEAD), main
This means we're "on" branch feature2, using commit H. H is the latest commit on both main and feature2, since both names point here. Git will extract, into a work area, all the files from commit H.
Note that the files in Git are in a special weird Git-ized format, that only Git can use. But Git has copied these files out of the repository, into our work area. The copies—which are not actually in Git at this point—are ordinary everyday files; we can do work with them.
We do whatever we like with these files and run git add (for reasons I won't cover here) and git commit, and Git now prepares a new commit:
- Git gathers any metadata it needs, such as name and email address and log message.
- Git uses the current commit hash ID (H in this case) as the parent in the new commit's metadata.
- Git freezes all the files into the permanent-storage form (they're actually pre-compressed-and-de-duplicated so that this goes very fast, compared to old-school version control systems) to go into the new commit.
- Git actually writes out this new commit, which assigns it its unique hash ID (K in this case).
- The sneaky part: Git writes the new commit's hash ID into the current branch name, i.e., the one HEAD is attached to.
The result is our new commit K:
             I--J   <-- feature1
            /
...--F--G--H   <-- main
            \
             K   <-- feature2 (HEAD)
K is now our latest commit on branch feature2. It got there automatically: Git made K just now (from Git's index / staging-area, which we haven't covered, plus the metadata) and made it link back to H, and then updated the name feature2.
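The "sneaky part" is easy to see in a toy model. The ToyRepo class below is hypothetical and leaves out almost everything (snapshots, metadata, the index); the caller also supplies a stand-in letter for the new commit's ID, where real Git would compute a hash. What it keeps is the bookkeeping: the new commit's parent is the current tip, and then the current branch name is rewritten to point at the new commit.

```python
class ToyRepo:
    """Toy model of how committing moves the current branch name."""

    def __init__(self, parents, branches, head):
        self.parents = dict(parents)    # commit ID -> parent commit ID
        self.branches = dict(branches)  # branch name -> tip commit ID
        self.head = head                # branch name HEAD is attached to

    def commit(self, new_id):
        # The current tip becomes the new commit's parent...
        self.parents[new_id] = self.branches[self.head]
        # ...and the current branch name now points at the new commit.
        self.branches[self.head] = new_id
        return new_id


repo = ToyRepo(
    parents={"F": None, "G": "F", "H": "G", "I": "H", "J": "I"},
    branches={"main": "H", "feature1": "J", "feature2": "H"},
    head="feature2",
)

repo.commit("K")
tip_after_k = repo.branches["feature2"]
repo.commit("L")

print("feature2 ->", repo.branches["feature2"])
print("main     ->", repo.branches["main"])
print("parent of L:", repo.parents["L"])
```

Only the name HEAD is attached to moves; main and feature1 are left pointing where they were.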
If we make another new commit now, we get the picture I drew earlier:
             I--J   <-- feature1
            /
...--F--G--H   <-- main
            \
             K--L   <-- feature2 (HEAD)
How GitHub and GitHub Pages use branch names
We now know that a branch name just points to some commit. That commit has a bunch of files in it, stored as a (permanent, unchangeable) snapshot.
We humans use branches (or branch names, to be precise) for whatever purpose we like, but we do need to be aware of how some software might use them. When you view a repository on GitHub, they show you the README.md from the main branch (GitHub calls this the default branch). By default, that's the literal name main. So you'd want your main to point to a commit that has the right README.md file in it.
You can change which branch name is considered the "main" one, on GitHub, using the GitHub web interface.
When you view the same repository on github.io, they will show the index.html or README.md from a commit selected by some branch name. Which one? Well, the default is main again, but again, you can change it. It looks like you can change this independently of the setting on github.com (although I have not tried this myself).
Know that they'll show files out of some commit. You pick a branch name; that picks some commit in their repository; and then you see files from that commit's snapshot. Then, remember one other key item about Git: their branch names are theirs. Your branch names are separate from theirs. You may, if you wish, use the name fred to remember a commit, but have them (GitHub) use a different name (wilma or barney perhaps) to remember that same commit. To get GitHub to:
- store some particular commit (and all its parents/ancestry), and
- remember that commit with some branch name
you will use git push. The git push command will send commits from your repository (which it will usually find using your branch name(s)) to their repository, now using the raw hash IDs so that your Git and their Git can tell which commits they already have, and which ones they still need. Then your Git ends this git push session with a request to their Git: Please, if it's OK, set your name ________ to point to hash ID ________. They'll tell your Git if they did that. (If not, you can convert this to a forceful command with git push --force, but in many cases this is a mistake: instead, you want to figure out why they didn't agree to do that.)
If you use the same branch names in both repositories, this simplifies your job a lot. So you'll probably want to do that. It's not required though, and until you use git push to send commits to them, they won't have your new ones.
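The push conversation can be sketched as a toy Python function. Everything here is an assumption made for illustration: the push function and its parameters are invented, letters stand in for hash IDs, and the only refusal rule modelled is the usual default one, that the receiving name may only move "forward" (its old tip must be an ancestor of the new tip). A real host can refuse for other reasons too, and this toy does not clean up commits that were sent before a refusal.

```python
def ancestors(parents, tip):
    """All commits reachable from `tip` by following parent links."""
    seen = set()
    while tip is not None:
        seen.add(tip)
        tip = parents[tip]
    return seen


def push(local_parents, local_tip, remote_parents, remote_branches,
         remote_name, force=False):
    """Toy push: send missing commits, then ask the remote to set a name."""
    mine = ancestors(local_parents, local_tip)
    # Step 1: compare hash IDs and send whatever the other side lacks.
    for commit_id in mine:
        remote_parents.setdefault(commit_id, local_parents[commit_id])
    # Step 2: the polite request, "please set remote_name to local_tip".
    old_tip = remote_branches.get(remote_name)
    moves_forward = old_tip is None or old_tip in mine
    if moves_forward or force:
        remote_branches[remote_name] = local_tip
        return True
    return False  # the remote said no


# My repository remembers commit K under the name fred;
# theirs will remember it under the name wilma.
local_parents = {"F": None, "G": "F", "H": "G", "K": "H"}
remote_parents = {"F": None, "G": "F", "H": "G"}
remote_branches = {"wilma": "H"}

first_try = push(local_parents, "K", remote_parents, remote_branches, "wilma")

# A different commit X, built on H, does not have K as an ancestor.
local_parents["X"] = "H"
second_try = push(local_parents, "X", remote_parents, remote_branches, "wilma")
tip_after_refusal = remote_branches["wilma"]

print("first push accepted:", first_try)
print("second push accepted:", second_try)
print("wilma ->", tip_after_refusal)
```

In this model, passing force=True would move wilma to X and leave K with no name pointing at it, which is one reason forcing is so often a mistake.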
Similarly, you can use git fetch to get commits from some other Git repository. But there's an oddity here: when you use git fetch, you get commits from another Git, but this doesn't make your Git update your branch names. Instead, your Git has a whole separate set of names, which I call remote-tracking names, to remember the other Git's branch names. Git calls these remote-tracking branch names, but that poor overloaded word branch has nearly lost all its meaning by this point and I find "remote-tracking names" works as well or better.
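A toy Python sketch of that oddity follows. The fetch function and its arguments are made up, letters stand in for hash IDs, and the remote is assumed to be called origin, which is Git's conventional default name for the repository you cloned from; remote-tracking names then look like origin/main.

```python
def fetch(local_parents, remote_tracking, remote_parents, remote_branches,
          remote="origin"):
    """Toy fetch: copy missing commits, update only remote-tracking names."""
    for name, tip in remote_branches.items():
        # Copy each commit we lack, walking back until we reach one we have.
        current = tip
        while current is not None and current not in local_parents:
            local_parents[current] = remote_parents[current]
            current = remote_parents[current]
        # Remember THEIR branch name under a separate name of our own.
        remote_tracking[f"{remote}/{name}"] = tip


# Their repository has moved on to commit L; ours still stops at H.
remote_parents = {"F": None, "G": "F", "H": "G", "K": "H", "L": "K"}
remote_branches = {"main": "L"}

local_parents = {"F": None, "G": "F", "H": "G"}
local_branches = {"main": "H"}
remote_tracking = {}

fetch(local_parents, remote_tracking, remote_parents, remote_branches)

print("our main        ->", local_branches["main"])
print("our origin/main ->", remote_tracking["origin/main"])
print("commits we now have:", sorted(local_parents))
```

After the fetch we have commits K and L, but our own main has not moved; only origin/main records where their main points.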