Those new to Git often think that Git stores changes in branches. This is not true. In your case, though, I think what you are running into is the fact that when you do work in a Git repository, you do so in what Git calls your working tree. Anything you do here is not in Git (yet).
You might want to use git worktree add to deal with your particular situation. We'll get to that after covering how Git handles all of this, because it won't make any sense without a lot of basics.
The way I like to explain this is that Git does not store changes at all, and does not really care about branches. What Git stores, and cares about, are commits. This means that you need to know what a commit is and does for you, how you find a commit, how you use an existing commit, and how you make a new commit.
What commits are
The basic entity that you will use, as you do work using Git, is the commit. There are three things you need to know about a commit. You just have to memorize these as they are arbitrary: there's no particular reason they had to be done like this, it's just that when Linus Torvalds wrote Git, these are the decisions he made.
Each commit is numbered.
The numbers, however, are not simple counting numbers: we don't have commit #1 followed by commits 2, 3, 4, and so on. Instead, each commit gets a unique, but very big and ugly, number expressed in hexadecimal, that is between 1 and something very large.[1] Every commit in every repository gets a unique, random-looking number.
It looks random, but isn't. It's actually a cryptographic checksum of the internal object content. This peculiar numbering scheme enables two Gits to exchange content by handing each other these large numbers.
A key side effect of this is that it's physically impossible to change what's in a commit. (This is true of all of Git's internal objects.) The reason is that the hash ID, which is how Git finds the object, is a checksum of the content. Take one of these out, make changes to its content, and put it back, and what you get is a new commit (or new other internal object), with a new and different hash ID. The existing one is still in there, under the existing ID. This means not even Git itself can change the content of a stored commit.
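To see the "same content, same ID; different content, different ID" rule in action, here is a small sketch using git hash-object. It hashes file content (a blob) rather than a commit, but the principle is the same for all of Git's internal objects. The sample text is made up for illustration.

```shell
# Hash the same content twice, then a one-character variation of it.
# Without -w, hash-object only computes the ID; it stores nothing.
id_first=$(printf 'hello\n' | git hash-object --stdin)
id_again=$(printf 'hello\n' | git hash-object --stdin)
id_changed=$(printf 'hello!\n' | git hash-object --stdin)

echo "original content: $id_first"
echo "same content:     $id_again"     # identical to the first ID
echo "changed content:  $id_changed"   # a completely different ID
```

Since the ID is computed from the content, there is no way to alter the content and keep the ID.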
Each commit stores a full snapshot of every file.
More precisely, each commit stores a full copy of every file that Git knew about at the time you, or whoever, made the commit. We'll get into this "knew about" part in a bit, when we look at how to make a new commit.
These copies are read-only, compressed, and stored in a format that only Git itself can read. They are also de-duplicated, not just within each commit, but across every commit. That is, if your Git repository had some particular copy of a README file or whatever, stored in some commit, and you ever make a new commit that has the same copy of the file (even under some other name), Git will just re-use the previous copy.
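Here is a minimal sketch of that de-duplication, run in a throwaway repository. The file names, the identity settings, and the temporary directory are all placeholders chosen for this example.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Two files with identical content under different names
printf 'shared text\n' > README
cp README COPY-OF-README
git add README COPY-OF-README
git commit -q -m "two names, one content"

# Each line shows: mode, object type, object hash ID, file name
git ls-tree HEAD

readme_id=$(git rev-parse HEAD:README)
copy_id=$(git rev-parse HEAD:COPY-OF-README)
echo "README:         $readme_id"
echo "COPY-OF-README: $copy_id"
```

Both names should list the same object hash ID, which means Git stored the content once.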
And, each commit stores some metadata.
The metadata with a commit include the name and email address of the person who made that commit. Git gets this from your user.name and user.email settings, and simply believes that you are whoever you claim to be. They include a date-and-time stamp of when you (or whoever) made the commit.[2] The metadata also include why you (or whoever) made the commit, in the form of a commit message. Git isn't particularly strict about what goes into the message, but messages should generally look a lot like email, with a short one-line subject, and then a message body.
One part of this metadata, though, is strictly for Git itself. Each commit stores, in its metadata, the commit number of the previous commit.[3] This forms commits into simple backwards-looking chains:
... <-F <-G <-H
Here, each of the uppercase letters stands in for some actual commit hash ID. Commit H, the most recent one, has inside it the actual hash ID of earlier commit G. When Git extracts earlier commit G from wherever it is that Git keeps all the commits, commit G has inside it the actual hash ID of earlier-than-G commit F.
We say that commit H points to commit G, which points to commit F. Commit F in turn points to some still-earlier commit, which points to another earlier commit, and so on. This works its way all the way back to the very first commit ever, which, being the first commit, can't point backwards, so it just doesn't.
This backwards-looking chain of commits in a Git repository is the history in that repository. History is commits; commits are history; and Git works backwards. We start with the most recent, and work backwards as needed.
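If you'd like to look at the metadata and the backwards pointer directly, something like the following sketch should work. The repository, file name, and identity are invented for the example; git cat-file -p simply prints the stored commit object.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "first commit"
echo two > file.txt && git add file.txt && git commit -q -m "second commit"

# Print the newest commit: tree (the snapshot), parent, author,
# committer, then a blank line and the commit message
git cat-file -p HEAD

# The "parent" line holds the hash ID of the previous commit
parent_in_metadata=$(git cat-file -p HEAD | sed -n 's/^parent //p')
first_commit=$(git rev-parse HEAD~1)

# The very first commit has no "parent" line at all
root_parents=$(git cat-file -p HEAD~1 | grep -c '^parent ' || true)
echo "parent lines in the first commit: $root_parents"
```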
[1] For SHA-1, the number is between 1 and 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,975. This is ffffffffffffffffffffffffffffffffffffffff in hexadecimal, or 2^160-1. For SHA-256 it's between 1 and 2^256-1. (Use any infinite-precision calculator such as bc or dc to compute 2^256. It's very big. Zero is reserved as the null hash in both cases.)
[2] Actually, there are two user-email-time triples, one called "author" and one called "committer". The author is the person who wrote the commit itself, and, back in the early days of Git being used to develop Linux, the committer was the person who received the patch by email and put it in. That's why the commit messages are formatted as if they were email: often, they were email.
[3] Most commits have exactly one previous commit. At least one commit, the very first commit, has no previous commit; Git calls this a root commit. Some commits point back to two earlier commits, instead of just one: Git calls them merge commits. (Merge commits are allowed to point back to more than two earlier commits: a commit with three or more parents is called an octopus merge. They don't do anything you couldn't do with multiple ordinary merges, but if you're tying together multiple topics, they can do that in a sort of neat way.)
Branch names are how we find commits
Git can always find any commit by its big ugly hash ID. But these hash IDs are big, and ugly. Can you remember all of yours? (I can't remember mine.) Fortunately, we don't need to remember all of them. Notice how, above, we were able to start with H and work backwards from there.
So, if commits are in backwards-pointing chains—and they are—and we need to start from the newest commit in some chain, how do we find the hash ID of the last commit in the chain? We could write it down: jot it down on paper, or a whiteboard, or whatever. Then, whenever we make a new commit, we could erase the old one (or cross it off) and write down the new latest commit. But why would we bother with that? We have a computer: why don't we have it remember the latest commit?
This is exactly what a branch name is and does. It just holds the hash ID of the last commit in the chain:
...--F--G--H <-- master
The name master holds the actual hash ID of the last commit H. As before, we say that the name master points to this commit.
Suppose we'd like to make a second branch now. Let's make a new name, develop or feature or topic or whatever we like, that also points to commit H. Here we'll call it solution:
...--F--G--H <-- master, solution
Both names identify the same "last commit", so all the commits up through H are on both branches now.
The special feature of a branch name, though, is that we can switch to that branch, using git switch or, in Git predating Git 2.23, git checkout. We say git checkout master and we get commit H and are "on" master. We say git switch solution and we also get commit H, but this time we are "on" solution.
To tell which name we're using to find commit H, Git attaches the special name HEAD to one (and only one) branch name:
...--F--G--H <-- master, solution (HEAD)
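The picture above can be reproduced in a scratch repository with a sketch like this one. The directory, file, and identity are placeholders; git checkout is used so that it also works on Git older than 2.23.

```shell
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is named master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo hello > file.txt && git add file.txt && git commit -q -m "commit H"

# Create a second name pointing to the same commit
git branch solution
git rev-parse master solution              # prints the same hash ID twice

# Attach HEAD to the new name (git switch solution also works in Git 2.23+)
git checkout -q solution
current=$(git symbolic-ref --short HEAD)
echo "HEAD is attached to: $current"
```

Nothing in the work-tree changes during this switch, because both names select the same commit.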
If we now make a new commit (we'll look at how we do that in a moment), Git makes the new commit by writing it out with commit H as its parent, so that the new commit points back to H. We'll call the new commit I, although its actual number will just be some other big random-looking hash ID. We can't predict the hash ID because it depends on the exact second at which we make it (because of the time stamps); we just know that it will be unique.[4]
Let's draw the new chain of commits, including the sneaky trick that Git uses:
...--F--G--H <-- master
            \
             I <-- solution (HEAD)
Having made new commit I, Git wrote the new commit's hash ID into the current branch name, solution. So now the name solution identifies commit I.
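As a rough check of this "sneaky trick", the sketch below makes one commit on solution and then compares where the two names point. The repository contents and identity are invented for the example.

```shell
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is named master
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo hello > file.txt && git add file.txt && git commit -q -m "commit H"

git checkout -q -b solution                # new name, HEAD attached to it
h=$(git rev-parse HEAD)                    # hash ID of commit H

echo changed > file.txt && git add file.txt && git commit -q -m "commit I"

i=$(git rev-parse solution)                # solution now names commit I
master_now=$(git rev-parse master)         # master still names commit H
parent_of_i=$(git rev-parse solution~1)    # commit I points back to H

git log --oneline --graph --decorate --all
```

Only the name that HEAD is attached to moves; the other name keeps pointing where it was.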
If we switch back to the name master, we'll see all the files as they were in commit H, and when we switch back to solution again, we'll see the files as they were in commit I. Or, that is, we might see them that way. But we might not!
[4] The pigeonhole principle tells us that this will eventually fail. The large size of hash IDs tells us that the chance of failure is minute, and in practice, it never occurs. The birthday problem requires that the hash be very large, and deliberate attacks have moved from a purely theoretical issue with SHA-1 to being something at least theoretically practical, which is why Git is moving to larger and more-secure hashes.
Making new commits
It's time now to look more closely at how we actually make new commit I above. Remember, we mentioned that the data in a commit (the files making up the snapshot) are completely read-only. The commit stores files in a special, compressed, read-only, Git-only format that only Git itself can read. This is quite useless for doing any actual work.
For this reason, Git must extract the files from the commit, into some sort of work area. Git calls this work area your working tree or work-tree. This concept is pretty simple and obvious. Git just takes the "freeze-dried" files from the commit, rehydrates or reconstitutes them, and now you have usable files. These usable, work-tree copies of the files are of course copies. You can do anything you want with them. None of that will ever touch any of the originals in the commit.
As I mentioned at the top of this, these work-tree copies of your files are not in Git. They are in your work area. They are your files, not Git's. You can do anything you want to or with them. Git merely filled them in from some existing commit, back when you told Git to do that. After that, they're all yours.
At some point, though, you would probably like Git to make a new commit, and when it does that, you'd like it to update its files from your files. If Git just re-saved all of its own files unchanged, that would be pretty useless.
In other, non-Git, version control systems, this is usually really easy. You just run, e.g., hg commit in Mercurial, and Mercurial reads your work-tree files back, compresses them into its own internal form,[5] and makes the commit. This of course requires a list of known files (and, e.g., hg add updates the list). But Git doesn't do that: that's too easy, and/or maybe too slow.
What Git does instead is to keep, separately from the commits and from your work-tree, its own extra "copy" of each file. This file is in the "freeze-dried" (compressed and de-duplicated) format, but isn't actually frozen like the one in a commit. In effect, this third "copy" of each file sits between the commit and your work-tree.[6]
This extra copy of each file exists in what Git calls, variously, the index, or the staging area, or (rarely these days) the cache. These three names all describe the same thing. (It's mainly implemented as a file named .git/index, except that this file can contain directives that redirect Git to other files, and you can have Git operate with other index files.)
So, what Git does when you switch to some particular commit is:
- extract each file from that commit;
- put the original data (and file name) into Git's index; and
- extract the Git-formatted ("freeze-dried") file into your work-tree, where you can see and work on it.
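One way to observe the three "copies" lining up after a switch is sketched below. It compares the object hash ID recorded in the commit, the one recorded in the index, and the one the work-tree file would get. The file name, branch name other, and identity are made up for this example.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "version 1" > file.txt && git add file.txt && git commit -q -m "first"
git checkout -q -b other
echo "version 2" > file.txt && git add file.txt && git commit -q -m "second"

# Switch back to the previous branch: Git refills the index and work-tree
git checkout -q -

git ls-files --stage                        # index entry: mode, hash ID, stage, name

in_commit=$(git rev-parse HEAD:file.txt)    # hash ID stored in the current commit
in_index=$(git rev-parse :file.txt)         # hash ID stored in the index
in_worktree=$(git hash-object file.txt)     # hash ID of the work-tree file's content
cat file.txt
```

Right after a clean switch, all three IDs should agree, and the work-tree file holds the content from the commit we switched to.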
When you run git commit, what Git does is:
- package up the index's content, as of that moment, as the saved snapshot;
- assemble and package up all the appropriate metadata to make the commit object—this includes making the new commit point back to the current commit, by using the current commit's hash ID as the new commit's parent;
- write all of that out as a new commit; and
- stuff the new commit's hash ID into the current branch name.
So, whatever is in the index (aka staging area) at the time you run git commit is what gets committed. This means that if you've changed stuff in your working tree (whether that's modifying some file, adding a new file, removing a file entirely, or whatever), you need to copy the updated file back into Git's index (or remove the file from Git's index entirely, if the idea is to remove the file). In general, the command you use to do this is git add. This command takes some file name(s) and uses your work-tree copy of that file, or those files, to replace the index copy of that file, or those files. If the file has gone missing from your work-tree (because you removed it), git add updates Git's index by removing the file from there, too.
In other words, git add means make the index copy of this file / these files match the work-tree copy. Only if the file is all-new (does not exist in the index at the time you run git add) is the file really added to the index.[7] For most files, it's really just replace existing copy.
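The fact that git commit uses the index, not the work-tree, is easy to demonstrate. In this sketch (file name and identity are placeholders), the file is edited again after git add, and the commit still records the version that was added.

```shell
cd "$(mktemp -d)" && git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "version 1" > notes.txt
git add notes.txt && git commit -q -m "first"

echo "version 2" > notes.txt
git add notes.txt                  # index copy now matches "version 2"
echo "version 3" > notes.txt       # work-tree only; the index still holds "version 2"
git commit -q -m "second"          # snapshots the index, not the work-tree

committed=$(git show HEAD:notes.txt)
in_worktree=$(cat notes.txt)
echo "committed:   $committed"     # version 2
echo "in worktree: $in_worktree"   # version 3
```

To get "version 3" into a commit, you would run git add notes.txt again and make another commit.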
The index copy of a file is sort-of-in-Git: it's stored in the big database of all internal objects. But if the index copy of a file has never been committed before, it's in a precarious state. It's not until you run git commit, and Git packages up everything that's in the index and turns it into a new commit, that it's safely committed to Git and can't be removed or destroyed.[8]
[5] Mercurial uses a very different storage scheme, in which it often stores diffs, but occasionally stores snapshots. This is mostly irrelevant, but Git provides and documents tools that can reach directly into its internal storage format, so it can be important, at times, to know about Git's internal storage format.
[6] Because it's always de-duplicated, this "copy" of the file takes no space initially. More precisely, it takes no space for its content. It occupies some amount of space within Git's index file, but that's relatively small: just a few dozen or hundred bytes per file, typically. The index contains just the file's name, some mode and other cache information, and an internal Git object hash ID. The actual content is stored in the Git object database, as an internal blob object, which is how Git does the de-duplication.
[7] Perhaps git add should have been called git update-index or git update-staging-area, but there already is a git update-index. The update-index command requires knowing how Git stores files as internal blob objects: it's not very user-friendly, and in fact is not aimed at being something you would ever use yourself.
[8] A committed file exists in Git as a mostly-permanent and completely-read-only entity, but its permanence, the one prefixed with mostly here, is predicated on the commit's permanence. It is possible to drop commits entirely. If you've never sent some particular commit to any other Git, dropping the commit from your own Git repository will make it go away for real (though not right away). The big problem with dropping commits entirely is that if you have sent one to some other Git, that other Git may give it back to yours again later: commits are sort of viral that way. When two Gits have Git-sex with each other, one of them is likely to catch commits.
Summary
So, now we know what commits are: numbered objects with two parts, data (snapshot) and metadata (information) that are strung together, backwards, through their metadata. Now we know what branch names are too: they store the hash ID of a commit that we should call the last in some chain (even if there are more commits after it). We know that nothing inside any commit can ever be changed, but we can always add new commits. To add a new commit, we:
- have Git extract an existing commit, usually by branch name;
- muck with the files that are now in our work-tree;
- use git add to update any files we want updated: this copies the updated content from our work-tree back into Git's index; and
- use git commit to make a new commit, that updates the branch name.
If we take some series of commits like this:
...--G--H <-- main, br1, br2
and attach HEAD to br1 and make two new commits we'll get:
          I--J <-- br1 (HEAD)
         /
...--G--H <-- main, br2
If we now attach HEAD to br2 and make two new commits, we will get:
          I--J <-- br1
         /
...--G--H <-- main
         \
          K--L <-- br2 (HEAD)
Note that in each step, we have merely added a commit to the set of all commits in the repository. The name br1 now identifies the last commit on its chain; the name br2 identifies the last commit on its chain; and the name main identifies the last commit on that chain. Commits H and earlier are on all three branches.[9]
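The drawings above can be rebuilt in a scratch repository with something like the following sketch. The helper function commit_file, the one-file-per-commit layout, and the identity are all invented for this example; only the branch names come from the drawings.

```shell
cd "$(mktemp -d)" && git init -q
git symbolic-ref HEAD refs/heads/main      # make sure the first branch is named main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Hypothetical helper: make one commit that adds a file named after the commit
commit_file() {
  echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "commit $1"
}

commit_file G; commit_file H               # ...--G--H  on main
git branch br1; git branch br2             # two more names for commit H

git checkout -q br1; commit_file I; commit_file J
git checkout -q br2; commit_file K; commit_file L

git log --oneline --graph --all            # draws the same shape, newest first

fork_point=$(git merge-base br1 br2)       # last commit shared by br1 and br2
main_tip=$(git rev-parse main)             # commit H
only_on_br1=$(git rev-list --count main..br1)
only_on_br2=$(git rev-list --count main..br2)
```

The shared commit found by git merge-base should be commit H, the one main still names, with two commits reachable only from each of br1 and br2.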
At all times, there is only one current commit. It is identified by HEAD: HEAD is attached to one of your branch names. The current commit's files get copied out to your work-tree, through Git's index, and there's only one work-tree and one index, too. If you want to switch to some other branch name, and that other branch name reflects some other commit, you will have to switch around Git's index and your work-tree as well.[10]
[9] Other version control systems take other positions. For instance, in Mercurial, a commit is only ever on one branch. This requires different internal structures.
[10] This isn't completely true, but the details get complicated. See Checkout another branch when there are uncommitted changes on the current branch.
git worktree add
Now that we know how to use our one work-tree, Git's one index, and the one single HEAD, we can see how it can be painful to switch around from branch to branch: all our work-tree files get updated each time we switch (except for the complicated situation mentioned in footnote 10, anyway).
If you need to work in two different branches, there's a simple solution: make two separate clones. Each clone has its own branches, its own index, and its own work-tree. But this has one big drawback: it means you have two entire repositories. They might use up a lot of extra space.[11] And, you might not like having to deal with multiple clones and the extra branch names involved. What if, instead, you could share the underlying clone, but have another work-tree?
To make a second work-tree useful, this new work-tree has to have its own index and its own HEAD. And that's what git worktree add does: it makes a new work-tree, somewhere outside of the current work-tree,[12] and gives that new work-tree its own index and HEAD. The added work-tree must be on some branch that is not checked out in the main work-tree, and is not checked out in any other added work-tree.
Because the added work-tree has its own separate things, you can do work in there without interfering with the work you're doing in the main work-tree. Because both work-trees share a single underlying repository, any time you make a new commit in one work-tree, it's immediately visible in the other one. Because making a commit changes the hash ID stored in a branch name, the added work-tree must not use the same branch name as any other work-tree (otherwise the linkage between branch name, current commit hash ID, work-tree content, and index content gets messed up)—but an added work-tree can always use detached HEAD mode (which we haven't described here).
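Here is a minimal sketch of that workflow, assuming a throwaway project under a temporary directory. The directory names proj123 and proj123-alt, the branch name br1, the file names, and the identity are placeholders for illustration.

```shell
main_tree=$(mktemp -d)/proj123             # hypothetical main work-tree location
git init -q "$main_tree" && cd "$main_tree"
git symbolic-ref HEAD refs/heads/main      # make sure the first branch is named main
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo hello > file.txt && git add file.txt && git commit -q -m "initial"

# Add a second work-tree "right next door", on a new branch named br1
git worktree add -b br1 ../proj123-alt

# Do some work there; it has its own index and its own HEAD
(
  cd ../proj123-alt
  echo more > extra.txt && git add extra.txt && git commit -q -m "work on br1"
)

# The main work-tree stays on main, yet sees the new commit via the shared repository
main_branch=$(git symbolic-ref --short HEAD)
alt_branch=$(git -C ../proj123-alt symbolic-ref --short HEAD)
commits_on_br1=$(git rev-list --count main..br1)

# br1 is in use by the added work-tree, so Git refuses to check it out here
if git checkout -q br1 2>/dev/null; then shared=allowed; else shared=refused; fi
echo "checking out br1 in the main work-tree: $shared"

git worktree list
```

When you are done with the added work-tree, git worktree remove (Git 2.17 and later) or deleting the directory followed by git worktree prune cleans up the bookkeeping.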
Overall, git worktree add is a pretty nice way to deal with your situation. Be sure that your Git version is at least 2.15 if you're going to do a lot of work with this. The git worktree command was new in Git version 2.5, but it had a nasty bug that can bite you if you have a detached HEAD or are slow about working in the added work-tree, and you also do any work in the main work-tree; this bug was not fixed until Git version 2.15.
[11] If you make a local clone using path names, Git will try to hard-link internal files to save lots of space. This mostly solves this problem, but some people still won't like having two separate repositories, and over time the space usage will go up as well. There are tricks to handle that too, using Git's alternates mechanism. I believe GitHub, for instance, uses this to make forks work better for them. But overall, git worktree fills a perceived gap; perhaps you'll like it.
[12] Technically, an added work-tree does not have to be outside the main work-tree. But it's a bad idea to put it inside: it just gets confusing. Place it somewhere else. Usually, "right next door" is a good plan: if your main work-tree is in $HOME/projects/proj123/, you might use $HOME/projects/proj123-alt or $HOME/projects/proj123-branchX or whatever.