
A repository on my GitHub has two branches: master and solution. First I clone it:

git clone <master url>

Then I cd into that folder and switch to the solution branch:

git checkout solution

I find the contents of the files are still the same as in master, e.g. README.md. How can I access the solution files?

Then I tried git pull to update the files in the solution branch:

git pull origin solution

That works, and now the files have the solution contents. But when I try to switch back to master, it fails and says I need to merge, I think because some files have different contents in the two branches. How do I switch back?

In general, how do I edit and update files in different branches, and how can I easily switch back and forth?

Another example:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2     
              \
               M--N
                   \
                    P

Is another worktree needed?

kinder chen
  • Not sure you can. I'm usually using git stash. It's a different solution but it resolves the same problem - switch between working copies. Here's a great article https://www.atlassian.com/git/tutorials/saving-changes/git-stash – Viktor Born Oct 30 '20 at 03:06
  • Regarding the edit: what *name* finds the commit whose hash ID is `P`? From commit `P` you can work back to commits `N` and then `M` and so on, but how will you find `P` itself? – torek Nov 03 '20 at 01:31
  • Can I work from `L` to `P`? I'm also confused here, so do I need to use `git worktree add` in this case? – kinder chen Nov 03 '20 at 18:45

2 Answers


Those new to Git often think that Git stores changes in branches. This is not true. In your case, though, I think what you are running into is the fact that when you do work in a Git repository, you do so in what Git calls your working tree. Anything you do here is not in Git (yet).

You might want to use git worktree add to deal with your particular situation. We'll get to that after covering how Git handles all of this, because it won't make any sense without a lot of basics.

The way I like to explain this is that Git does not store changes at all, and does not really care about branches. What Git stores, and cares about, are commits. This means that you need to know what a commit is and does for you, how you find a commit, how you use an existing commit, and how you make a new commit.

What commits are

The basic entity that you will use, as you do work using Git, is the commit. There are three things you need to know about a commit. You just have to memorize these as they are arbitrary: there's no particular reason they had to be done like this, it's just that when Linus Torvalds wrote Git, these are the decisions he made.

  1. Each commit is numbered.

    The numbers, however, are not simple counting numbers: we don't have commit #1 followed by commits 2, 3, 4, and so on. Instead, each commit gets a unique, but very big and ugly, number expressed in hexadecimal, that is between 1 and something very large.[1] Every commit in every repository gets a unique, random-looking number.

    It looks random, but isn't. It's actually a cryptographic checksum of the internal object content. This peculiar numbering scheme enables two Gits to exchange content by handing each other these large numbers.

    A key side effect of this is that it's physically impossible to change what's in a commit. (This is true of all of Git's internal objects.) The reason is that the hash ID, which is how Git finds the object, is a checksum of the content. Take one of these out, make changes to its content, and put it back, and what you get is a new commit (or new other internal object), with a new and different hash ID. The existing one is still in there, under the existing ID. This means not even Git itself can change the content of a stored commit.

  2. Each commit stores a full snapshot of every file.

    More precisely, each commit stores a full copy of every file that Git knew about at the time you, or whoever, made the commit. We'll get into this "knew about" part in a bit, when we look at how to make a new commit.

    These copies are read-only, compressed, and stored in a format that only Git itself can read. They are also de-duplicated, not just within each commit, but across every commit. That is, if your Git repository had some particular copy of a README file or whatever, stored in some commit, and you ever make a new commit that has the same copy of the file—even under some other name—Git will just re-use the previous copy.

  3. And, each commit stores some metadata.

    The metadata with a commit include the name and email address of the person who made that commit. Git gets this from your user.name and user.email settings, and simply believes that you are whoever you claim to be. They include a date-and-time stamp of when you (or whoever) made the commit.[2] The metadata also include why you (or whoever) made the commit, in the form of a commit message. Git isn't particularly strict about what goes into the message, but messages should generally look a lot like email, with a short one-line subject, and then a message body.

    One part of this metadata, though, is strictly for Git itself. Each commit stores, in its metadata, the commit number of the previous commit.[3] This forms commits into simple backwards-looking chains:

    ... <-F <-G <-H
    

    Here, each of the uppercase letters stands in for some actual commit hash ID. Commit H, the most recent one, has inside it the actual hash ID of earlier commit G. When Git extracts earlier commit G from wherever it is that Git keeps all the commits, commit G has inside it the actual hash ID of earlier-than-G commit F.

    We say that commit H points to commit G, which points to commit F. Commit F in turn points to some still-earlier commit, which points to another earlier commit, and so on. This works its way all the way back to the very first commit ever, which—being the first commit—can't point backwards, so it just doesn't.

This backwards-looking chain of commits in a Git repository is the history in that repository. History is commits; commits are history; and Git works backwards. We start with the most recent, and work backwards as needed.
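
To see these three properties for yourself, here is a minimal sketch in a throwaway repository. The directory, file name, identity, and commit messages are all made up for illustration; only the git commands themselves are real.

```shell
set -eu
# Throwaway repository; every name below is a placeholder for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Two commits, so that the second has a parent to point back to.
echo "hello" > README.md
git add README.md
git commit -q -m "first commit"
first=$(git rev-parse HEAD)

echo "more" >> README.md
git add README.md
git commit -q -m "second commit"
second=$(git rev-parse HEAD)

# The raw commit object: a tree (the snapshot), a parent (the previous
# commit's hash ID), the author and committer lines, then the message.
git cat-file -p "$second"

# The stored parent of the second commit is the first commit.
parent=$(git rev-parse "$second^")
echo "parent of second commit: $parent"

# Hash IDs are checksums of content: same content, same ID;
# different content, different ID.
same_a=$(printf 'hello\n' | git hash-object --stdin)
same_b=$(printf 'hello\n' | git hash-object --stdin)
other=$(printf 'hello!\n' | git hash-object --stdin)
echo "$same_a $same_b $other"
```

The first commit's raw object has no parent line at all, which is what makes it a root commit.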


[1] For SHA-1, the number is between 1 and 1,461,501,637,330,902,918,203,684,832,716,283,019,655,932,542,975. This is ffffffffffffffffffffffffffffffffffffffff in hexadecimal, or 2^160 - 1. For SHA-256 it's between 1 and 2^256 - 1. (Use any arbitrary-precision calculator such as bc or dc to compute 2^256. It's very big. Zero is reserved as the null hash in both cases.)

[2] Actually, there are two user-email-time triples, one called "author" and one called "committer". The author is the person who wrote the commit itself, and (back in the early days of Git being used to develop Linux) the committer was the person who received the patch by email and put it in. That's why the commit messages are formatted as if they were email: often, they were email.

[3] Most commits have exactly one previous commit. At least one commit (the very first commit) has no previous commit; Git calls this a root commit. Some commits point back to two earlier commits, instead of just one: Git calls them merge commits. (Merge commits are allowed to point back to more than two earlier commits: a commit with three or more parents is called an octopus merge. They don't do anything you couldn't do with multiple ordinary merges, but if you're tying together multiple topics, they can do that in a sort of neat way.)


Branch names are how we find commits

Git can always find any commit by its big ugly hash ID. But these hash IDs are big, and ugly. Can you remember all of yours? (I can't remember mine.) Fortunately, we don't need to remember all of them. Notice how, above, we were able to start with H and work backwards from there.

So, if commits are in backwards-pointing chains—and they are—and we need to start from the newest commit in some chain, how do we find the hash ID of the last commit in the chain? We could write it down: jot it down on paper, or a whiteboard, or whatever. Then, whenever we make a new commit, we could erase the old one (or cross it off) and write down the new latest commit. But why would we bother with that? We have a computer: why don't we have it remember the latest commit?

This is exactly what a branch name is and does. It just holds the hash ID of the last commit in the chain:

...--F--G--H   <-- master

The name master holds the actual hash ID of the last commit H. As before, we say that the name master points to this commit.

Suppose we'd like to make a second branch now. Let's make a new name, develop or feature or topic or whatever we like (we'll use solution here), that also points to commit H:

...--F--G--H   <-- master, solution

Both names identify the same "last commit", so all the commits up through H are on both branches now.

The special feature of a branch name, though, is that we can switch to that branch, using git switch or, in Git predating Git 2.23, git checkout. We say git checkout master and we get commit H and are "on" master. We say git switch solution and we also get commit H, but this time we are "on" solution.

To tell which name we're using to find commit H, Git attaches the special name HEAD to one (and only one) branch name:

...--F--G--H   <-- master, solution (HEAD)
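
As a rough sketch of the picture above (the repository, file, and identity are placeholders made up for this demo), you can create a second name for the same commit and ask Git which name HEAD is attached to:

```shell
set -eu
# Throwaway repository; names are placeholders for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "master text" > README.md
git add README.md
git commit -q -m "commit H"

# A new branch is just a new name holding the same hash ID.
git branch solution
git rev-parse master solution     # prints the same hash ID twice

# Attach HEAD to the other name (git switch solution in Git 2.23 or later).
git checkout -q solution
git symbolic-ref --short HEAD     # prints: solution
```

Nothing in the work-tree changes during that checkout, because both names select the same commit.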

If we now make a new commit (we'll look at how we do that in a moment), Git makes the new commit by writing it out with commit H as its parent, so that the new commit points back to H. We'll call the new commit I, although its actual number will just be some other big random-looking hash ID. We can't predict the hash ID because it depends on the exact second at which we make it (because of the time stamps); we just know that it will be unique.[4]

Let's draw the new chain of commits, including the sneaky trick that Git uses:

...--F--G--H   <-- master
            \
             I   <-- solution (HEAD)

Having made new commit I, Git wrote the new commit's hash ID into the current branch name, solution. So now the name solution identifies commit I.

If we switch back to the name master, we'll see all the files as they were in commit H, and when we switch back to solution again, we'll see the files as they were in commit I. Or, that is, we might see them that way. But we might not!
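
Here is a minimal sketch of the clean case, where everything has been committed before switching. The repository, file contents, and identity are invented for the demo:

```shell
set -eu
# Throwaway repository; names are placeholders for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "master text" > README.md
git add README.md
git commit -q -m "commit H"

# Create solution at H, attach HEAD to it, and make commit I there.
git checkout -q -b solution
echo "solution text" > README.md
git commit -q -a -m "commit I"

# Only the current branch name moved: master still names H.
git rev-parse master solution

# Switching swaps the work-tree files to match the selected commit.
git checkout -q master
cat README.md      # prints: master text
git checkout -q solution
cat README.md      # prints: solution text
```

With nothing uncommitted in the work-tree, both switches go through without complaint.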


[4] The pigeonhole principle tells us that this will eventually fail. The large size of hash IDs tells us that the chance of failure is minute, and in practice, it never occurs. The birthday problem requires that the hash be very large, and deliberate attacks have moved from a purely theoretical issue with SHA-1 to being something at least theoretically practical, which is why Git is moving to larger and more-secure hashes.


Making new commits

It's time now to look more closely at how we actually make new commit I above. Remember, we mentioned that the data in a commit—the files making up the snapshot—are completely read-only. The commit stores files in a special, compressed, read-only, Git-only format that only Git itself can read. This is quite useless for doing any actual work.

For this reason, Git must extract the files from the commit, into some sort of work area. Git calls this work area your working tree or work-tree. This concept is pretty simple and obvious. Git just takes the "freeze-dried" files from the commit, rehydrates or reconstitutes them, and now you have usable files. These usable, work-tree copies of the files are of course copies. You can do anything you want with them. None of that will ever touch any of the originals in the commit.

As I mentioned at the top of this, these work-tree copies of your files are not in Git. They are in your work area. They are your files, not Git's. You can do anything you want to or with them. Git merely filled them in from some existing commit, back when you told Git to do that. After that, they're all yours.

At some point, though, you would probably like Git to make a new commit, and when it does that, you'd like it to update its files from your files. If Git just re-saved all of its own files unchanged, that would be pretty useless.

In other, non-Git, version control systems, this is usually really easy. You just run, e.g., hg commit in Mercurial, and Mercurial reads your work-tree files back, compresses them into its own internal form,[5] and makes the commit. This of course requires a list of known files (and, e.g., hg add updates the list). But Git doesn't do that: that's too easy, and/or maybe too slow.

What Git does instead is to keep, separately from the commits and from your work-tree, its own extra "copy" of each file. This file is in the "freeze-dried" (compressed and de-duplicated) format, but isn't actually frozen like the one in a commit. In effect, this third "copy" of each file sits between the commit and your work-tree.[6]

This extra copy of each file exists in what Git calls, variously, the index, or the staging area, or—rarely these days—the cache. These three names all describe the same thing. (It's mainly implemented as a file named .git/index, except that this file can contain directives that redirect Git to other files, and you can have Git operate with other index files.)

So, what Git does when you switch to some particular commit is:

  • extract each file from that commit;
  • put the original data (and file name) into Git's index; and
  • extract the Git-formatted ("freeze-dried") file into your work-tree, where you can see and work on it.

When you run git commit, what Git does is:

  • package up the index's content, as of that moment, as the saved snapshot;
  • assemble and package up all the appropriate metadata to make the commit object—this includes making the new commit point back to the current commit, by using the current commit's hash ID as the new commit's parent;
  • write all of that out as a new commit; and
  • stuff the new commit's hash ID into the current branch name.

So, whatever is in the index (aka staging area) at the time you run git commit is what gets committed. This means that if you've changed stuff in your working tree—whether that's modifying some file, adding a new file, removing a file entirely, or whatever—you need to copy the updated file back into Git's index (or remove the file from Git's index entirely, if the idea is to remove the file). In general, the command you use to do this is git add. This command takes some file name(s) and uses your work-tree copy of that file, or those files, to replace the index copy of that file, or those files. If the file has gone missing from your work-tree (because you removed it), git add updates Git's index by removing the file from there, too.

In other words, git add means make the index copy of this file / these files match the work-tree copy. Only if the file is all-new (does not exist in the index at the time you run git add) is the file really added to the index.[7] For most files, it's really just replace existing copy.

The index copy of a file is sort-of-in-Git: it's stored in the big database of all internal objects. But if the index copy of a file has never been committed before, it's in a precarious state. It's not until you run git commit, and Git packages up everything that's in the index and turns it into a new commit, that it's safely committed to Git and can't be removed or destroyed.[8]
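
A small sketch makes the three copies visible. It assumes a throwaway repository with a made-up file notes.txt; the point is that git commit saves what is in the index, not what is in the work-tree:

```shell
set -eu
# Throwaway repository; names are placeholders for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "add notes"

echo "v2" > notes.txt
git add notes.txt        # index copy now holds v2
echo "v3" > notes.txt    # work-tree copy moves on; index still holds v2

# The index entry: mode, blob hash ID, stage number, file name.
git ls-files --stage notes.txt

git commit -q -m "update notes"

git show HEAD:notes.txt  # prints: v2  (what was in the index)
cat notes.txt            # prints: v3  (still only in the work-tree)
```

Running git add notes.txt again followed by another git commit would be needed to get v3 into a commit.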


[5] Mercurial uses a very different storage scheme, in which it often stores diffs, but occasionally stores snapshots. This is mostly irrelevant, but Git provides and documents tools that can reach directly into its internal storage format, so it can be important, at times, to know about Git's internal storage format.

[6] Because it's always de-duplicated, this "copy" of the file takes no space initially. More precisely, it takes no space for its content. It occupies some amount of space within Git's index file, but that's relatively small: just a few dozen or hundred bytes per file, typically. The index contains just the file's name, some mode and other cache information, and an internal Git object hash ID. The actual content is stored in the Git object database, as an internal blob object, which is how Git does the de-duplication.

[7] Perhaps git add should have been called git update-index or git update-staging-area, but there already is a git update-index. The update-index command requires knowing how Git stores files as internal blob objects: it's not very user-friendly, and in fact is not aimed at being something you would ever use yourself.

[8] A committed file exists in Git as a mostly-permanent and completely-read-only entity, but its permanence (the part qualified by "mostly" here) is predicated on the commit's permanence. It is possible to drop commits entirely. If you've never sent some particular commit to any other Git, dropping the commit from your own Git repository will make it go away for real (though not right away). The big problem with dropping commits entirely is that if you have sent it to some other Git, that other Git may give it back to yours again later: commits are sort of viral that way. When two Gits have Git-sex with each other, one of them is likely to catch commits.


Summary

So, now we know what commits are: numbered objects with two parts, data (snapshot) and metadata (information) that are strung together, backwards, through their metadata. Now we know what branch names are too: they store the hash ID of a commit that we should call the last in some chain (even if there are more commits after it). We know that nothing inside any commit can ever be changed, but we can always add new commits. To add a new commit, we:

  • have Git extract an existing commit, usually by branch name;
  • muck with the files that are now in our work-tree;
  • use git add to update any files we want updated: this copies the updated content from our work-tree back into Git's index; and
  • use git commit to make a new commit, that updates the branch name.

If we take some series of commits like this:

...--G--H   <-- main, br1, br2

and attach HEAD to br1 and make two new commits we'll get:

          I--J   <-- br1 (HEAD)
         /
...--G--H   <-- main, br2

If we now attach HEAD to br2 and make two new commits, we will get:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2 (HEAD)

Note that in each step, we have merely added a commit to the set of all commits in the repository. The name br1 now identifies the last commit on its chain; the name br2 identifies the last commit on its chain; and the name main identifies the last commit on that chain. Commits H and earlier are on all three branches.[9]

At all times, there is only one current commit. It is identified by HEAD: HEAD is attached to one of your branch names. The current commit's files get copied out to your work-tree, through Git's index, and there's only one work-tree and one index, too. If you want to switch to some other branch name, and that other branch name reflects some other commit, you will have to switch around Git's index and your work-tree as well.[10]
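
The graph above can be rebuilt in a scratch repository. This is only a sketch: it uses empty commits whose messages are the letters from the drawing, since only the shape of the graph matters here, and the repository and identity are placeholders:

```shell
set -eu
# Throwaway repository; names are placeholders for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo User"
git config user.email "demo@example.com"

# Empty commits are enough: we only care about the graph shape.
for name in G H; do git commit -q --allow-empty -m "$name"; done
git branch br1
git branch br2

git checkout -q br1
for name in I J; do git commit -q --allow-empty -m "$name"; done

git checkout -q br2
for name in K L; do git commit -q --allow-empty -m "$name"; done

# Draw all three chains.
git log --oneline --graph --all

# Commit H (the tip of main) is reachable from all three names.
git branch --contains main

# The point where br1 and br2 diverge is H.
git merge-base br1 br2
```

Each name still just holds one hash ID; the shared history is found by walking backwards from each tip.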


[9] Other version control systems take other positions. For instance, in Mercurial, a commit is only ever on one branch. This requires different internal structures.

[10] This isn't completely true, but the details get complicated. See the question "Checkout another branch when there are uncommitted changes on the current branch".


git worktree add

Now that we know how to use our one work-tree, Git's one index, and the one single HEAD, we can see how it can be painful to switch around from branch to branch: all our work-tree files get updated each time we switch (except for the complicated situation mentioned in footnote 10, anyway).

If you need to work in two different branches, there's a simple solution: make two separate clones. Each clone has its own branches, its own index, and its own work-tree. But this has one big drawback: it means you have two entire repositories. They might use up a lot of extra space.[11] And, you might not like having to deal with multiple clones and the extra branch names involved. What if, instead, you could share the underlying clone, but have another work-tree?

To make a second work-tree useful, this new work-tree has to have its own index and its own HEAD. And that's what git worktree add does: it makes a new work-tree, somewhere outside of the current work-tree,[12] and gives that new work-tree its own index and HEAD. The added work-tree must be on some branch that is not checked out in the main work-tree, and is not checked out in any other added work-tree.

Because the added work-tree has its own separate things, you can do work in there without interfering with the work you're doing in the main work-tree. Because both work-trees share a single underlying repository, any time you make a new commit in one work-tree, it's immediately visible in the other one. Because making a commit changes the hash ID stored in a branch name, the added work-tree must not use the same branch name as any other work-tree (otherwise the linkage between branch name, current commit hash ID, work-tree content, and index content gets messed up)—but an added work-tree can always use detached HEAD mode (which we haven't described here).
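
Here is a minimal sketch of that setup, assuming Git 2.5 or later. The directory names proj and proj-solution, the file contents, and the identity are all invented for the demo:

```shell
set -eu
# Throwaway parent directory; all names are placeholders for this demo.
base=$(mktemp -d)
mkdir "$base/proj"
cd "$base/proj"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "master text" > README.md
git add README.md
git commit -q -m "commit H"
git branch solution

# Add a second work-tree "right next door", with solution checked out in it.
git worktree add ../proj-solution solution

# Work in the added work-tree; it has its own index and HEAD.
cd ../proj-solution
echo "solution text" > README.md
git commit -q -a -m "solution work"

# The main work-tree's files are untouched, but the new commit is
# already visible there because the repository is shared.
cd ../proj
cat README.md                          # prints: master text
git log -1 --format=%s solution        # prints: solution work

# The same branch cannot be checked out in two work-trees at once.
if ! git checkout -q solution 2>/dev/null; then
    echo "solution is already checked out in another work-tree"
fi

git worktree list
```

When you are done with the extra work-tree, delete its directory and run git worktree prune in the main one.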

Overall, git worktree add is a pretty nice way to deal with your situation. Be sure that your Git version is at least 2.15 if you're going to do a lot of work with this. The git worktree command was new in Git version 2.5, but has a nasty bug that can bite you if you have a detached HEAD or are slow about working in it, and you also do any work in the main work-tree; this bug is not fixed until Git version 2.15.


[11] If you make a local clone using path names, Git will try to hard-link internal files to save lots of space. This mostly solves this problem, but some people still won't like having two separate repositories, and over time the space usage will go up as well. There are tricks to handle that too, using Git's alternates mechanism. I believe GitHub, for instance, uses this to make forks work better for them. But overall, git worktree fills a perceived gap; perhaps you'll like it.

[12] Technically, an added work-tree does not have to be outside the main work-tree. But it's a bad idea to put it inside: it just gets confusing. Place it somewhere else. Usually, "right next door" is a good plan: if your main work-tree is in $HOME/projects/proj123/, you might use $HOME/projects/proj123-alt or $HOME/projects/proj123-branchX or whatever.

torek
  • thx, I tried `git switch` and it works and different branches works individually as the figures you drew in Summary. Do I still need to use `git worktree add`? – kinder chen Oct 30 '20 at 16:22
  • If you're happy with `git switch` / `git checkout` and the shuffling of files in the (single) work-tree, there's no need to add another work-tree. If you're *not* happy with shuffling files about in the only-one-there work-tree, and your Git is at least 2.5 (preferably at least 2.15), add more work-trees to avoid the shuffling-of-files effect. – torek Oct 30 '20 at 23:09
  • I find if the two branches have different files and filenames, when I `git switch`, the files keep showing in different branches. How to handle this? – kinder chen Oct 31 '20 at 05:25
  • It sounds like in this case, you simply have never told Git about the existence of this file. It remains an *untracked file* in that case. It is not in either commit, so Git doesn't have to remove-and-replace it. It is just a file you left lying around in your work-tree. Git will leave it alone. – torek Oct 31 '20 at 05:52
  • I create a file and `git add` and `git commit`, then I `git rm` to remove the file, and then I `git push`, it gave an error. Why it fails? How to fix? – kinder chen Oct 31 '20 at 19:48
  • (1) You almost certainly don't want to run `git rm` here. Re-read the above about how Git's index holds the *next* commit you plan to make. (2) `git push` doesn't push *files*, it pushes *commits*. Your commits must *add on* to the commits that *they* will find in *their* Git repository using the name that you tell them to change to record *your* commits. Review the answer and understand how the *commit graph* works, and how Git finds commits. – torek Nov 01 '20 at 04:38
  • my understanding is `git worktree add` is like to create a folder in a folder, am I right? – kinder chen Nov 02 '20 at 15:06
  • @kinderchen: no. `git worktree add` creates an *additional work-tree* which comes with an additional *index* and `HEAD`. While a work-tree is a directory (aka folder), it's not *just* a directory: it is a whole ecosystem within Git. Calling a work-tree a directory is like calling a tree "a piece of wood": it's not *wrong*, but it completely misses the essence. – torek Nov 02 '20 at 15:09
  • can you give an example that we need additional index and `HEAD`? – kinder chen Nov 02 '20 at 15:25
  • If you'd like to have two different commits checked out, in two different locations, *at the same time*, and intend to use both of them to do additional work *at the same time*, you'll definitely need two work-trees. If you'd just like to look at one while you do work in the other, you can get away without using `git worktree add` because you don't need the second checkout to have a work-tree—but it may be *convenient* to use `git worktree add`. – torek Nov 02 '20 at 15:28
  • thx, but in that case, why not just add another branch? It also works in two different locations at the same time. – kinder chen Nov 02 '20 at 15:40
  • A branch is just a pointer to a commit. You can't *use* a branch until you check it out somewhere: into a *work-tree*. – torek Nov 02 '20 at 15:46
  • I updated the question with a figure of worktree based on yours in Summary. So, like I draw in the figure, if I have branches based on branch `br2`, another worktree is needed, am I right? – kinder chen Nov 02 '20 at 16:13

If you want to switch between branches (here master and solution), you can do that in two ways. For example, suppose you have uncommitted changes on the solution branch and you want to switch to the master branch.

  1. If you are happy with the changes on the solution branch, you can commit them before switching to the master branch.

  2. If you do not want to commit the changes, you can stash them with git stash. This saves your uncommitted changes in a stash entry and returns your work-tree to the state of the last commit on solution. After switching back later, git stash pop re-applies them.
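
The stash route might look like the sketch below. It is an illustration in a throwaway repository with made-up contents, and it may not reproduce the exact error from the question; it only shows one common way a switch gets refused and how stashing gets around it:

```shell
set -eu
# Throwaway repository; names are placeholders for this demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "master text" > README.md
git add README.md
git commit -q -m "master version"

git checkout -q -b solution
echo "solution text" > README.md
git commit -q -a -m "solution version"

# An uncommitted edit to a file that differs between the two branches.
echo "half-done edit" >> README.md

# Git refuses to switch, because that would overwrite the edit.
if ! git checkout -q master 2>/dev/null; then
    echo "checkout refused: uncommitted changes would be overwritten"
fi

# Set the edit aside, then switch freely.
git stash
git checkout -q master
cat README.md            # prints: master text

# Come back and restore the edit.
git checkout -q solution
git stash pop
cat README.md            # solution text, then half-done edit
```

Committing the edit on solution instead of stashing it (option 1) would make the first checkout succeed in the same way.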

The best tool I found for working on branches is SourceTree.

j4jada