I am a beginner to git:

I created a new branch, and when I ran git checkout master, the files still existed in the IDE (I use WebStorm). While on the new branch, I created a new file and committed it before checking out master. I don't think this is supposed to happen. What I expect is that the changes I made in the new branch should not show up in master, or that the files should disappear in WebStorm after I check out master.

Why does this happen?

Commands I did:

git checkout articles
git add trial.html
git commit -m 'new articles' 
git checkout master
mnestorov
  • @chepner No, if a file is tracked in one branch and not in another it should be deleted when checking out that "another" branch. – Lasse V. Karlsen Jan 28 '21 at 08:15
  • Can you show the output of `git status` after switching to master? And by "commands I did", were these the **exact** commands you ran? Or did you, for instance, add the file, then change it, not add the latest changes, and then commit? – Lasse V. Karlsen Jan 28 '21 at 08:16
  • @LasseV.Karlsen Huh; is that a newer behavior or configurable? I could have sworn I remembered such files remaining in the working directory at some point. – chepner Jan 28 '21 at 12:30
  • If you *ignore* a file using .gitignore on one branch, and don't ignore it on another, it will look as if changes to files appear in some branches but not all. But if you delete a file on branch A, then check out branch B and the file reappears, the file will disappear again when you check out branch A. Whether it was there before or never was doesn't matter; what controls this is whether the file is currently tracked on the branch. – Lasse V. Karlsen Jan 28 '21 at 13:39

2 Answers

If "articles" is the branch you want to commit to, and "master" is the branch where you want to see that commit from "articles":

git checkout articles
git add trial.html
git commit -m 'new articles'
git push -u origin articles
git checkout master
git merge origin/articles
nzcnef

Commands I [ran]:

git checkout articles
git add trial.html
git commit -m 'new articles'

None of these commands created a new file, so there's something missing from your question.

Note that Git itself is not an IDE. IDEs vary in quality and abilities, and some IDEs may not notice changes that have taken place in your working tree.

To understand how Git works here, remember that Git is all about commits. The repository itself primarily consists of commits. While commits contain files, they are not files themselves. The repository does not hold files directly. What this means is that any files you see in your working tree (or in your IDE) are not in Git. At most, they were copied out of Git—i.e., out of a commit—earlier.

You can also use these files as part of the process of making new commits. In other version control systems (VCSes), this is quite straightforward, as these VCSes make the new commits from these files. This is not the case for Git, however. To really understand this, it's necessary to back up a bit and examine commits more closely.

Commits are numbered

Every commit that can or does exist in a Git repository has a unique number. These numbers are not simple counting numbers: Git does not start with commit #1, and then go on to commits 2, 3, and so on. Instead, each commit gets a unique but random-looking hash ID. This hash ID is used for this commit only, in every Git repository in the universe.1 This means that if some other Git repository comes along and hands your Git a hash ID, your Git can check to see if it already has this, just by looking at the number. If your Git doesn't have this, your Git can get it from the other Git. If your Git does have the number already, it already has the entire commit, and needs nothing from the other Git.

The way Git achieves this numbering is to use a cryptographic hash function (currently SHA-1, with SHA-256 coming). Git runs this function over the bytes that make up the internal commit object. This means that nothing inside a commit can ever change.2 All commits are frozen for all time.
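As a rough illustration of the point that the hash ID is computed from the commit's own bytes, the sketch below makes one commit in a throwaway repository and then re-hashes the raw commit object. The temporary directory, file name, and user identity are made up for the example.

```shell
set -e
# Throwaway repository; the path and identity are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo hello > README.md
git add README.md
git commit -q -m "first commit"

# The hash ID Git assigned to the commit.
actual=$(git rev-parse HEAD)

# Feed the raw bytes of the commit object back through Git's hasher.
# The result is the same ID, because the ID is just a hash of those bytes.
computed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)

echo "assigned: $actual"
echo "rehashed: $computed"
```

Changing even one byte of the commit's content (the message, the timestamp, the author) would produce a different hash ID, which is why an existing commit can never be altered in place.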


1The pigeonhole principle tells us that this numbering system must fail eventually. The numbers are large enough to, we hope, make the first failure happen after the universe ends. This requires very large hash IDs. This particular constraint is further relaxed by allowing two Git repositories that never meet to re-use hash IDs. I call these doppelgängers, complete with the traditional connotation of bad luck: it's best to avoid them, in case the two Git repositories accidentally do meet.

2In fact, nothing in any internal Git object can ever change. Git checks that the hash ID, which Git uses as the key in a simple key-value database, matches the hash of the data once the data have been retrieved using that key. If the stored data do not hash correctly, Git assumes that the underlying data have been damaged, e.g., by disk error. Magnetic media drives generally have an expected undetected error rate of about 1 in 10^17 or worse, but Git will detect virtually all of these errors.


Commits have two parts

Each commit stores two things:

  • A commit holds a full snapshot of every file that Git knew about, in the form it had at the time you (or whoever) made the commit.
  • Along with the snapshot, a commit holds some metadata. The metadata gives information about the commit itself, such as who made it (name and email address) and when. The metadata include a log message, which you get to supply; but they also include data that Git supplies on its own.

A crucial part of the metadata is that each commit contains the raw commit hash ID of some set of earlier commits. Most commits contain exactly one hash ID: these commits are ordinary commits. Commits with no earlier-commit hash ID are called root commits, and I suspect most repositories have just one such commit (though it's possible to create more). Commits with two or more earlier-commit hash IDs are called merge commits. We'll ignore all but ordinary commits here.
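To see the metadata for yourself, something like the following sketch should work. It makes two commits in a scratch repository (file names and identity are invented for the example) and prints the second commit's raw contents, which include a parent line holding the first commit's hash ID.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "root commit"

echo two > file.txt
git add file.txt
git commit -q -m "ordinary commit"

# Print the commit object: tree (the snapshot), parent, author,
# committer, then the log message.
git cat-file -p HEAD

# Pull the parent hash ID out of the metadata and compare it with
# what Git reports as the previous commit.
parent_line=$(git cat-file -p HEAD | sed -n 's/^parent //p')
parent_rev=$(git rev-parse HEAD~1)

# The first commit is a root commit, so it has no parent line at all.
root_parents=$(git cat-file -p HEAD~1 | sed -n 's/^parent //p')
```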

The trick that Git performs with this earlier-commit-hash-ID is to form commits into simple backwards-looking chains. Let's draw such a chain right now. Instead of the actual hash IDs, let's use single uppercase letters to stand in for the hash IDs. Let's further assume that the last commit in this chain has hash ID H, and draw the chain like this, with the last commit on the right:

... <-F <-G <-H

We know that commit H holds a full snapshot of every file, in some read-only frozen-for-all-time form. So does earlier commit G. Commit H holds the hash ID of commit G, so we say that H points to G.

This all means that if we give Git the hash ID of commit H, Git can find the commit in its database, read it out, find all of its files, and find the hash ID of G and read out all of G's files too. Git can then compare G's files to H's files. Whatever is the same is uninteresting and Git will tell us nothing about these files, but for each file that is different, Git will proceed to compare the files, as if playing a game of Spot the Difference, and will then tell us how to change the earlier (commit G) version of the file to make it match the later (commit H) version of the file.

In this way, Git can show us commit H as a diff, even though it holds only a snapshot and some metadata. Git uses the metadata to find the earlier commit G.

Of course, having found commit G, Git can now use G's metadata to find another commit that is one step earlier than G. That's commit F. Commit F also has both snapshot and metadata. This allows Git to show us commit G as a diff, even though it is a snapshot, and then move one step back to commit F.

This repeats until Git has gone through every commit, one at a time, backwards, from "child commit" to "parent commit". Git gets to stop doing this when it reaches a commit that lists no parent commit, i.e., a (or the) root commit in the repository.
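The backwards walk can be watched in a small experiment. This sketch assumes a scratch repository with three commits (the file name and messages are placeholders); git log and git rev-list both start at the current commit and follow parent links until they hit the root.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Three commits in a row, standing in for F, G, and H.
for n in 1 2 3; do
    echo "version $n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

# Newest first: Git starts at the last commit and works backwards.
git log --oneline

# Show the last commit as a diff against its parent.
git diff HEAD~1 HEAD

count=$(git rev-list --count HEAD)           # how many commits were visited
oldest=$(git rev-list HEAD | tail -n 1)      # where the walk stopped
root=$(git rev-list --max-parents=0 HEAD)    # commits with no parent
```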

Branch names

There's one small flaw in the process outlined above. To find a commit, Git needs its hash ID. This means we must somehow remember the hash ID of the last commit H. We don't need to remember the hash ID of earlier commits G, F, and so on, because we can have Git start with H and work backwards; but we do need to remember the hash ID of that last commit H. This is where branch names come in.

What a branch name does, in Git, is hold the hash ID of the last commit in some chain. The name is up to you (within some limits; see the git check-ref-format documentation, https://git-scm.com/docs/git-check-ref-format, for details). The actual hash ID of some commit is determined by the act of making that commit.3 So until Git makes the commit, you won't know its hash ID; you know only that it will be unique.

Let's draw in a branch name:

...--F--G--H   <-- main

for instance. Now the name main holds the hash ID of the last commit in the chain. You just have to remember the name main now.

Let's make a second branch name that also points to commit H, just like main. We'll call this feature:

...--F--G--H   <-- feature, main

We need some way to know which name we're actually using. Let's attach a special name, HEAD (written in all uppercase like this), to one of our branch names:

...--F--G--H   <-- feature (HEAD), main

Now let's make a new commit. Normally I would call this commit I—the next letter after H—but I'm going to reserve a few letters for a moment and call the new commit K. Without worrying yet about how we make a new commit, let's just assume that we've had Git do it. Commit K's parent will be commit H, so that K points back to H. To implement the last step of making a new commit, Git will write this new hash ID into the current branch name, as indicated by the name HEAD. The result looks like this:

...--F--G--H   <-- main
            \
             K   <-- feature (HEAD)

If we make another new commit, we just pick another letter and add on to the chain:

...--F--G--H   <-- main
            \
             K--L   <-- feature (HEAD)

The name feature now selects commit L, which leads back to commit K, which leads back to commit H.

If we now git checkout main to get back to main, what we'll see in this kind of diagram is that HEAD moves:

...--F--G--H   <-- main (HEAD)
            \
             K--L   <-- feature

This means we are using commit H, because main is the current name and the commit main points to is H. If we now make two new commits I and J, this time it is the name main that will advance to cover the new commits:

             I--J   <-- main (HEAD)
            /
...--F--G--H
            \
             K--L   <-- feature

This is how we form branches, in Git. Note that commits up through and including H are on both branches.
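The final diagram can be reproduced in a scratch repository. In this sketch the letters H through L are used only as file names and commit messages (the real hash IDs will be the usual random-looking strings), and make_commit is a hypothetical helper defined just for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit, labeled with a letter.
make_commit() {
    echo "$1" > "$1.txt"
    git add "$1.txt"
    git commit -q -m "$1"
}

make_commit H
git checkout -q -b feature    # feature and main both point to H
make_commit K
make_commit L                 # feature advances; main still points to H
git checkout -q main          # HEAD moves back to main
make_commit I
make_commit J                 # now main advances

git log --graph --oneline --all

h=$(git rev-parse main~2)               # two steps back from J is H
base=$(git merge-base main feature)     # last commit shared by both branches
feature_start=$(git rev-parse feature~2)
current=$(git symbolic-ref --short HEAD)
```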


3Part of the metadata is a date-and-time-stamp. So, even if you know exactly what snapshot you will put in a commit, and what other metadata you will put in a commit, you can't compute the hash ID you'll get until you actually make the commit, as Git will add the current time to this. So there's little point in trying to predict what the hash ID will be. There are some tricks one can use here to force a particular time-stamp, but remember that part of the commit's data includes the parent commit hash. This means that making a cycle is at least mostly as hard as breaking the hash function.


The mechanics of commits: Git's index and your working tree

We mentioned above that the snapshot in a commit is read-only. It's not just read-only: it stores its saved files in a compressed and de-duplicated format that only Git itself can read. They're not saved as files, but rather as internal objects. This means other programs cannot use these files at all.

Hence, to get any actual work done, we must have Git extract a commit. We do this with git checkout (or, since Git 2.23, git switch). Typically we give git checkout a branch name like main or feature or whatever. This branch name selects some particular commit: the last commit that we want Git to say is "in" or "on" the branch. Git calls this the tip commit of the branch.

That tip commit has some frozen snapshot in it. Git will expand the files in this snapshot, turning them back into ordinary everyday files that you can use. These ordinary files are in what the Git documentation calls your working tree or work-tree.

The files in your working tree are yours. They are ordinary everyday files. They live in an ordinary everyday directory (or folder if you prefer that term). You can do anything you want to and with these files, as they are not Git's files. Git simply copied them out for you, when you asked it to (git checkout / git switch; note that other commands such as git restore and git reset can also tell Git to overwrite your file with some file copied out of Git).

Because this area is yours, you can do anything you like with it, including add all-new-files (that Git doesn't know about), remove existing files, rename files, and so on. Git is blissfully unaware of all of these activities.4 Eventually, though, you must tell Git about some of what you have done. The reason for this is that when you ask Git to make a new commit, Git still doesn't really look at your files.

This is where Git is very different from many other VCSes. In a lot of VCSes, once the VCS knows the name of some file, a later commit operation will read the working tree version of that file and use that to make the new commit. Git does not do this. Instead, Git keeps what amounts to a third copy of each file.

When I talk about a third copy, it's reasonable to ask what the other copies are, so let's review that. Suppose you have files named README.md and LICENSE.md. You also have some current commit. That commit presumably has some version of those two files saved in it. So that's one copy. The other obvious copy is the one Git had to make in your working tree, so that you could read and/or write these files. That's two active copies of each file: one that you literally can't change, saved in the current commit, and one in your work-tree.

The third copy of each of these two files, in Git, sits kind of between these two copies. The frozen committed copy of README.md literally can't be changed, but then, in between this frozen README.md and your regular everyday file README.md, there's one more "copy". This copy is in the frozen format (pre-de-duplicated and everything) and is ready to go into a new commit. But, because it is not in a commit, Git can remove this copy and replace it with another file. Technically, Git won't actually remove it (it's being shared with the current commit, after all), but it will stop using it here. If your updated README.md doesn't match any previous version, Git makes an all-new internal object; if it does match some previous version, Git starts sharing with that one.

Hence, this "third copy" is more like a second copy, with your work-tree version being the third copy. Also, because most files in the next commit will probably be the same as most of the files in the current commit, this middle copy of each file probably takes almost no space, because it's just sharing with a committed copy.

These extra "copies" (pre-de-duplicated) exist in what Git calls, variously, the index, or the staging area, or the cache. These three names all refer to the same thing: a place that stores your proposed next commit. Initially, when you git checkout some commit, Git fills in this area—its index, or staging area if you prefer that term—from the commit. Git fills in your area, your work-tree, from the commit as well. So all three copies of every file match. The proposed next commit matches the current commit, and matches your work-tree.

As you make changes to your working tree, the files that are in Git's index / staging-area continue to match your current commit, so that means they fall out of sync with your working tree copies. To update your proposed next commit, you must run git add.

What git add does is make the proposed-commit copy of some file match your work-tree copy. If you've updated an existing file, this replaces the index copy (which used to match the committed copy) with the work-tree copy. If you've created an all-new file, this copies the new file into Git's index, ready to be committed: there was no previous file so this is a new file. And, if you remove a file from your work-tree, what git add does is remove the same file from Git's index, so that your proposed new commit now omits this file.

When you run git commit, the new snapshot Git will make comes from Git's index. Any updates you made in your working tree don't matter unless you first instruct Git to copy those updates into Git's index—your proposed next commit.5
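A small experiment makes this concrete. In the sketch below (scratch repository, made-up file name and contents), the file is edited twice but only the first edit is passed to git add, so the new commit holds the first edit while the working tree holds the second.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "first draft" > file.txt
git add file.txt
git commit -q -m "initial"

echo "second draft" > file.txt
git add file.txt                        # index now holds "second draft"
echo "third draft, not added" > file.txt

git commit -q -m "update file"          # snapshot comes from the index

committed=$(git show HEAD:file.txt)     # what the new commit contains
working=$(cat file.txt)                 # what the working tree contains
echo "committed: $committed"
echo "working:   $working"
```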

The git status command works by, in part, running two separate comparisons. One compares the current commit's files to the files in the proposed next commit. For all files that match, Git says nothing. For files that don't match, Git calls these files staged for commit. The second comparison compares the files in the index—the proposed next commit—to those in your work-tree. Where these match, Git says nothing, but where they don't, Git calls these files not staged for commit.
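The two comparisons can also be run by hand: git diff --cached compares the current commit to the index, and plain git diff compares the index to the working tree. This sketch (scratch repository, invented file contents) puts one file into both states at once.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "initial"

echo "two, staged" > file.txt
git add file.txt                          # index differs from the commit
echo "three, left unstaged" > file.txt    # work-tree differs from the index

git status                                # lists file.txt under both headings

staged=$(git diff --cached --name-only)   # commit vs. index
unstaged=$(git diff --name-only)          # index vs. working tree
short=$(git status --porcelain)           # first column: staged, second: unstaged
```

In the short format, the first status letter reports the commit-versus-index comparison and the second letter reports the index-versus-work-tree comparison.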


4Modern Git includes optional file monitoring software to make things described here go faster, but this kind of monitoring does not—or at least, is not supposed to—make any visible difference other than making things faster.

5Even operations like git commit -a, git commit --include, and git commit --only work with an index. The details get complicated because they create one or more temporary index files to get this all done, and some of this shows through. For instance, if you have commit hooks, the fact that git commit --only makes three index files, and only one of those three has the actual proposed next commit in it—the other two have the rollback version and the proposed-commit-after-next-commit in them, to be used on commit failure or success respectively—makes this tricky.


Untracked files

Once you understand all of the above—how Git uses its index to hold the proposed next commit, and how Git can produce the git status results by comparing against the current commit and against your working tree—you're finally ready to deal with untracked files.

Your working tree is yours, and you can create files in it that Git knows nothing about. A diff comparing your proposed next commit in Git's index to your current work-tree may find "newly added" files in the working tree. It would make sense to report these as added files that are simply unstaged. But Git doesn't do that. Instead, Git reports these as untracked files.

That, in fact, is all that an untracked file is: it's a file that is in your working tree, but is not in Git's index right now. Note that you control which files are in your working tree, by creating or deleting them. Note further that you can control which files are in Git's index: you can use git add to put one in, when it wasn't there before, and you can use git rm to take one out of Git's index when it is there now.
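As a sketch of how a file moves between tracked and untracked, the commands below create a file, add it to the index, and then take it back out with git rm --cached, which removes only the index copy. The repository and file names are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo tracked > tracked.txt
git add tracked.txt
git commit -q -m "initial"

echo scratch > notes.txt                  # in the work-tree, not in the index
untracked_before=$(git ls-files --others --exclude-standard)

git add notes.txt                         # now in the index: tracked
git ls-files                              # lists notes.txt and tracked.txt

git rm -q --cached notes.txt              # out of the index, file left on disk
untracked_after=$(git ls-files --others --exclude-standard)
```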

But git checkout or git switch fills in Git's index from the commit you are switching to, to update your proposed next commit. So when you change commits, Git's index contents can change here, and this can add or remove a file to/from Git's index. Again, this depends on both the commit you're switching from, and the commit you're switching to. (Technically, it also depends on the current contents of Git's index. If this matches the current commit, though—which it normally must after a successful git commit, for instance—then we get to disregard this technicality.)

Note, too, that you can change branches without changing commits. Earlier we had this situation:

...--G--H   <-- feature (HEAD), main

If we switch to branch main, we continue using commit H. Nothing else has to change, so nothing else does change: Git will leave both its index and your working tree alone. If we add a new branch name and switch to it, again nothing else has to change, and nothing else does change, so:

git checkout -b newbranch

results in:

...--G--H   <-- feature, main, newbranch (HEAD)

It's the act of changing the current commit that forces Git to update Git's index. Of course it's the act of changing the current branch name that usually results in changing the current commit, too, but in these particular special cases, we can change names without changing commits. (For much, perhaps too much, more on this particular phenomenon, see the question "Checkout another branch when there are uncommitted changes on the current branch".)
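Here is a quick check of the name-changes-but-commit-doesn't case, using a scratch repository and a made-up untracked file. The current commit is the same before and after git checkout -b, and the untracked file is left exactly where it was.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

echo hello > README.md
git add README.md
git commit -q -m "initial"

echo scratch > untracked.txt            # never added, so untracked

before=$(git rev-parse HEAD)
git checkout -q -b newbranch            # new name, same commit
after=$(git rev-parse HEAD)
current=$(git symbolic-ref --short HEAD)

git status --short                      # still reports "?? untracked.txt"
```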

Ignored files

This answer would be incomplete without mentioning what Git calls .gitignore and "ignored" files. Ignore is the wrong verb here, because these files aren't really ignored after all. Instead, what happens with these files is that if they are currently untracked, git status will shut up about them.

Suppose we have software that, when built by whatever build process we use, produces many build artifacts (.o files, dependency files, PDFs, whatever). Suppose further that we choose not to include these build artifacts in commits (this is usually best practice, though best practice is off topic on StackOverflow). In this case, running git status after running a build would produce many pages of complaints about various untracked files. These are our build artifacts and we know they're untracked and just want Git to stop complaining about them.

To make Git shut up, we list these file names and/or patterns in one or more files named .gitignore or similar. When we run git status, it will do its usual comparisons—finding staged files (differences between current commit and proposed next commit) and unstaged files (differences between proposed next commit and working tree)—and this includes finding all untracked files. But then it will filter away all the expected untracked files, and not complain about them.

To make this even more useful, git add . or git add * will deliberately not copy these untracked files into Git's index. That way, they stay untracked. If git add did copy them into Git's index, they would become tracked, and the listing in .gitignore would be irrelevant.

So, the files named .gitignore might be more accurately named .git-do-not-complain-about-these-files-if-they-are-untracked-and-do-not-add-them-with-en-masse-style-add-commands-either-as-long-as-they-are-untracked-because-they-should-stay-untracked-in-this-case, or something like that. But you probably don't want to have to type in this kind of file name: .gitignore, as inaccurate as it might be in detail, is a whole lot easier to type in.
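A minimal sketch of that behavior, assuming a scratch repository with one source file and a *.o pattern in .gitignore (both invented for the example): the fake build artifact stays out of git status and is skipped by git add ., yet it is still sitting in the working tree.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo 'int main(void) { return 0; }' > main.c
printf '*.o\n' > .gitignore
git add main.c .gitignore
git commit -q -m "initial"

: > main.o                              # stand-in for a build artifact

quiet=$(git status --porcelain)         # nothing reported: main.o is filtered out
git add .                               # en-masse add skips the ignored file
tracked_count=$(git ls-files | wc -l | tr -d ' ')

# Asking explicitly shows the file is still there, just not complained about.
ignored=$(git status --porcelain --ignored)
```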

Scenario

Suppose we have a repository where the last commit on branch main has three files in it: README.md, LICENSE.md, and main.py. We clone this repository (so that we have it locally and can use it) and check out this same last-commit-on-their-main as our own last-commit-on-our-own-main.

If we now do:

echo new > new.txt

this creates a whole new file in our working tree. This file does not exist in the current commit, nor in Git's index. It is therefore untracked.

Now we run:

git checkout -b branch

which creates a new branch, pointing to the same commit we're already on. Nothing needs to change in Git's index or our work-tree, so nothing does change. We still have one untracked file.

We now run:

git add new.txt

This copies the new file into Git's index. We now have new.txt as a tracked file. Its status is "staged for commit" because it does not exist in the current commit. Then we run:

git commit -m 'add new.txt'

(which is a terrible commit message, but at least it's short). This creates one new commit, which becomes the tip commit of branch branch; this new commit contains four files: the original three, exactly matching the previous commit, plus this fourth file.

If we then run:

git checkout main

to switch back to the earlier commit, Git will:

  • notice that we have a file new.txt that is in Git's index and our current commit;
  • notice that the commit we're switching to lacks this file; and thus
  • remove, from Git's index and our working tree, the file new.txt.

Examining the working tree files—which, remember, are not in Git, they were just copied out of Git at some point—we find our familiar three files named README.md, LICENSE.md, and main.py. These files in our working tree were all created by our initial git checkout and have not been touched at all since then (nothing required changing them).

Running git checkout branch will create new.txt again, because it's in the commit that is the tip of branch branch. That file will also be loaded into Git's index.
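The whole scenario can be replayed as a script. This is only a sketch: it builds the three-file repository locally instead of cloning one, and the file contents and identity are placeholders. If your IDE still shows new.txt after the switch to main, comparing against a plain directory listing like this one should tell you whether Git or the IDE is out of date.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

# The three files in the last commit on main.
echo readme > README.md
echo license > LICENSE.md
echo 'print("hi")' > main.py
git add README.md LICENSE.md main.py
git commit -q -m "initial three files"

echo new > new.txt                      # untracked
git checkout -q -b branch               # same commit, nothing changes
git add new.txt                         # tracked, staged for commit
git commit -q -m 'add new.txt'

git checkout -q main                    # new.txt is removed from index and work-tree
if [ -e new.txt ]; then on_main=present; else on_main=absent; fi
ls

git checkout -q branch                  # new.txt is copied back out
if [ -e new.txt ]; then on_branch=present; else on_branch=absent; fi
ls

echo "on main: $on_main, on branch: $on_branch"
```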

torek