
I have two local branches that have mostly the same files. The difference is that the master branch does not track 2 files that the other branch does track.

Before I created the other branch I had two files that were ignored by the master branch. File A and file B.

I created the other branch and checked it out. I changed the .gitignore on this other branch to track file A and file B.

Then I kept working on the master branch. After a while I checked out the other branch and pulled/merged the latest commits from master. When I switched to the master branch after doing this, file A and file B were no longer there. They were still in the other branch, so I'm guessing that the other branch keeps the files to itself every time you commit on it.

Is this the intended behavior?

  • Check `master` for a `.gitignore` file which mentions those files. – Ryan Bigg Jan 08 '17 at 21:24
  • Sorry, rephrasing to make it clearer. That's what I did. –  Jan 08 '17 at 21:25
  • I think what could be happening is that since master is not tracking those files, when I commit in the other branch it takes those files from master? –  Jan 08 '17 at 21:26
  • have files A and B existed previously on master? – csminb Jan 08 '17 at 21:59
  • They existed before I even created the other branch, but they were never tracked. –  Jan 08 '17 at 22:00
  • are you merging the branches which contain A and B into master? if so, they will be passed on to master as well. the only scenario (i think) where git would delete the files would be if they were at some point indexed (or brought over from merges with the other branches) and then deleted, see if [git ls-files](https://git-scm.com/docs/git-ls-files) can help you identify if that's the case – csminb Jan 08 '17 at 22:07
  • @csminb I'm merging master which does not have the files into a branch that does have the files. –  Jan 08 '17 at 22:17
  • I see now you're also asking about merging. That's another long topic, though. The shortest answer about merging is that merging is about *combining changes since a common point* so it depends on both the common point, and the changes. – torek Jan 08 '17 at 22:36

1 Answer


For various reasons, people find both the idea of "tracked/untracked files", and branches, quite mysterious. But in fact, they're not.

The first notion to let go of is branches. They don't really mean anything! Well, that is, they don't mean what people think they mean. They have some very specific definitions, and in fact, the word "branch" in Git has two different meanings. For more on this, see the question "What exactly do we mean by 'branch'?". For now, though, think about what Git is doing purely in terms of moving from commit to commit, because this is where the issue comes from.

Commits, and how they form branches

In Git, the commit is almost everything. It's the overriding goal; it's the glue in the repository, and the reason for Git's existence. There's always¹ a current commit, called HEAD. But what, precisely, is a commit? The answer is that it consists of two or three parts, depending on how you count:

  • A commit stores a snapshot of a work-tree.

    The work-tree or working-tree (or some variant of this spelling) is where you see your files, and edit them, and otherwise use them. The form in which they're stored inside the repository is no good for this, so Git provides you with a work-tree in which to, well, work.

    The snapshot in a commit lets you access (as in git checkout) any earlier version you have committed. That is, if you made two commits yesterday, and three on Friday, you can view the entire work tree as it was either way yesterday, or all three ways on Friday. To do so, you simply git checkout the commit, naming it via its big ugly SHA-1 hash ID, c0ffeeface or whatever. (You'll see these IDs whenever you run git log.)

  • In addition, a commit stores some metadata. In particular, each commit carries the name and email address of the person who made the commit, and a time-stamp. (In fact, there are two of these name / email / time-stamp triples, one for the "author" and one for the "committer", because of Git's history of emailed patches: this allows someone to email a patch and be the author, while someone else actually does the committing.)

  • Along with this same metadata—though you might want to think of it separately—Git keeps a parent ID (a merge commit keeps two or more). The parent of each commit is the commit that was in place just before you made the new commit. Git is then able to use these parent links to navigate through the history of commits—only, it's backwards, working exclusively from "more recent" to "older". (The reason it is—and must be—backwards is that every internal Git object is read-only: once it goes in, it never, ever changes. It would make more sense to people for commits to remember their children, rather than having them remember their parents; but to do so while being read-only, the children would have to be born first, or at the same time as the parents. So Git has the children record their parents instead of the other way around, since the children are inevitably born later.)

    By using these parent links, Git can not only work backwards in history, it can also show you what changed. If the parent commit has a work-tree with a README file that says that apples are purple, and the child commit has a work-tree with a README file that says apples are green, Git can compare these two commits and say: "going from parent to child, you changed apples from purple to green."
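A toy Python model can make these parts concrete. This is only a sketch, not Git's real object format: the Commit class, its field names, and the short made-up IDs standing in for SHA-1 hashes are all assumptions for illustration. Each commit holds a snapshot (file name to contents), some metadata, and a parent link; following the parent links goes backwards, and comparing a parent's snapshot with its child's shows what changed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)  # fields can't be reassigned, echoing Git's read-only objects
class Commit:
    """Toy commit: a snapshot, some metadata, and a parent link."""
    cid: str                            # stands in for the big ugly SHA-1 hash ID
    snapshot: Dict[str, str]            # file name -> file contents
    author: str                         # metadata (real commits also store time-stamps)
    parent: Optional["Commit"] = None   # None for a root commit


def history(c: Optional[Commit]) -> List[str]:
    """Follow parent links from newest to oldest, the way git log does."""
    ids = []
    while c is not None:
        ids.append(c.cid)
        c = c.parent
    return ids


def changes(parent: Commit, child: Commit) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Compare two snapshots: file name -> (contents in parent, contents in child)."""
    names = set(parent.snapshot) | set(child.snapshot)
    return {
        n: (parent.snapshot.get(n), child.snapshot.get(n))
        for n in sorted(names)
        if parent.snapshot.get(n) != child.snapshot.get(n)
    }


a = Commit("a1", {"README": "apples are purple"}, "alice")
b = Commit("b2", {"README": "apples are green"}, "alice", parent=a)
print(history(b))     # ['b2', 'a1']
print(changes(a, b))  # {'README': ('apples are purple', 'apples are green')}
```

Note that the child points at the parent and never the reverse: `a` already exists, unchanged, when `b` is created.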

This, in fact, is where branches—both the notion itself, and the names like master—come from. Sometimes, you want to "make a branch" so that changes will relate to an older or at least different parent:

A--B--C--E--G   <-- master
       \
        D--F    <-- branch

The name master here refers to commit G, the 7th commit ever made. Commit G's parent is not F, though, but rather E; and E's parent is C, whose parent is B, whose parent is A (and then we hit a so-called root commit that has no parent: obviously the first commit ever made has to be one of these). Meanwhile, the name branch refers to commit F, whose parent is D, whose parent is C. So commit C actually has two children, D and E.

The key here is that the names, master and branch, don't really mean anything to Git. They're just ways to get to the big ugly SHA-1 hashes. Git remembers that master means beadc0de and branch means feedbeef, so that if you say "I'd like to work on master now" Git knows to get commit beadc0de. And then, when you make a new commit, Git stores the current commit's ID as the new commit's parent and automatically updates the current branch name to hold the new commit's ID (this is how branches grow).
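Here is a hypothetical sketch of that bookkeeping, assuming a simplified repository where each commit has one parent and branch names are just entries in a name-to-ID table. The Repo class and its commit method are invented for illustration and are not how Git stores things on disk.

```python
from typing import Dict, Optional


class Repo:
    """Toy bookkeeping: branch names are just a table of name -> tip commit ID."""

    def __init__(self, current: str) -> None:
        self.parents: Dict[str, Optional[str]] = {}  # commit ID -> parent ID
        self.branches: Dict[str, str] = {}           # branch name -> tip commit ID
        self.current = current                       # name of the checked-out branch

    def commit(self, new_id: str) -> None:
        """Record the current tip as the new commit's parent, then move the name."""
        old_tip = self.branches.get(self.current)    # None for the very first commit
        self.parents[new_id] = old_tip
        self.branches[self.current] = new_id


repo = Repo("master")
repo.commit("beadc0de")
repo.commit("c0ffee00")
print(repo.branches)  # {'master': 'c0ffee00'}
print(repo.parents)   # {'beadc0de': None, 'c0ffee00': 'beadc0de'}
```

The name is the only thing that changes; the older commit stays exactly as it was and is now reachable only through the new commit's parent link.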

So (as noted in What exactly do we mean by "branch"?), when humans say the word branch, they can mean the branch name—the word master, for instance—which simply locates the tip commit of the branch. Or, they can mean "some or all of the commits that can be found by starting at the branch tip and working backwards through history", so that master means all the commits back to A except for D and F, and branch means all the commits back to A except for E and G. Note that in this case, commits A-B-C are in fact on both branches.
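The second meaning, "the commits found by starting at the tip and working backwards", can be sketched with the diagram above. This assumes single-parent commits only (a merge commit would need a list of parents and a proper graph walk); the parent table and the reachable function are illustrative, not Git code.

```python
from typing import Dict, Optional, Set

# The diagram above: each commit maps to its single parent (A is the root).
parent: Dict[str, Optional[str]] = {
    "A": None, "B": "A", "C": "B", "D": "C",
    "E": "C", "F": "D", "G": "E",
}
branch_tips = {"master": "G", "branch": "F"}


def reachable(tip: str) -> Set[str]:
    """Commits found by starting at tip and following parent links backwards."""
    seen: Set[str] = set()
    c: Optional[str] = tip
    while c is not None:
        seen.add(c)
        c = parent[c]
    return seen


on_master = reachable(branch_tips["master"])
on_branch = reachable(branch_tips["branch"])
print(sorted(on_master))              # ['A', 'B', 'C', 'E', 'G']
print(sorted(on_branch))              # ['A', 'B', 'C', 'D', 'F']
print(sorted(on_master & on_branch))  # ['A', 'B', 'C']
```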


¹ There's a problem with "always" in a new, fresh, empty repository: there are no commits, so there's no commit to be the current HEAD commit. Git handles this with some special cases, which we can just ignore here.


The index, and what it means to be "tracked"

The first problem we find with a Git snapshot vs a work-tree is that, for various reasons, we need to put extra files into real work-trees. In particular, if we compile code, or have temporary files or local configurations, or for any number of other good reasons, we need to have files that don't get committed, but live in the work-tree anyway. So all version control systems provide some way to have "non-versioned" files as well. Git's approach here, however, is unusual, perhaps even unique. What Git does is to expose something most version control systems keep hidden.

In Git, you build up the next commit in something variously called the index, the staging area, or sometimes (as in git diff --cached) the cache. These are all words for the same thing. The short version of the index is that it's simply "where you build the next commit".

To make a commit, you start with a work-tree, which holds versioned (tracked) files and other (untracked) files. You edit some file(s) in some way and then run git add. What git add does is simply to copy the file into the index. Then, once you have everything staged the way you like, you run git commit, and at this point Git makes the new commit from the index. But: What happens to the index afterward?

The answer is ridiculously simple: nothing. The index continues to hold the commit you just made!

This is therefore what it means for a file to be tracked: it's in the index.

That's it—that's all there is to it. A file is tracked if and only if it is in the index. If it is tracked, it will be in the next commit. If it is not tracked, it will not be in the next commit.
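As a rough sketch (a toy model assuming files are just name-to-contents dictionaries; ToyRepo and its methods are hypothetical, not Git internals), tracked-ness falls straight out of the index:

```python
from typing import Dict, List


class ToyRepo:
    """Toy model of work-tree, index, and commits (not Git's real storage)."""

    def __init__(self) -> None:
        self.work_tree: Dict[str, str] = {}   # what you see and edit
        self.index: Dict[str, str] = {}       # the proposed next commit
        self.commits: List[Dict[str, str]] = []

    def add(self, name: str) -> None:
        # git add: copy the work-tree version into the index
        self.index[name] = self.work_tree[name]

    def commit(self) -> Dict[str, str]:
        # git commit: snapshot the index; the index itself is left unchanged
        snap = dict(self.index)
        self.commits.append(snap)
        return snap

    def is_tracked(self, name: str) -> bool:
        # tracked means "present in the index", nothing more
        return name in self.index


r = ToyRepo()
r.work_tree = {"main.c": "int main(void) { return 0; }", "build.o": "\x7fELF"}
r.add("main.c")
snap = r.commit()
print(sorted(snap))             # ['main.c']
print(r.is_tracked("main.c"))   # True
print(r.is_tracked("build.o"))  # False
print(r.index == snap)          # True: the index still holds the commit just made
```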

What about .gitignore?

The name .gitignore is misleading: it's not exactly a list of files to ignore. The drawback to having untracked files is that Git constantly complains about them. (Git: "whine! file foo is untracked! are you sure you want that? whine, whine") Putting a file name, or a matching pattern, into .gitignore mainly just shuts Git up about the untracked-ness. It doesn't actually make the file untracked: the file is untracked if and only if it's not in the index. It does make Git automatically skip the file when you say "add everything", though, and that's usually what we want.
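A simplified sketch of what ignoring does and doesn't do, assuming fnmatch-style patterns (real .gitignore matching has more rules, such as negation with !, directory-only patterns, and anchoring with /). The helper names are hypothetical. Note that a file that is already in the index stays tracked, and keeps getting updated, even when a pattern matches it:

```python
from fnmatch import fnmatch
from typing import Dict, List


def is_ignored(name: str, patterns: List[str]) -> bool:
    # Simplified: real .gitignore matching has more rules than fnmatch
    return any(fnmatch(name, p) for p in patterns)


def untracked_report(work_tree: Dict[str, str], index: Dict[str, str],
                     patterns: List[str]) -> List[str]:
    """What git status would whine about: files neither in the index nor ignored."""
    return sorted(n for n in work_tree if n not in index and not is_ignored(n, patterns))


def add_all(work_tree: Dict[str, str], index: Dict[str, str],
            patterns: List[str]) -> None:
    """Rough analogue of "add everything": ignore rules only skip untracked files."""
    for name, data in work_tree.items():
        if name in index or not is_ignored(name, patterns):
            index[name] = data  # copy into the index (tracked files always updated)


work = {"app.py": "v2", "secret.cfg": "v2", "cache.tmp": "x", "notes.txt": "hi"}
index = {"app.py": "v1", "secret.cfg": "v1"}  # secret.cfg was tracked before being listed
patterns = ["*.tmp", "secret.cfg"]

print(untracked_report(work, index, patterns))  # ['notes.txt']
add_all(work, index, patterns)
print(sorted(index))        # ['app.py', 'notes.txt', 'secret.cfg']
print(index["secret.cfg"])  # v2  (listed in .gitignore, yet still tracked)
```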

Putting a file into .gitignore has one bad side effect, though: it tells Git that Git should feel free to destroy the file as well, if necessary. There's an interesting side twist here as well, because the .gitignore file itself is usually tracked. So now it's time to consider how git checkout works.

How git checkout really works

I mentioned above that Git mostly cares about moving from commit to commit. This is true for git checkout branchname as well: Git translates the branch name into a raw commit hash, so as to get the files that go with that commit. However, when you check out a branch by name—as we usually do—Git saves that name as the current branch as well, so that it knows which branch name should get the next commit. If you check out a commit by its raw ID, you get what Git calls a "detached HEAD".

All that this "detached HEAD" means is that Git has a commit checked out by its raw ID. (This has consequences if you make new commits, so usually you want to get "back on" a branch, by checking out a name instead of a hash ID.) Meanwhile, though, Git still has the problem of moving from one commit to another, whether or not it's going to store the branch name for the next commit.
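A hypothetical sketch of the difference, modeling HEAD as holding either a branch name or a raw commit ID. (In a real repository the HEAD file contains either a line like ref: refs/heads/master or a bare hash; the Head class here is only an illustration of the consequence for new commits.)

```python
from typing import Dict, Optional


class Head:
    """Toy HEAD: holds a branch name (attached) or a raw commit ID (detached)."""

    def __init__(self, branches: Dict[str, str], start: str) -> None:
        self.branches = branches             # branch name -> tip commit ID
        self.branch: Optional[str] = None
        self.commit_id: Optional[str] = None
        self.checkout(start)

    def checkout(self, target: str) -> None:
        if target in self.branches:          # a name: remember which branch we're on
            self.branch, self.commit_id = target, None
        else:                                # a raw ID: "detached HEAD"
            self.branch, self.commit_id = None, target

    def current_commit(self) -> Optional[str]:
        return self.branches[self.branch] if self.branch is not None else self.commit_id

    def record_new_commit(self, new_id: str) -> None:
        if self.branch is not None:          # attached: the branch name moves forward
            self.branches[self.branch] = new_id
        else:                                # detached: only HEAD knows about it
            self.commit_id = new_id


branches = {"master": "beadc0de", "branch": "feedbeef"}
head = Head(branches, "branch")
head.record_new_commit("f00dcafe")
print(branches["branch"])     # f00dcafe
head.checkout("beadc0de")     # check out by raw ID
head.record_new_commit("deadbeef")
print(branches["master"])     # beadc0de  (no name was updated)
print(head.current_commit())  # deadbeef
```

This is the consequence mentioned above: once you check out a branch name again, no branch name remembers deadbeef.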

What Git does here is to use the index again. Again, the index always holds the next commit to make—but when you've just made one, so that the index and work-tree are "clean" and git status says "nothing to commit", the index and work-tree already match the current (HEAD) commit.

Let's say you're currently on master which is beadc0de, and you say git checkout branch which is feedbeef. The index (and work-tree) matches beadc0de, so Git compares beadc0de and feedbeef to see which files are different. It then replaces, in the index and the work-tree, those files. That includes the file .gitignore, if it's different!

Meanwhile—this is where your removed files come in—what if there are files in beadc0de that are not in feedbeef, or vice versa? What Git does here is just as simple as before: it removes files that aren't in the commit we're moving to, and creates files that are in that commit. This involves removing files from the work-tree, or writing new files into the work-tree.
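The comparison step can be sketched as plain set arithmetic on two snapshots. This toy version assumes snapshots are name-to-contents dictionaries and leaves out the index and work-tree details; checkout_plan is a hypothetical name.

```python
from typing import Dict, List, Tuple


def checkout_plan(current: Dict[str, str],
                  target: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Compare two commit snapshots (file name -> contents).

    Returns (to_remove, to_create, to_replace): the only files a switch needs
    to touch. Anything identical in both commits is left alone.
    """
    to_remove = sorted(n for n in current if n not in target)
    to_create = sorted(n for n in target if n not in current)
    to_replace = sorted(n for n in current if n in target and current[n] != target[n])
    return to_remove, to_create, to_replace


beadc0de = {".gitignore": "*.log\n", "README": "apples are purple", "old.txt": "x"}
feedbeef = {".gitignore": "*.log\nold.txt\n", "README": "apples are purple", "new.txt": "y"}
print(checkout_plan(beadc0de, feedbeef))
# (['old.txt'], ['new.txt'], ['.gitignore'])
```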

Removing existing files from the work-tree clobbers them. Git normally tries hard not to clobber files, but—uh oh—if they're listed in .gitignore, Git feels free to clobber them!

So, if branch (i.e., feedbeef) has a .gitignore that ignores some files, and master (beadc0de) has those files tracked, Git can safely remove the files. They're stored in beadc0de, so you'll get them back when you switch back, and they're ignored in feedbeef so it's safe to clobber them. (In fact, I think being stored in beadc0de is sufficient here, although the rules get a bit squirrelly with files like .gitignore and .gitattributes that sometimes switch with checkout.)
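Here is your situation in a toy model. It is a sketch under simplifying assumptions: a clean work-tree, fnmatch-style patterns, and the ignore list passed in explicitly, since which .gitignore applies mid-checkout is exactly the squirrelly part. The other branch tracks A and B while master ignores them, so switching to master removes them from the work-tree, and switching back restores them from the commit. The switch function is hypothetical.

```python
from fnmatch import fnmatch
from typing import Dict, List


def switch(work_tree: Dict[str, str], current: Dict[str, str],
           target: Dict[str, str], ignore: List[str]) -> Dict[str, str]:
    """Toy checkout of a clean work-tree from one commit snapshot to another.

    ignore: the .gitignore patterns assumed to be in effect (simplified to fnmatch).
    """
    new_tree = dict(work_tree)
    for name in current:
        if name not in target:
            new_tree.pop(name, None)  # safe to remove: a copy lives in `current`
    for name, data in target.items():
        untracked = name in work_tree and name not in current
        if untracked and not any(fnmatch(name, p) for p in ignore):
            raise RuntimeError(f"untracked file {name!r} would be overwritten")
        new_tree[name] = data         # an ignored untracked file is simply clobbered
    return new_tree


other = {".gitignore": "", "app.py": "v1", "A": "a", "B": "b"}  # tracks A and B
master = {".gitignore": "A\nB\n", "app.py": "v1"}               # ignores A and B

tree = switch(dict(other), other, master, ignore=["A", "B"])
print(sorted(tree))  # ['.gitignore', 'app.py']  (A and B are gone)
tree = switch(tree, master, other, ignore=[])
print(sorted(tree))  # ['.gitignore', 'A', 'B', 'app.py']  (restored from the commit)

mine = dict(master, A="my local notes")  # an untracked, ignored A of your own
print(switch(mine, master, other, ignore=["A", "B"])["A"])  # a  (your notes are clobbered)
```

The last line is the bad side effect of .gitignore: an ignored, untracked file of your own on master would be overwritten without complaint when you switch to the other branch.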

This index-and-work-tree comparing thing, by the way, is also how (and why, and when, and why not when it won't) Git lets you switch from one branch to another with uncommitted files. Git works very hard to do as little work as possible, so if it can switch from one commit to another without touching a file in the index and work-tree, it does so.
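A simplified sketch of that rule, assuming snapshot dictionaries again and leaving out .gitignore and the finer cases Git also handles (for example, a staged change that already matches the target). The core idea is that a dirty file blocks the switch only if the two commits disagree about it. blocked_files is a hypothetical name.

```python
from typing import Dict, List


def blocked_files(head: Dict[str, str], target: Dict[str, str],
                  index: Dict[str, str], work_tree: Dict[str, str]) -> List[str]:
    """Simplified rule for switching commits with uncommitted changes.

    A file is "dirty" if its index copy differs from HEAD or its work-tree copy
    differs from the index. A dirty file blocks the switch only when HEAD and the
    target commit disagree about it; otherwise the change is carried along.
    """
    blocked = []
    for n in sorted(set(head) | set(target) | set(index) | set(work_tree)):
        dirty = index.get(n) != head.get(n) or work_tree.get(n) != index.get(n)
        if dirty and head.get(n) != target.get(n):
            blocked.append(n)
    return blocked


head = {"README": "apples are purple", "notes.txt": "v1"}
target = {"README": "apples are green", "notes.txt": "v1"}
index = dict(head)  # nothing staged
work_tree = {"README": "apples are purple", "notes.txt": "v1 plus my edit"}
print(blocked_files(head, target, index, work_tree))  # []  (notes.txt rides along)

work_tree["README"] = "apples are red"
print(blocked_files(head, target, index, work_tree))  # ['README']
```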

– torek