I have encountered a strange behaviour of Git: I have a repository that contains a number of untracked files and folders specified in the .gitignore file.

The exact steps that I made:

  1. Stashed 4 files: git stash
  2. Checked out my very first commit from months ago: git checkout <hash of first commit>
  3. Looked around without changing anything
  4. Went back to my working branch: git checkout <my working branch>
  5. Applied the stash: git stash apply

Then I noticed that some (not all) of my untracked files and folders have gone away. How can that be?

Additional info:

  • The stashed files have nothing to do with the disappeared files, I noted the stash actions just for completeness

  • I did not run either of the commands git stash --include-untracked or git stash save -u, as @Ashish Mathew guessed

  • It seems that the only files and folders that disappeared are ones that were not yet in the .gitignore at the first commit, but were added to it later

Benni

1 Answer

The stashed files have nothing to do with the disappeared files ...

Indeed.

It seems that the only files and folders that disappeared are ones that were not yet in the .gitignore at the first commit, but were added to it later

This, plus one more thing, is (almost certainly) the source of the problem. Fortunately, you should be able to get those files back—or at least some version of those files. Unfortunately, you'll have to spell them all out and fuss with Git a bunch, and you may get the wrong version. See the example session at the bottom.

First, note that only untracked files are ignored

A file that is not untracked (that is tracked) is never ignored, even if a .gitignore file says to ignore it. Only untracked files are ignored: files are either tracked, untracked-but-not-ignored, or untracked-and-ignored.
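
The three states can be listed with git ls-files. This is a minimal sketch in a throwaway repository; the file names (tracked.txt, notes.txt, build.log) and the *.log ignore rule are made up for illustration and are not from the question.

```shell
#!/bin/sh
set -e

# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo tracked > tracked.txt
echo '*.log' > .gitignore
git add tracked.txt .gitignore
git commit -q -m initial

echo scratch > notes.txt    # untracked, not ignored
echo noise > build.log      # untracked and ignored (matches *.log)

tracked=$(git ls-files)
untracked=$(git ls-files --others --exclude-standard)
ignored=$(git ls-files --others --ignored --exclude-standard)

printf 'tracked:\n%s\n' "$tracked"
printf 'untracked, not ignored:\n%s\n' "$untracked"
printf 'untracked and ignored:\n%s\n' "$ignored"

# Force-adding puts build.log into the index. It is now tracked, so the
# ignore rule no longer applies to it.
git add -f build.log
ignored_after_add=$(git ls-files --others --ignored --exclude-standard)
printf 'ignored after git add -f:\n%s\n' "$ignored_after_add"
```

The last listing should come back empty: once a file is in the index, the .gitignore entry has no effect on it.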

But wait: what, precisely, is an untracked file?

An untracked file is a file that is not in the index

This definition is one of the few in Git that is simple and clear. Or, rather, it would be if it were clear what the index is. Unfortunately, the index is very hard to see.

The best one-line description I have for the index is this: *The index is where you build the next commit you will make.*

This index, also called the staging area and the cache, keeps track of—i.e., indexes—your work-tree. Your work-tree is where you do your work: it has your files in their normal, non-Git format. Files stored permanently and read-only in commits, inside the Git repository, have a special, compressed, Git-only format. The index "sits in between" these two places: it has all your commit-able files, from your work-tree, all set to be committed. But the files in the index are changeable (unlike those inside commits) even though they're already converted to the special Git format.

This means that it's very rare for your index to actually be empty. Most of the time, it just matches your current commit. That's because you just checked out that commit, which put those files into both your index (in Git-only form, ready for the next commit) and your work-tree (in regular ordinary file form, ready for use or editing).

If you modify a file F and run git add F, the git add replaces the copy of the file that was (in Git format) in the index before. The index wasn't empty—it had F in it, along with everything else—it just matched the current commit, so most Git commands don't mention F until you've changed F in the work-tree.
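
One way to make the index visible is git ls-files --stage, which prints the blob hash stored in the index for each file. The sketch below assumes a scratch repository containing a single file named F, as in the paragraph above; the variable names are only for illustration.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo one > F
git add F
git commit -q -m 'add F'

# Right after a commit, the index entry for F is the same blob as in HEAD.
head_blob=$(git rev-parse HEAD:F)
index_before=$(git ls-files --stage F | awk '{print $2}')

# Editing the work-tree copy does not touch the index.
echo two > F
index_edited=$(git ls-files --stage F | awk '{print $2}')

# git add replaces the index copy with a blob made from the new content.
git add F
index_after=$(git ls-files --stage F | awk '{print $2}')

echo "HEAD blob:           $head_blob"
echo "index after commit:  $index_before"
echo "index after edit:    $index_edited"
echo "index after git add: $index_after"
```

The first three hashes should be identical and only the last one should differ, which is the sense in which the index "matches the current commit" until git add is run.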

So, let's consider:

Checked out my very first commit from months ago: git checkout <hash of first commit>

This tells Git: fill the index and work-tree from that very first commit. Let's suppose we have not actually run this command yet, and just consider: what will this do? What's in that commit?

Well, that commit has whatever was in the index when you made it—whatever you had used git add to copy into the index. That includes, say, file abc.txt, which you decided later had to be untracked.

To be untracked, you had to remove abc.txt from the index at some point, probably with:

git rm --cached abc.txt

(which leaves the work-tree copy in place, while removing the index copy). After the git rm --cached, you did a git commit. From the time you ran git rm --cached, until now, the file was not in the index. It was in the work-tree. So it was untracked.
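
As a small sketch of that step (in a scratch repository, reusing the abc.txt name from above), git rm --cached followed by a commit leaves the file on disk but out of the index:

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo original > abc.txt
git add abc.txt
git commit -q -m initial

# Remove abc.txt from the index only, then commit the removal.
git rm -q --cached abc.txt
git commit -q -m 'stop tracking abc.txt'

test -f abc.txt            # the work-tree copy is still there
git ls-files               # lists index entries; abc.txt is not among them
git status --porcelain     # reports abc.txt as untracked ("??")
```

No .gitignore is involved here: the file is untracked purely because it is no longer in the index.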

Checking out any commit fills in the index from that commit

Now that you have told Git to check out your very first commit, though ... well, that very first commit has abc.txt in it. Git needs to copy the committed version of abc.txt into the index and into the work-tree.

At this point, if there already is an abc.txt in the work-tree, Git will check whether you are going to clobber it with a different abc.txt. Mostly, Git will refuse to do so, telling you to move it out of the way first (the exception is a file that is currently ignored, which Git feels free to overwrite: see the example session at the bottom). But if the abc.txt in the work-tree matches the one in the commit, well, then it's safe to fill in the index with the abc.txt from the commit. It matches the one in the work-tree, after all.

So at this point, Git extracts all the files from that commit, into the index and into the work-tree. (There are some complicated, but attempted-to-be-safe, exceptions to this general idea: see Checkout another branch when there are uncommitted changes on the current branch.) And, whoa hey, now abc.txt is in the index. Now it's tracked!

So now you look around at your old commit, and decide to:

git checkout <my working branch>

and now Git has to switch the index and work-tree contents from the first commit, which has abc.txt in it, to the tip commit of <my working branch>. That commit doesn't have abc.txt in it. Git will remove the file from the index ... and remove it from the work-tree too, because it's tracked.
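
The round trip described so far can be reproduced in a scratch repository. This is a sketch under assumptions, not the asker's actual history: it uses a single file abc.txt that is committed first, then untracked and added to .gitignore in a second commit.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

# First commit: abc.txt is tracked.
echo original > abc.txt
git add abc.txt
git commit -q -m initial
first=$(git rev-parse HEAD)

# Later commit: abc.txt is untracked and ignored, but still on disk.
git rm -q --cached abc.txt
echo abc.txt > .gitignore
git add .gitignore
git commit -q -m 'untrack and ignore abc.txt'
test -f abc.txt

# Checking out the first commit puts abc.txt back into the index.
git checkout -q "$first"
tracked_in_old=$(git ls-files abc.txt)
echo "tracked while on the old commit: $tracked_in_old"

# Going back: the branch tip has no abc.txt, so Git removes the
# (now tracked) file from the index and from the work-tree.
git checkout -q -
if [ -e abc.txt ]; then
    echo "abc.txt survived"
else
    echo "abc.txt is gone"
fi
```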

Once the checkout finishes, now the file isn't in the index. Well, it also isn't in the work-tree (argh). If you put it back into the work-tree, now it's untracked. But where can you get it?

The answer is staring us in the face: it's in that first commit. When you ran git checkout <hash>, Git copied the file into both the index and the work-tree (except that it didn't have to touch the work-tree version after all). When you ran git checkout <my working branch> to get back, Git removed the file, but commits are read-only and (mostly) permanent, so the file is still there, in Git-only form, in commit <hash>.

The trick is to get it out of commit <hash> without putting it back into the index, so that it sticks around in normal, non-Git format. The easy way to do this these days is to use git show hash:path > path, e.g.:

git show hash:abc.txt > abc.txt

(note that git show by default does not apply end-of-line translations or smudge filters. The --textconv option only runs textconv filters; in modern Git, git cat-file --filters hash:abc.txt should apply the same conversions a checkout would.)

You will have to do this for every file that Git removed, which can be rather painful.
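
The per-file work can be scripted. The sketch below is one possible approach, not the only one: it assumes that every file present in the old commit but absent from the current one is a candidate for restoring, which may also bring back files you deleted on purpose, so review the list first. The repository, the file names and the variable old are illustrative.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

# Old commit with two files that are later untracked.
mkdir dir
echo original > abc.txt
echo other > dir/def.txt
git add abc.txt dir/def.txt
git commit -q -m initial
old=$(git rev-parse HEAD)

git rm -q -r --cached abc.txt dir
git commit -q -m 'untrack files'

# Simulate the loss caused by the checkout round trip.
rm -r abc.txt dir

# List paths that exist in $old but not in HEAD, and restore each one
# from $old unless something already exists at that path.
# (Paths with unusual characters may be printed quoted by git diff;
# this sketch does not handle that case.)
git diff --name-only --diff-filter=D "$old" HEAD |
while IFS= read -r path; do
    if [ -e "$path" ]; then
        echo "skipping existing $path"
        continue
    fi
    mkdir -p "$(dirname "$path")"
    git show "$old:$path" > "$path"
    echo "restored $path"
done
```

The restored content is the version stored in that old commit, so any later edits made while the file was untracked are not recovered.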


Example session: .gitignore makes Git OK with clobbering data

I made a tiny repository for test purposes. In this repository, I made an initial commit with a README and file abc.txt containing one line reading original:

$ mkdir tt
$ cd tt
$ git init
Initialized empty Git repository in ...
$ echo original > abc.txt
$ echo for testing overwrite > README
$ git add README abc.txt
$ git commit -m initial
[master (root-commit) a721a23] initial
 2 files changed, 2 insertions(+)
 create mode 100644 README
 create mode 100644 abc.txt
$ git tag initial
$ git rm abc.txt
rm 'abc.txt'
$ git commit -m 'remove abc'
[master 20ba026] remove abc
 1 file changed, 1 deletion(-)
 delete mode 100644 abc.txt
$ touch unrelated.txt
$ echo abc.txt > .gitignore
$ git add .gitignore unrelated.txt 
$ git commit -m 'add unrelated file and ignore rule'
[master 067ea61] add unrelated file and ignore rule
 2 files changed, 1 insertion(+)
 create mode 100644 .gitignore
 create mode 100644 unrelated.txt

We now have a repository with three commits:

$ git log --oneline --decorate
067ea61 add unrelated file and ignore rule
20ba026 remove abc
a721a23 (tag: initial) initial

Let's put some precious data in (ignored) abc.txt:

$ echo precious > abc.txt
$ git status
On branch master
nothing to commit, working tree clean
$ cat abc.txt   
precious

Now let's check out commit initial:

$ git checkout initial
Note: checking out 'initial'.

You are in 'detached HEAD' state. [mass snip]

HEAD is now at a721a23... initial
$ cat abc.txt
original

Oops, our precious data has been clobbered!

It's the .gitignore directive that gives Git permission to clobber the file. To prove this, let's go back to master (git checkout master, not shown) and make abc.txt not-ignored (but also not tracked):

$ cp /dev/null .gitignore
$ git add .gitignore
$ git commit -m 'do not ignore precious abc.txt'
[master 564c4fd] do not ignore precious abc.txt
 Date: Thu Feb 8 14:16:08 2018 -0800
 1 file changed, 1 deletion(-)
$ git log --oneline --decorate
564c4fd (HEAD -> master) do not ignore precious abc.txt
067ea61 add unrelated file and ignore rule
20ba026 remove abc
a721a23 (tag: initial) initial
$ echo precious > abc.txt
$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

    abc.txt

nothing added to commit but untracked files present (use "git add" to track)

Now if we ask to switch to initial:

$ git checkout initial
error: The following untracked working tree files would be overwritten by checkout:
    abc.txt
Please move or remove them before you switch branches.
Aborting

So there's an annoying side effect to ignoring files: they become clobber-able. I (along, I think, with others in the past) have looked into teaching Git the difference between "ignored and can clobber" and "ignored but precious, do not clobber" and have not been able to fix it simply and have abandoned the effort.

(I thought at one point Git got better-behaved about this, but this example shows that it is still bad in at least Git 2.14.1, which is the version I used in this particular set of tests.)
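
Given that ignored files are clobber-able, one precaution is to check, before visiting an older commit, which ignored files that commit also contains. The following is a rough sketch of such a check; the tag name initial matches the session above, while the extra *.tmp rule, other.tmp and the variables target and at_risk are made up for illustration.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example User"

echo original > abc.txt
git add abc.txt
git commit -q -m initial
git tag initial

git rm -q abc.txt
printf 'abc.txt\n*.tmp\n' > .gitignore
git add .gitignore
git commit -q -m 'remove abc.txt and add ignore rules'

echo precious > abc.txt     # ignored, and present in the old commit
echo scratch > other.tmp    # ignored, but unknown to the old commit

# For every ignored file in the work-tree, ask whether the target commit
# has a file at the same path. Those are the ones a checkout may overwrite.
target=initial
at_risk=$(
    git ls-files --others --ignored --exclude-standard |
    while IFS= read -r path; do
        if git cat-file -e "$target:$path" 2>/dev/null; then
            echo "$path"
        fi
    done
)
echo "ignored files that exist in $target:"
echo "$at_risk"
```

Anything the check prints can be copied somewhere outside the repository before running git checkout on the older commit.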

torek
  • The same thing happens with svn if a file ends up in the repository at any point. A little more obvious what’s going on, but no real workaround either. I have a home for important files that can’t be tracked for whatever reason—outside the repositories. – zzxyz Feb 06 '18 at 19:38
  • Great answer, many thanks for sharing your knowledge. I have to go through this step by step. – Benni Feb 07 '18 at 07:05
  • @torek: Following your explanations, I was able to reproduce the observed behaviour with a test setup. I was also able to restore deleted files, as you suggested, with `git show hash:abc.txt > abc.txt --textconv`. But you can only restore the state the file had at that exact commit. Changes you made later, after adding the file to .gitignore and running `git rm --cached`, are lost forever as soon as you check out a commit from before that step. I wonder if there is a way to add files to .gitignore at a later point and still be able to check out older commits safely. – Benni Feb 07 '18 at 22:37
  • In general, `git checkout` ought to refuse to overwrite a version of `abc.txt` that *doesn't* match the one coming out of the commit. I seem to recall that at some point (Git 1.6?), listing the file in `.gitignore` made Git willing to overwrite it, but I tested this with Git 2.x and it said "would clobber" (aborting the checkout operation). What version of Git do you have? – torek Feb 07 '18 at 22:49
  • @torek: I can confirm this behaviour for both 1.8.3.1 on Linux and 2.6.1.windows.1. Both with default config settings. – Benni Feb 08 '18 at 21:58
  • Hm, that's bad behavior, but useful info. I've just tested it again in 2.14.1 and it is indeed the case that having the file listed in `.gitignore` makes Git feel free to clobber the file. After altering the `.gitignore` to *not* ignore it, `git checkout initial` (where `initial` is a tag pointing to a commit that has the old `abc.txt`) says: `error: The following untracked working tree files would be overwritten ...` – torek Feb 08 '18 at 22:17