In the script below, a new project is created. One file is committed. A change is made, but it is removed from the stage. Doing a commit at this point should do nothing. Why is another commit created?

++ git init
Initialized empty Git repository in C:/src/newproject/.git/
++ echo asdf
++ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        file1.txt

nothing added to commit but untracked files present (use "git add" to track)
++ git add file1.txt
warning: LF will be replaced by CRLF in file1.txt.
The file will have its original line endings in your working directory
++ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

        new file:   file1.txt

++ git commit '--message=this is the message'
[master (root-commit) c3f5d0f] this is the message
 1 file changed, 1 insertion(+)
 create mode 100644 file1.txt
++ git log
commit c3f5d0f7da49b4eacc8df2b6e3e1efda4fc33cad (HEAD -> master)
Author: lit <lit@example.com>
Date:   Tue Dec 17 17:04:30 2019 -0600

    this is the message
++ echo another line
++ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   file1.txt

no changes added to commit (use "git add" and/or "git commit -a")
++ git add file1.txt
warning: LF will be replaced by CRLF in file1.txt.
The file will have its original line endings in your working directory
++ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        modified:   file1.txt

++ git rm --cached file1.txt
rm 'file1.txt'
++ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        deleted:    file1.txt

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        file1.txt

++ git commit '--message=this is the second message'
[master a28bb98] this is the second message
 1 file changed, 1 deletion(-)
 delete mode 100644 file1.txt
++ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

        file1.txt

nothing added to commit but untracked files present (use "git add" to track)
++ git log
commit a28bb987b69c69fabe92154b5f6929fd65819bfd (HEAD -> master)
Author: lit <lit@example.com>
Date:   Tue Dec 17 17:04:36 2019 -0600

    this is the second message

commit c3f5d0f7da49b4eacc8df2b6e3e1efda4fc33cad
Author: lit <lit@example.com>
Date:   Tue Dec 17 17:04:30 2019 -0600

    this is the message
lit
  • `git rm --cached` deletes the file from the index, so the next commit will delete the file. That's why "git status" says "Changes to be committed: deleted: file1.txt" – Raymond Chen Dec 17 '19 at 23:51
  • @RaymondChen - The file removed from the staging area was never committed. Nothing is different from the previous snapshot. It remains in the working directory. So, you are saying that `git rm --cached` does not actually delete the file from the staging area. Is that right? – lit Dec 18 '19 at 00:01
  • `git rm --cached` deletes the file from the index (staging) but leaves it in the working directory. If you commit it, then you delete the file from the repo, but it remains in your working directory. – Raymond Chen Dec 18 '19 at 00:06
  • a28bb98 was the commit which deleted the file – evolutionxbox Dec 18 '19 at 00:15

2 Answers

The change is not removed from the staging area. The entire file is removed from the staging area.

git rm --cached file1.txt
rm 'file1.txt'
++ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        deleted:    file1.txt

Note that this is showing up as to be committed. That means the file is in the HEAD commit (see the last section on git status).
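To reproduce this outside the original project, here is a minimal sketch that replays the asker's steps in a throwaway repository. The temporary directory and the "Example User" identity are placeholders I made up for the demo; the file name and commit message come from the question.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo asdf > file1.txt
git add file1.txt
git commit --quiet --message="this is the message"

echo "another line" >> file1.txt
git add file1.txt
git rm --cached --quiet file1.txt        # removes the whole path from the index

# HEAD vs. index: HEAD has file1.txt, the index does not, so a deletion is staged.
staged=$(git diff --cached --name-status)
echo "$staged"

# The work-tree copy is untouched, and it now counts as untracked.
untracked=$(git ls-files --others --exclude-standard)
echo "$untracked"
```

The staged entry is a `D` (deletion) for file1.txt, which is exactly what the second commit in the question recorded.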

Long

The way to think about this is:

  • Git commits store whole files, always. They do not store changes.
  • Each commit has its own independent set of files, quite apart from every other commit. (However, since the files in a commit are completely read-only, frozen for all time, any commit can share files with any other commit, if the content of those files matches. The fact that you can never change any commit, not one single bit, enables this.)
  • The files that go into your next commit are the files that are currently stored in the index.
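The first two points can be checked with a small experiment. This is only a sketch: the file names a.txt and b.txt, the commit messages, and the identity are invented for the demo. It makes two commits, changes one file between them, and then asks Git which files and which blob objects each commit holds.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo one > a.txt
echo two > b.txt
git add a.txt b.txt
git commit --quiet --message="first"

echo changed >> b.txt
git add b.txt
git commit --quiet --message="second"

# The second commit lists every file, not only the one that changed.
files_in_second=$(git ls-tree -r --name-only HEAD)
echo "$files_in_second"

# Blob IDs per commit: a.txt is shared between the commits, b.txt is not.
a_old=$(git rev-parse HEAD~1:a.txt)
a_new=$(git rev-parse HEAD:a.txt)
b_old=$(git rev-parse HEAD~1:b.txt)
b_new=$(git rev-parse HEAD:b.txt)
echo "a.txt: $a_old -> $a_new"
echo "b.txt: $b_old -> $b_new"
```

The unchanged a.txt has the same blob ID in both commits, which is the sharing described in the second bullet.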

The index is so important—and/or so poorly named—that Git actually has three names for it. Sometimes Git calls it the index. Sometimes Git calls it the staging area. Occasionally—rarely these days—Git calls it the cache. These different names reflect the different ways that this thing—this index/staging-area/cache—is used, but for the most part, it's all just the one thing.

Despite its importance, though, Git rarely lets you see what is in it—at least, not directly. You can easily see what is in your work tree (or working tree or any number of similar terms—again these all refer to the same thing), because your work-tree—I like to hyphenate it—holds ordinary files in their everyday format, so that every program on your computer can see them and work with them. This is not the case for files that are in commits, nor for files that are in the index.

Normally, when Git shows you a commit, it shows it by comparing the commit to some other commit. The most common comparison is between a child commit and its immediate parent. When you have a pretty-new repo with just two commits in it, one is the parent and the other is the child, and git show shows you what's in the child by:

  • extracting all the files from the parent into a temporary work area;¹
  • extracting all the files from the child into a temporary work area; and
  • comparing all the files in these two work areas.

It then merely tells you about files that are different, and by default, shows you what it sees as the difference as well.
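A rough way to watch this parent-versus-child comparison is to ask for it explicitly with `git diff`. The sketch below assumes a throwaway repository with two made-up files and two commits labeled "parent" and "child".

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo one > a.txt
echo two > b.txt
git add a.txt b.txt
git commit --quiet --message="parent"

echo changed >> b.txt
git add b.txt
git commit --quiet --message="child"

# Both commits hold both files, but only the file that differs is reported.
changed=$(git diff --name-status HEAD~1 HEAD)
echo "$changed"

# git show prints the commit's log entry followed by the same comparison.
git show --name-status HEAD
```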

The files that are in commits are in a special, read-only, frozen, Git-only format that Git calls a blob object. You don't really need to know this (it won't be on any quiz) to use Git. But it helps, because you do need to know about the index to use Git. The files stored in Git's index are in this same read-only, Git-only format.² This means that you literally can't see them, at least not without having Git extract them somewhere.
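If you do want to peek, the plumbing commands will extract things for you. The following is a sketch in a throwaway repository (placeholder identity, file name borrowed from the question): `git ls-files --stage` lists the index entries, and `git cat-file` turns a blob object back into readable content.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo asdf > file1.txt
git add file1.txt

# One index entry per file: mode, blob ID, stage number, path.
entry=$(git ls-files --stage file1.txt)
echo "$entry"

# Pull the blob ID out of the entry and ask Git what it is and what it holds.
blob=$(echo "$entry" | awk '{print $2}')
kind=$(git cat-file -t "$blob")
content=$(git cat-file -p "$blob")
echo "$kind: $content"
```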

When you git checkout a commit, Git copies that commit's files into the index (but see footnote 2 for technical strictness again). Then it copies—and de-Git-ifies—the frozen-format file into your work-tree, so that you can see it and work with it.

You can now work with the work-tree files. If you change one in any way—whether that's a total replacement, or a modification in place—this has no effect on the index. You probably want the changed file in your new commit, though, so now you should run git add on that file. What git add does is package up the work-tree copy of the file into the internal Git-only format, and write that into the index (and see footnote 2 again for technical accuracy).
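Here is a small sketch of that behavior, again in a throwaway repository with a placeholder identity. It records the index entry for file1.txt three times: after the commit, after editing the work-tree file, and after running git add.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo asdf > file1.txt
git add file1.txt
git commit --quiet --message="first"
before=$(git ls-files --stage file1.txt)

echo "another line" >> file1.txt         # change only the work-tree copy
after_edit=$(git ls-files --stage file1.txt)

git add file1.txt                        # copy the new content into the index
after_add=$(git ls-files --stage file1.txt)

echo "committed:  $before"
echo "after edit: $after_edit"
echo "after add:  $after_add"
```

The entry is unchanged by the edit and only points at a new blob once git add has run.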

When you make a new commit, Git packages up the index's files as a new commit. So now the new commit and the index match. The new commit becomes the current commit. If you updated the index as you went along, all three storage areas match: the current commit, the index, and your work-tree.

If you like, you can remove a file from the index. You can do this while also removing it from your work-tree, or while keeping it in your work-tree. Either way, what you've done is arrange for the next commit you make to just not have the file at all.
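The two variants can be compared side by side. In this sketch the file names keep.txt and gone.txt are made up for illustration: `git rm --cached` drops the index entry only, while plain `git rm` drops the index entry and the work-tree file.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo a > keep.txt
echo b > gone.txt
git add keep.txt gone.txt
git commit --quiet --message="first"

git rm --quiet --cached keep.txt         # index entry removed, file stays on disk
git rm --quiet gone.txt                  # index entry and file both removed

index_files=$(git ls-files)              # what the next commit would contain
echo "index now holds: [$index_files]"
ls
```

Either way the index no longer lists the file, so the next commit will not contain it.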


¹ This temporary work area is not your work-tree, which is mostly reserved for you to mess with. In fact, given the way commits are stored internally, Git can usually get away with not bothering to extract very much at all: it's easy for Git to tell that file F in commit P is exactly the same as file F in commit C, for instance, so for all unchanged files, Git can just do nothing at all.

² Technically, the index simply holds the file's name and a reference to the internal blob object that Git is using to store the file's content. But you can use Git without knowing this: it's OK to imagine the index holding the entire file's content, at least until you start getting deep into Git internals and using git ls-files --stage and git update-index directly.


Summary of the above

The short version of all of the above is that the index acts as where you build your next commit. It has a copy of every file—or more precisely, a reference to such a copy—in the form that the file would or does have in a new or an existing commit.

When you run git commit, Git packages up the index into a new commit. The new commit becomes the current commit as soon as possible after the new commit has been created.³ So, now the index and the commit match. That's also the normal case right after git checkout: the index and commit normally match. You make them not-match using git add and/or git rm. Then you make a new commit from the index, and they match again. The index starts out as a copy of the current commit. Then you change it (put entire new files in, or take entire files out) to build up your proposed new commit. Then you commit and they match.⁴ All of this happens mostly-invisibly, because the only files you can see and work with are the ones in your work-tree.
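One way to check the match/not-match cycle is to compare tree IDs. This is a sketch under my own assumptions (throwaway repository, placeholder identity): `git write-tree` writes the current index out as a tree object and prints its ID, and `git rev-parse 'HEAD^{tree}'` prints the ID of the tree stored in the current commit.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo asdf > file1.txt
git add file1.txt
git commit --quiet --message="first"

echo "another line" >> file1.txt
git add file1.txt                        # index now differs from HEAD
index_before=$(git write-tree)
head_before=$(git rev-parse 'HEAD^{tree}')

git commit --quiet --message="second"    # package up the index
index_after=$(git write-tree)
head_after=$(git rev-parse 'HEAD^{tree}')

echo "before commit: index=$index_before HEAD=$head_before"
echo "after commit:  index=$index_after HEAD=$head_after"
```

Before the commit the two IDs differ; afterwards they are identical, and the new commit's tree is the one the index described just before committing.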


³ This is so fast that it's almost impossible not to see it as a single operation. But it is actually separate operations: "write out commit", then "update some reference". The reference update requires adding to the reference's reflog, in most cases, and that's where you could, at least in theory, if you're fast enough, see these various steps unfold.

⁴ There are some exceptions to this rule. See, e.g., the question "Checkout another branch when there are uncommitted changes on the current branch". Eventually, look into git commit --only too. But it's at least relatively dependable.


Viewing the index with git status

Remember that the index (or staging area, if you prefer that name) sits, in effect, between your current commit—which Git calls HEAD—and your work-tree. That is, you can draw the current commit on the left, the index in the middle, and your work-tree on the right:

  HEAD        index     work-tree
---------   ---------   ---------
README.md   README.md   README.md
file.txt    file.txt    file.txt

The HEAD copy is read-only. You can copy from it, to the index and/or the work-tree, but you can't copy to it. The index copy can be replaced wholesale (git add) or removed entirely (git rm). The work-tree copy is a regular file, so you can do anything that your computer can do, without even using Git at all.

You can't see the index copy of the file directly, but git status will do comparisons and tell you what's different. In fact, git status runs two comparisons:

  • First, it compares HEAD vs the index. For every file that is the same, it says nothing at all. For a file that is different, it reports something staged for commit.

  • Then, it compares the index vs your work-tree. For every file that is the same, it says nothing at all. For a file that is different, it reports something not staged for commit.

This tells you, in a very efficient way, what's in your index: i.e., what will be in the next commit. If it's different from what's in the current commit, you see a change staged for commit. If it's different from what's in your work-tree, you see a change not staged for commit.
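The two comparisons can also be run by hand. In this sketch (throwaway repository, placeholder identity, file name from the question) one change is staged and a further change is left unstaged, so both comparisons report the same file.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo asdf > file1.txt
git add file1.txt
git commit --quiet --message="first"

echo "another line" >> file1.txt
git add file1.txt                        # staged change
echo "a third line" >> file1.txt         # further change, not staged

# Comparison 1: HEAD vs. index ("Changes to be committed").
head_vs_index=$(git diff --cached --name-status)
# Comparison 2: index vs. work-tree ("Changes not staged for commit").
index_vs_worktree=$(git diff --name-status)
# The short status format shows both results as two letters per file.
short=$(git status --porcelain)

echo "HEAD vs index:      $head_vs_index"
echo "index vs work-tree: $index_vs_worktree"
echo "status:             $short"
```

In the short format the first column is the HEAD-vs-index result and the second column is the index-vs-work-tree result.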

There's one last wrinkle here. Because your work-tree is yours, to do whatever you want with it, you can put files into it that aren't in the index. Or, you can take a file that's in all three places—HEAD, the index, and your work-tree—and remove it from the index, without removing it elsewhere. You can't remove it from the commit—no commit can ever be changed—so it remains there, but it can also remain in the work-tree, and/or you can change the file in the work-tree.

Any file that is not in the index, but is in your work-tree, is what Git calls an untracked file. This is the actual definition of untracked file: it's just a file that exists in your work-tree but not in the index.

Because you can change the index (put files in, or git rm --cached to take them out), you can change the untracked-ness of any file at any time. Untracked-ness is always relative to what's in the index.
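A quick sketch of flipping untracked-ness back and forth, using a made-up file notes.txt in a throwaway repository. `git ls-files --others` lists work-tree files that have no index entry.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet

echo notes > notes.txt
u1=$(git ls-files --others --exclude-standard)   # not in the index: untracked

git add notes.txt
u2=$(git ls-files --others --exclude-standard)   # in the index: tracked

git rm --quiet --cached notes.txt
u3=$(git ls-files --others --exclude-standard)   # out of the index: untracked again

echo "before add:      [$u1]"
echo "after add:       [$u2]"
echo "after rm cached: [$u3]"
```

The file in the work-tree never changes here; only its presence in the index does.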

In any case, though, when you do have untracked files, git status normally complains about them. To shut it up—make it not complain that all your build artifacts are untracked, for instance—you can list file names, or glob patterns, in .gitignore files. These entries in .gitignore do not make files untracked. They just tell git status to shut up about them, and tell git add not to add them to the index by default. If a file that would match a .gitignore line is already tracked, though, it stays tracked.
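The last point is easy to test. In this sketch the names build.log and other.log and the `*.log` pattern are invented for the demo: build.log is committed first, then the ignore rule is added, and Git keeps tracking build.log while ignoring the new other.log.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo built > build.log
git add build.log
git commit --quiet --message="first"

echo "*.log" > .gitignore                # rule added after build.log was tracked
echo more >> build.log
echo scratch > other.log

tracked=$(git ls-files)                                        # index contents
modified=$(git diff --name-only)                               # tracked, changed
ignored=$(git ls-files --others --ignored --exclude-standard)  # untracked, ignored

echo "tracked:  $tracked"
echo "modified: $modified"
echo "ignored:  $ignored"
```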

torek
  • Thank you for your extended explanation, @torek. My error was thinking that `--cached` made the command only apply to the index. I will eventually learn. – lit Dec 18 '19 at 21:00

You wanted git reset file1.txt, which resets the file1.txt index entry to point at the content from the commit you specify (defaulted to HEAD, your currently checked out commit, here), replacing whatever you put there with git add. What you did was to remove that path entirely. The indexed snapshot is of a tree with no file1.txt, the committed snapshot is of a tree with a file1.txt. They're different, so, literally "of course", git commit is happy to commit the new snapshot.
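As a sketch of the difference, here is the question's sequence with the reset in place of `git rm --cached`, run in a throwaway repository with a placeholder identity. I wrote the reset with an explicit `--` so the argument can only be read as a path. On Git 2.23 and later, `git restore --staged file1.txt` should have the same effect on the index.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo asdf > file1.txt
git add file1.txt
git commit --quiet --message="this is the message"

echo "another line" >> file1.txt
git add file1.txt
git reset --quiet -- file1.txt           # index entry goes back to HEAD's version

staged=$(git diff --cached --name-only)  # nothing staged any more
unstaged=$(git diff --name-only)         # the edit survives in the work-tree
last_line=$(tail -n 1 file1.txt)

# With nothing staged, git commit refuses and exits non-zero.
git commit --quiet --message="this is the second message" >/dev/null || true
commits=$(git rev-list --count HEAD)

echo "staged=[$staged] unstaged=[$unstaged] commits=$commits"
```

This time the second commit is not created, which is the behavior the question expected.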

jthill
  • The situation I was thinking about was the case where changes had been made to file1.txt and `git add` had put it into the index. Additional changes to file1.txt were needed, but other files in the index needed to be committed. I wanted to undo the `add` without losing the current changes to file1.txt in the working directory. Is there a way to do that? – lit Dec 18 '19 at 21:06
  • Sure: going on exactly what's in your comment, where you've added the file1 content from your work tree but didn't want to do that just yet, that's exactly what the reset does: it resets the index entry to the version you give it, here it's the one from your checkout. If you've got changes added and _further_ changes in the work tree, and only want the added changes, you can do the reset and `git add --patch` later, or `savetree=$(git write-tree); git reset file.txt; git commit; git reset $savetree file.txt` to use a quick sideband cache of the index with your oops-not-yet content. – jthill Dec 18 '19 at 22:11
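A sketch of the write-tree approach from the last comment, spelled out step by step. The second file (file2.txt), the commit messages, and the identity are my own placeholders. Both files are staged, file1.txt is taken back out for one commit, and its staged version is then restored from the saved tree.

```shell
set -e
repo=$(mktemp -d)                        # throwaway repository
cd "$repo"
git init --quiet
git config user.name "Example User"      # placeholder identity
git config user.email "user@example.com"

echo asdf > file1.txt
echo other > file2.txt
git add file1.txt file2.txt
git commit --quiet --message="first"

echo "another line" >> file1.txt
echo "more" >> file2.txt
git add file1.txt file2.txt              # oops: file1.txt was not ready yet

savetree=$(git write-tree)               # save the current index as a tree object
git reset --quiet -- file1.txt           # file1.txt entry back to HEAD's version
git commit --quiet --message="second: file2.txt only"
git reset --quiet "$savetree" -- file1.txt   # restore the saved staged version

committed=$(git diff --name-only HEAD~1 HEAD)   # what the second commit changed
staged=$(git diff --cached --name-only)         # what is staged again now
echo "committed: $committed"
echo "staged:    $staged"
```

The work-tree copy of file1.txt is never touched by any of these steps; only the index entry moves.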