
Are these the same thing? If so, why are there so many terms?!

Also, I know there is this thing called git stash, which is a place where you can temporarily store changes to your working copy without committing them to the repo. I find this tool really useful, but again, the name is very similar to a bunch of other concepts in git -> this is very confusing!!

allyourcode

3 Answers


The index/stage/cache are the same thing - as for why so many terms, I think that index was the 'original' term, but people found it confusing, so the other terms were introduced. And I agree that it makes things a bit confusing sometimes at first.
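
One place where the interchangeable naming shows up directly is `git diff`, where `--staged` is documented as a synonym of `--cached`. A minimal sketch in a throwaway repository (the temp directory, file name and identity values are made up for the demo):

```shell
set -eu
# Throwaway repository; path, file name and identity are demo assumptions
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo 'x' > file.txt
git add file.txt
git commit -q -m 'initial'

# Stage a change so there is something between HEAD and the index
echo 'y' >> file.txt
git add file.txt

# Both spellings ask for the same thing: the diff between HEAD and the index
cached=$(git diff --cached)
staged=$(git diff --staged)

if [ "$cached" = "$staged" ]; then
    echo "git diff --cached and git diff --staged print the same diff"
fi
```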

The stash facility of git is a way to store 'in-progress' work that you don't want to commit right now (in a commit object that gets stored in a particular stash directory/database). The basic stash command will store uncommitted changes made to the working directory (both cached/staged and uncached/unstaged changes) and will then revert the working directory to HEAD.

It's not really related to the index/stage/cache except that it'll store away uncommitted changes that are in the cache.

This lets you quickly save the state of a dirty working directory and index so you can perform different work in a clean environment. Later you can get back the information in the stash object and apply it to your working directory (even if the working directory itself is in a different state).
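
A rough sketch of that round trip, assuming a throwaway repository with made-up file names: one change is staged, one is left unstaged, both disappear after `git stash push`, and `git stash pop --index` brings them back with the staged/unstaged split intact.

```shell
set -eu
# Throwaway repository; path, file names and identity are demo assumptions
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo 'one' > a.txt
echo 'one' > b.txt
git add a.txt b.txt
git commit -q -m 'initial'

echo 'two two' > a.txt
git add a.txt              # staged change
echo 'two two' > b.txt     # unstaged change

# Stash everything that is uncommitted, staged or not
git stash push -q
clean=$(git status --porcelain)       # nothing left to report

# Re-apply the stash; --index also tries to restore what was staged
git stash pop -q --index
restored=$(git status --porcelain)

echo "after stash: '${clean}'"
echo "after pop:"
echo "$restored"
```

Without `--index`, the pop would still bring both changes back, but the previously staged one would typically come back unstaged.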

The official git stash manpage has pretty good detail, while remaining understandable. It also has good examples of scenarios of how stash might be used.

das-g
Michael Burr
  • git stash is different from the others: it is more like an 'anonymous commit' – Peter Tillemans Aug 18 '10 at 21:29
  • @Peter - I tried to address the stash bit more correctly (I mistakenly read it initially as something about the 'stage'). – Michael Burr Aug 18 '10 at 21:43
  • @Michael Thanks! One point I hope you'll clarify: the end of paragraph 2 says stash stores changes in the "cache", but doing git stash seems to include unstaged changes as well. Is this just another confusion over terminology, or did you really mean that git stash only stores staged changes?? – allyourcode Aug 18 '10 at 21:51
  • @allyourcode: In one of my interim edits, I tried to make clear that both staged and unstaged changes are *included* in the stash. But it looks like I accidentally dropped some stuff and made the answer less clear. Hopefully that's fixed now. – Michael Burr Aug 18 '10 at 22:39
  • @Michael I think the part where it says changes "that are in the cache" in paragraph 3 could be misleading. If I understand git stash correctly, I think that phrase should be dropped. – allyourcode Aug 19 '10 at 00:14
  • @allyourcode: but all uncommitted changes - both ones that are unstaged or currently staged - are put into the stash. Paragraph 3 is intended to emphasize that - is it confusing the issue instead? Should I just get rid of the whole sentence? (feel free to edit yourself) – Michael Burr Aug 19 '10 at 01:02
  • @Michael your answer says only "changes that are in the cache" get saved to the stash i.e. stashed. What you mean is that _all uncommitted changes_ get stashed, not just the ones in the cache/staging area/index. – allyourcode Aug 20 '10 at 08:48
  • But if the index can [store multiple stages in it during merges](http://alblue.bandlem.com/2011/10/git-tip-of-week-index-revisited.html), then isn't there a technical difference between index and stage? The index is a file; a stage is an abstract representation of the working file-system and encoded within the index using a unique file-format (and the index can have multiple stages during a merge). But for most cases, the words are interchangeable. – Alexander Bird May 22 '12 at 23:05
  • @Thr4wn: that's an interesting article. I'm not sure having multiple stages would change my answer too much (note that the article mentions "a concept of having multiple index numbers (or stage numbers)"), but I'll admit that it's a level of detail on git internals that I wasn't aware of (I was never a git expert, and I'm even less so now, since I haven't been actively using git for a while). The kind of information in that article would make the basis for another good answer here, but I'm certainly not comfortable writing about that level of git internals myself. – Michael Burr May 23 '12 at 03:26
  • If "index/stage/cache are the same thing", why does `git ls-files --cached` return a much longer list than `git diff --cached --name-only`? The latter shows those files that will be included in the next commit, i.e. staged – Steve Pitchers Jan 06 '16 at 14:22
  • `git ls-files --cached` shows tracked files in addition to files in the index/stage/cache (the `--cached` option is the default behavior of `git ls-files`). I'm not sure of the rationale for that behavior, but that's what it does. `git diff --cached --name-only` only shows files in the index/stage/cache. – Michael Burr Jan 08 '16 at 06:02
  • @MichaelBurr Actually, you've got it backwards. The git index contains ALL tracked files. `git diff --cached` returns diff between `index` and `HEAD`. @StevePitchers, that's why `git ls-files` returns a long list, even if you only have 1 changed file. `git diff --cached` just does a diff between the `index` and `HEAD`, so it's a short list. See https://stackoverflow.com/a/47543410 – wisbucky Nov 29 '17 at 01:09

It's very confusing indeed. The 3 terms are used interchangeably. Here's my take on why it's called each of those things. The git index is:

  • a binary file .git/index that is an index of all the tracked files
  • used as a staging area for commits
  • contains cached SHA1 hashes for the files (speeds up performance)

An important note is that the index/cache/stage contains a list of ALL files under source control, even unchanged ones. Unfortunately, phrases like "add a file to the index" or "file is staged to the index" can misleadingly imply that the index only contains changed files.

Here's a demo that shows that the git index contains a list of ALL files, not only the changed files:

# setup
git init

echo 'x' > committed.txt
git add committed.txt
git commit -m 'initial'

echo 'y' > staged.txt
git add staged.txt

echo 'z' > working.txt

# list HEAD
git ls-tree --name-only -r HEAD
# committed.txt

# list index
git ls-files
# committed.txt
# staged.txt

# raw content of .git/index
strings .git/index
# DIRC
# committed.txt
# staged.txt
# TREE

# list working dir
ls -1
# committed.txt
# staged.txt
# working.txt

Additional reading:

https://www.kernel.org/pub/software/scm/git/docs/technical/racy-git.txt

What does the git index contain EXACTLY?

wisbucky
  • Great! I think you mean ls -l not ls -1 – user5389726598465 Jan 29 '18 at 06:17
  • "An important note is that the index/cache/stage contains a list of ALL files under source control, even unchanged ones." This sentence is so important for me. I am reading https://www.atlassian.com/git/tutorials/undoing-changes/git-reset and it says "--mixed is the default operating mode. The ref pointers are updated. The Staging Index is reset to the state of the specified commit." And I said what the heck? Shouldn't staging index be just empty? So I believe if anything is added or committed before, it is in the index. When we add the changed file it updates the SHA of the file, right? – iRestMyCaseYourHonor Mar 26 '20 at 12:05
  • @user5389726598465 -- `ls -1` is valid and matches the output he pasted. (man page: "`-1` Force output to be one entry per line." vs "`-l` List in long format.") – boweeb May 19 '20 at 13:41
  • @iRestMyCaseYourHonor -- Yes, that is an important note. Remember that git does _not_ work by storing diffs in commits but rather _snapshots_ of content. See the relevant section of [atlassian.com/git/tutorials/saving-changes/git-commit](https://www.atlassian.com/git/tutorials/saving-changes/git-commit), titled "Snapshots, not differences". With that in mind, it makes sense that index should never be empty (unless of course the repo has no committed or staged content, yet). – boweeb May 19 '20 at 13:54
  • A side note about `strings`, its default is to only show sequences of 4-or-more ASCII characters, so if you're trying this yourself, but used a shorter filename, like **foo**, it won't print. `strings -n 3` will, but you'll also get some more garbage. – Zach Young Oct 25 '21 at 00:57

The history

https://stackoverflow.com/a/6718135/14972148

confusion: --cached vs --index etc.

From my modified man gitcli: [image]

add/index/stage/cache

The index file:

  • is an index of all tracked files
  • used as a staging area for commits
  • contains cached SHA1 hashes for the files (speeds up performance)

From man git:
$GIT_INDEX_FILE sets the index file. If not specified, $GIT_DIR/index is used.
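
A small sketch of what that variable does, assuming a throwaway repository and a made-up alternate index path (`alt-index`): commands run with `GIT_INDEX_FILE` set read and write that file instead of `.git/index`.

```shell
set -eu
# Throwaway repository; path, file name and identity are demo assumptions
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo 'x' > committed.txt
git add committed.txt
git commit -q -m 'initial'

# Hypothetical location for a second index file (absolute path)
alt="$repo/.git/alt-index"

default_list=$(git ls-files)                        # uses .git/index
alt_before=$(GIT_INDEX_FILE="$alt" git ls-files)    # file does not exist yet: empty index

# Populate the alternate index from HEAD, leaving .git/index untouched
GIT_INDEX_FILE="$alt" git read-tree HEAD
alt_after=$(GIT_INDEX_FILE="$alt" git ls-files)

echo "default index: $default_list"
echo "alternate index before read-tree: '$alt_before'"
echo "alternate index after read-tree:  $alt_after"
```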

The index file is a binary file.
When opened in nvim, part of it looks like this:
[image]

When opened in nvim with the fugitive plugin:
[image]
The content is very similar to the output of git status, rather than to the output of strings .git/index.

related commands

  1. git add ("a file is added/staged to the index")
  2. git restore --staged (--staged: restore the index)
  3. git rm --cached (--cached: only remove files from the index)
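
A sketch of how these three commands move paths in and out of the index, using `git ls-files` to look at the index after each step. The repository and file names are made up, and `git restore` assumes Git 2.23 or newer.

```shell
set -eu
# Throwaway repository; path, file names and identity are demo assumptions
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo 'x' > kept.txt
git add kept.txt
git commit -q -m 'initial'

# 1. git add: the new file gets an entry in the index
echo 'y' > new.txt
git add new.txt
after_add=$(git ls-files)

# 2. git restore --staged: reset the index entry to HEAD.
#    new.txt is not in HEAD, so its entry is dropped; the file itself stays.
git restore --staged new.txt
after_restore=$(git ls-files)

# 3. git rm --cached: remove the entry from the index only, keep the file on disk
git rm --cached -q kept.txt
after_rm=$(git ls-files)

echo "after add:     $(echo $after_add)"
echo "after restore: $after_restore"
echo "after rm:      '$after_rm'"
ls -1
```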

stage vs track

  • Untracked changes are not in Git.
  • Unstaged changes are in Git, but not marked for commit.
  • Staged changes are in Git and marked for commit.
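
The three states can be seen side by side in `git status --porcelain`. A minimal sketch with made-up file names, one per state:

```shell
set -eu
# Throwaway repository; path, file names and identity are demo assumptions
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo 'x' > staged.txt
echo 'x' > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m 'initial'

echo 'changed' > staged.txt
git add staged.txt                # staged: change recorded in the index
echo 'changed' > unstaged.txt     # unstaged: tracked file, change only in the working dir
echo 'new' > untracked.txt        # untracked: not in the index at all

status=$(git status --porcelain)
echo "$status"
# M  staged.txt
#  M unstaged.txt
# ?? untracked.txt
```

In the two-letter code, the first column describes the index relative to HEAD and the second column describes the working directory relative to the index; `??` marks an untracked path.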

git ls-files

  1.
     -c, --cached  (cache: obsolete term for index)
         Show cached files in the output (default)

     -s, --stage
         Show staged (git add) files' mode bits, object name
         and stage number

     -u, --unmerged
         Show unmerged files in the output (forces --stage)

     -d, --deleted

     -m, --modified

     -k, --killed
         Show files on the filesystem that need to be removed due to
         file/directory conflicts for checkout-index to succeed.

  2.
     -o, --others
         Show other (i.e. untracked) files

     --directory
         If a whole directory is classified as "other", show just its
         name (with a trailing slash) and not its whole contents.

     --no-empty-directory
         Do not list empty directories.
         Has no effect without --directory.
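
A short sketch that exercises a few of these options in a throwaway repository (file names are made up). One file is modified, one is deleted from the working directory, and one is untracked:

```shell
set -eu
# Throwaway repository; path, file names and identity are demo assumptions
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

echo 'one' > a.txt
echo 'one' > b.txt
git add a.txt b.txt
git commit -q -m 'initial'

echo 'changed content' > a.txt    # modified, not staged
rm b.txt                          # deleted from the working dir, not staged
echo 'new' > c.txt                # untracked

cached=$(git ls-files -c)     # everything in the index
modified=$(git ls-files -m)   # an unstaged deletion also counts as a modification
deleted=$(git ls-files -d)
others=$(git ls-files -o)

# -s prints "<mode> <object name> <stage number><TAB><path>" per index entry
stage_numbers=$(git ls-files -s | awk '{print $3}')

echo "cached:   $(echo $cached)"
echo "modified: $(echo $modified)"
echo "deleted:  $deleted"
echo "others:   $others"
git ls-files -s
```

Outside of a merge conflict every entry is at stage number 0, which is why the stage column usually looks uninteresting.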

Further Reading

https://github.blog/2021-11-10-make-your-monorepo-feel-small-with-gits-sparse-index/

overview

[image]

Good Pen