Git index messed up

Question

I accidentally committed a large .psd file, which then got my pushes stuck.

So I added *.psd to my .gitignore and then tried to delete this commit, as Git was still trying to push a now non-existent .psd file.

At some point, while doing some git soft resets, I messed up my Git index, and now half of my project files are labeled "Index deleted" in red.

No matter how many times I run git add ., these files aren't indexed anymore. What can I do?

Can you use `git reflog` to get back to a good state? Then try removing the .psd file from your .gitignore to see if it shows up? Then `git rm --cached` on that file? — zrrbite, Nov 28 '21 at 21:34
git reflog shows me a commit I'd like to revert to, but when I try to, it says that it failed as it would overwrite my changes. As for git rm --cached, the psd file has been deleted, but I still have many files not being indexed, and I cannot push them anymore. — Guillaume Ayad, Nov 28 '21 at 21:44
Are you using `git reset --hard` along with `git reflog`? You may have actual modifications in your current state of the branch that you may not want to lose, so commit those and tag them so you can get them back. How far back is this commit you want to reset to? — zrrbite, Nov 28 '21 at 21:58
Does this answer your question? [How to remove/delete a large file from commit history in the Git repository?](https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-the-git-repository) (Also many other questions on this topic, just follow the "related questions" links.) — IMSoP, Nov 28 '21 at 22:06

score 1 · Accepted Answer · answered Nov 28 '21 at 23:54

The linked question (How to remove/delete a large file from commit history in the Git repository?) is appropriate after you fix your index situation. First, though, you need to fix your index situation.

You mention:

At some point as I was doing some git soft reset ...

git reset --soft does not touch the index (nor your working tree), but can be used to change the commit hash ID stored in HEAD. If you've done that, you may need to put the correct commit hash ID back into HEAD, with git reset --soft again and the correct commit hash ID.

That may suffice to fix everything, since git status compares HEAD (which is moveable) against the current index content, and then compares the current index content (which is changeable) against the working tree content (which is also changeable).
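As a rough sketch of that repair, the following reproduces a stray git reset --soft in a throwaway repository and then undoes it. The temporary directory, the file names, and the identity settings are all made up for the illustration. In a real repository you would take the "good" hash ID from git reflog instead of saving it beforehand.

```shell
set -eu
# Throwaway repository; the names below are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "first"
echo two > b.txt
git add b.txt
git commit -q -m "second"
good=$(git rev-parse HEAD)

# The accident: the branch name moves, but index and working tree stay put,
# so HEAD and the index no longer agree.
git reset --soft HEAD~1
after_accident=$(git status --porcelain)

# The repair: put the correct hash ID back into HEAD.
git reset --soft "$good"
after_repair=$(git status --porcelain)

echo "after accident: [$after_accident]"
echo "after repair:   [$after_repair]"
```

After the first reset, b.txt shows up as a staged addition even though nothing was edited. After the second, git status has nothing to report.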

What you need to know about `HEAD`, Git's index (or "staging area"), and your working tree

Git is really all about commits. It's not about files, though commits hold files. It's not about branches, though branches help you (and Git) find the commits. In the end, Git is all about the commits. So it's the commits that matter. But that should leave you with several questions, including:

What exactly is a commit anyway?
How do we find commits?
How do we make new commits?
Can we get rid of old commits?
What is this index thing?

I'm not going to cover some of these here properly, to keep this answer shorter (or shorter for me anyway). But let's start with this, about commits: Commits are numbered. No commit, once made, can ever be changed at all. They are mostly-permanent (but see linked question), and totally read-only.

We (mostly) make new commits by manipulating existing commits. You can make a new commit totally from scratch, but that's usually way too painful for anything except the very first commit ever. So, to make a new commit, we have to take an existing commit, and change something in it. That's a contradiction, by definition: a commit can't be changed, but we need to change something to make a new commit. How do we solve this conundrum?

The answer is simple enough. We don't change the commit. We copy the commit out to something we can change, change that, and use that to make the new commit. So we don't work on commits: we work on stuff copied out of a commit.

Virtually all version control systems do this sort of thing; Git is not really different than SVN or Mercurial or whatever here, in that we first extract some commit, then work on it, then use that to make a new commit.

But Git is different here, for no obvious reason at first. With other version control systems, you extract the commit to a working area, where you work on it, and that's all there is. In Git, you extract the commit to a working area—your working tree or work-tree—but also to a proposed next commit. For historical reasons, Git has three names for this proposed next commit, calling it the "index", or the "staging area", or—a term mostly found in flags like git rm --cached these days—the "cache".

You then work on the files in your working tree, like you would in any version control system. But when you're satisfied with a working-tree file, you must run git add on it. You don't have to do this in Mercurial or SVN,¹ because in those systems, the working tree file is the proposed-next-commit version of the file. In Git, you have to do this: the git add command copies the file back into Git's index, making it ready for the next commit.
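To see that the index really holds its own copy of each file, here is a small sketch in a scratch repository (the directory, file name, and identity are placeholders for the example). It uses git show :notes.txt, which prints the index copy of the file, before and after git add.

```shell
set -eu
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo v1 > notes.txt
git add notes.txt
git commit -q -m "add notes"

# Edit the working-tree copy only.
echo v2 > notes.txt
index_before=$(git show :notes.txt)   # index still holds the old content

# Copy the working-tree version into the index.
git add notes.txt
index_after=$(git show :notes.txt)    # index now holds the new content

echo "index before add: $index_before"
echo "index after add:  $index_after"
```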

¹Except, that is, for all-new files. That's because, e.g., Mercurial has things called the "dircache" and "manifest", which play a similar role to Git's index, but Mercurial keeps these hidden so that you don't have to learn about them. Git, by contrast, whips out its index now and then and slaps you in the face with it (Monty Python fish-slapping dance). You aren't allowed to ignore it. The git commit -a shortcut sometimes almost gets you there, but it's not sufficient: you must learn about Git's index.

Branch names find commits, and commits find commits

Commits are, as I said, numbered. These numbers look random (though they aren't actually random) and are huge and ugly hexadecimal strings. These are generally unusable by humans, so we don't (use them, that is). These are hash IDs or object IDs (OIDs); Git uses OIDs everywhere, including internally.

Commits are also two-part units. One part holds a snapshot of every file, stored in a special, read-only, Git-only, compressed and de-duplicated fashion. The de-duplication takes care of the fact that most commits mostly re-use the files from earlier commits: this keeps the commits from taking huge amounts of space. (In fact, if you make a new commit that undoes what some previous commit did, the stored files for the new commit may take no space at all, since they're now all duplicates.) You don't have to worry about how Git does this: this part works great and doesn't whack you over the head the way the index does.

The other part of each commit is its metadata, or information about the commit itself. This contains stuff like the name and email address of the person who made the commit, some date-and-time stamps, and a log message. When you make a new commit, you supply the log message, and your user.name and user.email settings supply the name and email address. That's all pretty straightforward, but there's one part here that isn't: Git adds, to this metadata, a list of parent commit hash IDs. For most commits, there's exactly one parent.

When you make a new commit, you're doing so by working on some existing commit. Git stores, in your new commit, the hash ID of the commit you chose earlier to work on. So your new commit has that commit's hash ID as its parent. Then Git writes the new commit's hash ID into the current branch name.

This deserves a bit of illustration. Suppose we have the following chain of commits:

... <-F <-G <-H   <--main (HEAD)

where H stands in for the most recent commit's hash ID, and H is the commit we've checked out. main is our branch name, and the name main holds H's hash ID, which is how Git found H, when we said git checkout main or git switch main.

Commit H stores, in H's metadata, earlier G's hash ID. We say that H points to G, hence the arrow in the drawing from H, pointing to G. Commit G is thus the parent of commit H. Both G and H have full snapshots of every file (with de-duplication), so Git can compare the two snapshots to see what changed between G and H. And, G being a commit, G has in its metadata the hash ID of its parent commit F. F points back to yet another earlier commit, and so on.
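If you want to look at this yourself, a sketch along the following lines should work in a scratch repository. The commits named G and H and the file names are stand-ins for the drawing above, not anything from a real project. git cat-file -p prints the raw commit object, including its parent line.

```shell
set -eu
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo g > g.txt
git add g.txt
git commit -q -m "G"
G=$(git rev-parse HEAD)

echo h > h.txt
git add h.txt
git commit -q -m "H"

# Raw metadata of H: tree, parent, author, committer, log message.
git cat-file -p HEAD

# Pull out the stored parent hash ID and compare the two snapshots.
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
changed=$(git diff --name-status "$parent" HEAD)
echo "changed between G and H: $changed"
```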

Anyway, we now manipulate files in our working tree and in Git's index, and make a new commit, which gets a new, unique, random-looking hash ID we'll just call I. New commit I points back to existing commit H:

... <-F <-G <-H   <--main (HEAD)
               \
                I

and the very last step of git commit is that Git writes I's hash ID, whatever it is, into the name main:

... <-F <-G <-H
               \
                I   <--main (HEAD)

and so now main points to commit I instead of commit H.
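The same sequence can be watched from the command line. This is only a sketch in a scratch repository with placeholder names: it records what main resolves to before and after a commit, and checks that HEAD stays attached to main throughout.

```shell
set -eu
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo h > file.txt
git add file.txt
git commit -q -m "H"
before=$(git rev-parse main)          # main holds H's hash ID

echo i >> file.txt
git add file.txt
git commit -q -m "I"
after=$(git rev-parse main)           # main now holds I's hash ID

parent_of_new=$(git rev-parse main~1) # I points back to H
head_target=$(git symbolic-ref HEAD)  # HEAD is still attached to main

echo "main before: $before"
echo "main after:  $after"
```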

`git reset`, with `--hard`, `--mixed`, and `--soft`

What git reset --soft does is allow you to move the branch name. What git reset does in general is ... absurdly complicated.

Let's draw a more complicated and useful Git graph:

          I--J   <-- br1
         /
...--G--H   <-- main (HEAD)
         \
          K--L   <-- br2

Here, we have a repository with three branch names, main, br1, and br2. The name HEAD is currently attached to the name main, which selects commit H. The names br1 and br2 select commits J and L respectively.

If we run git merge --ff-only br1, we end up with:

          I--J   <-- br1, main (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

If that was a mistake, we can run:

git reset --hard HEAD~2

(the ~2 means count back two first-parent links; I won't go into a lot of detail here, and won't cover what the --ff-only meant either) and we'll be back to this:

          I--J   <-- br1
         /
...--G--H   <-- main (HEAD)
         \
          K--L   <-- br2

It's as if nothing happened. The --hard here affected both Git's index and our working tree.
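Here is a sketch of that round trip in a scratch repository. The commit() helper, the single-letter commit names, and the file names are invented for the example, and br2 is left out to keep it short.

```shell
set -eu
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: one new file per commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit G
commit H
H=$(git rev-parse HEAD)
git checkout -q -b br1
commit I
commit J
J=$(git rev-parse HEAD)
git checkout -q main

# Fast-forward main to where br1 points (commit J).
git merge -q --ff-only br1
after_merge=$(git rev-parse main)

# Undo: move main back two first-parent links, resetting index and working tree too.
git reset -q --hard HEAD~2
after_reset=$(git rev-parse main)

echo "main after merge: $after_merge"
echo "main after reset: $after_reset"
```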

Here's what actually happened:

First, git reset does the --soft step. We give it a commit hash ID, such as the raw hash ID of commit H, or a relative commit instruction like HEAD~2. Anything that the git rev-parse command will take is usable here. Git finds that commit, such as commit H. It then makes the branch name to which HEAD is attached point to that commit. So now main points to H.
Then, if we let it—if we use --mixed or --hard—git reset resets Git's index. It does this by removing all the files that came from the commit we were on (J) and installing instead all the files that came from the commit we moved to (H).
Then, if we tell it to—if we use --hard—git reset resets our working tree. For all the files it ripped out of Git's index and replaced with files from H, it rips those files out of our working tree and replaces them with files extracted from commit H.

So that's how git reset --hard puts us back to where we were before the git merge --ff-only. It:

moves the branch name (--soft); then
updates Git's hidden index / proposed-next-commit (--mixed); then
updates our working tree (--hard).

Using the --mixed or --soft flags just makes git reset stop earlier, after doing the second step, or the first step.
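The three stopping points can be compared side by side. This sketch uses a scratch repository with a single file whose content is "old" in the first commit and "new" in the second. The state helper is made up for the illustration and prints the file as seen in HEAD, in the index, and in the working tree.

```shell
set -eu
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo old > f.txt
git add f.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)
echo new > f.txt
git add f.txt
git commit -q -m "tip"
tip=$(git rev-parse HEAD)

# Hypothetical helper: content in HEAD / index / working tree.
state() { echo "$(git show HEAD:f.txt)/$(git show :f.txt)/$(cat f.txt)"; }

git reset -q --soft "$base"
soft=$(state)
git reset -q --hard "$tip"    # restore everything before the next trial

git reset -q --mixed "$base"
mixed=$(state)
git reset -q --hard "$tip"

git reset -q --hard "$base"
hard=$(state)

echo "soft:  $soft"
echo "mixed: $mixed"
echo "hard:  $hard"
```

Reading each result as HEAD/index/working tree, --soft changes only the first column, --mixed the first two, and --hard all three.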

(Note that git reset has other modes of operation. If this were all it did, it wouldn't be so absurdly complicated.)

Note that if you were to now use git reset to point to commit L, you would have:

          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2, main (HEAD)

What, if anything, happens to Git's index and your working tree depend on the flags you give to git reset.

(The hash IDs of the various commits you've reset to get stored in the HEAD reflog, so git reflog will show them. This is a way to find which commit you want to go back to, if you accidentally reset away the hash ID you can't now find. Use the reflogs to find hash IDs that you have lost. Note that the hash IDs are really difficult to remember: you might want to run git show hash or git log -1 hash or similar, using cut-and-paste for the hash IDs, before using git reset --soft, to find out which hash ID holds which commit of interest.)
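A minimal sketch of that recovery, again in a scratch repository with placeholder names. It relies on HEAD@{1} naming the value HEAD had just before the most recent change. In a real repository you would read through the git reflog output and pick the entry you want, since it may not be the latest one. The final step uses --hard only because the working tree here is clean.

```shell
set -eu
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo one > a.txt
git add a.txt
git commit -q -m "first"
echo two > b.txt
git add b.txt
git commit -q -m "second"
lost=$(git rev-parse HEAD)

# Reset away the second commit; no branch name points to it now.
git reset -q --hard HEAD~1

# The HEAD reflog still remembers it.
git reflog -n 3
recovered=$(git rev-parse 'HEAD@{1}')

# Check which commit that hash ID holds before resetting to it.
git log -1 --oneline "$recovered"
git reset -q --hard "$recovered"
```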

`git status` and other similar comparators

The git status command works in part by running two git diffs.

The first of these two diffs is:

git diff --staged --name-status

which compares whatever commit HEAD names—all the files stored in that commit, that is—to the files in Git's index. Since these files are normally copied out of that commit, any file we didn't update since then will match. Git won't say anything at all about the matching files.

If we did update some file (e.g., with git add, which I haven't covered here), the file might not match. Then git status will say that the index copy of the file is a change to be committed.

If we move HEAD (and the current branch name) around without changing the index content, we'll have the two out of sync, and many files might be changed, or even deleted. For instance, if we move main backwards from J to H, but leave the index alone, all the files that are different between H and J will show up.

The second comparison git status does compares the files in Git's index to those in your working tree. This is a lot like running git diff --name-status with no options. For each file that matches, Git will say nothing at all. Where files are different—where you've modified a working tree file, but not yet run git add on it—Git will list the file as a change not staged for commit.
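Both comparisons can be run by hand. In this sketch (scratch repository, made-up file names), one file is modified and staged, another is modified but not staged, and each git diff reports only its own half of what git status would show.

```shell
set -eu
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo a > staged.txt
echo b > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "base"

echo a2 > staged.txt
git add staged.txt        # index copy now differs from HEAD
echo b2 > unstaged.txt    # working-tree copy now differs from index

head_vs_index=$(git diff --staged --name-status)
index_vs_tree=$(git diff --name-status)

echo "HEAD vs index:         $head_vs_index"
echo "index vs working tree: $index_vs_tree"
git status --short
```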

(There's a big complicated section here that I will omit for space reasons, talking about how files that are in your working tree, but aren't in Git's index, are untracked files. Git would complain about these unless they're listed in .gitignore. The .gitignore entries don't actually make Git ignore the files, so .gitignore is a misnomer. But for space reasons I am omitting all of this here.)
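Without going into that omitted section, here is a short sketch of the one point that matters for the .psd file in the question: listing an already-tracked file in .gitignore does not untrack it, while git rm --cached removes it from the index and leaves the working-tree copy alone. The repository and the file name big.psd are made up for the example.

```shell
set -eu
# Scratch repository; names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo data > big.psd
git add big.psd
git commit -q -m "add psd by accident"

echo '*.psd' > .gitignore
git add .gitignore
git commit -q -m "ignore psd files"

# The file is in the index, so Git still tracks changes to it.
echo more >> big.psd
tracked_status=$(git status --porcelain)

# Take it out of the index only; the working-tree file stays.
git rm -q --cached big.psd
after_rm_status=$(git status --porcelain)
still_in_index=$(git ls-files big.psd)

echo "while tracked:     [$tracked_status]"
echo "after rm --cached: [$after_rm_status]"
```

This only affects the next commit. The earlier commit that holds the large file is still in the history, which is what the linked question deals with.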
