
Git reports that an entire file has changed, but I can't find any actual differences. This is Cygwin git, but it also happens in msysgit.

$ git --version
git version 2.1.1

$ diff <(git show HEAD:File.cs) <(cat File.cs)
// Shows no differences

$ diff <(git show HEAD:File.cs | xxd) <(xxd File.cs)
// Shows no differences

$ git diff
// shows the entire file has changed

$ git hash-object <(git show HEAD:File.cs)
7b3762473342a5b040835bfef9f6b45c109ba48b

$ git hash-object <(cat File.cs)
7b3762473342a5b040835bfef9f6b45c109ba48b

$ git hash-object File.cs
7b3762473342a5b040835bfef9f6b45c109ba48b

I have

$ git config --get core.fileMode
false

and

$ git config --get core.autocrlf
true

I truly have no idea what's going on. Every check says the files are the same, yet git wants to create a commit saying the entire contents were deleted and recreated. Does anyone who knows git plumbing better have a suggestion? All I can think of is that git show is removing or normalizing odd line endings.

UPDATE:

I'm pretty sure it's happening because the development process looks like this: check out from git, rsync to the dev machine, develop, rsync back. I believe rsync is messing with the line endings somehow. It's just weird that git isn't reporting anything about the line endings, and it seems to get really confused about what is happening, even though the binary representations of the files appear to be identical.

UPDATE 2:

So this is super annoying, and I feel like I have stumbled upon a bug in git.

For instance

$ git gc
$ git checkout -- .
$ git clean -fd
$ git status

> shows a heap of modified files

I'm pretty sure that should show no changes, no matter where it's run, but I get a list of 20-odd files :(

Beau Trepp
  • Do you see the same with `git config --get core.autocrlf false`? – VonC Oct 24 '14 at 06:11
  • Yep same thing. I even made sure to repull it all out with `git rm --cached . -r` and `git reset --hard`. Then I've run it through the dev tools, go to `git status` and get that the whole file has "changed". I'm willing to accept that my process is changing the files(I imagine line endings), but I want git to actually tell me how. It's confusing me to no end, especially considering the hash-object output is the same. – Beau Trepp Oct 24 '14 at 06:27
  • What OS and what git version are you using? – VonC Oct 24 '14 at 06:29
  • Cygwin, so Windows. Specifically 7. Git version is 2.1.1 as in the code block above. An interesting side note is that `git reset --hard` or `git checkout -- .` won't restore these files. So those parts of git think the file hasn't changed and won't overwrite what's on disk, but the diff/commit half does! :( – Beau Trepp Oct 24 '14 at 06:31
  • Could you try with a regular msysgit 1.9.4 in a simple DOS session? – VonC Oct 24 '14 at 06:33
  • Question says it also happens in msysgit :) – Beau Trepp Oct 24 '14 at 06:35
  • Try deleting all your files (except the .git dir, of course) and force checking out your branch. Then tell us whether Git still shows those files as modified. – Mudassir Razvi Oct 24 '14 at 07:45
  • Did you compare the file encoding of both files (like UTF-8 vs Latin1)? – M'vy Oct 28 '14 at 09:33
  • `file -i somefile` gives application/xml; charset=us-ascii. This occurs during a rebase conflict, on `git checkout --theirs filename` or `git checkout --ours filename`. Git status shows both as completely changed :( – Beau Trepp Oct 29 '14 at 03:22
  • You've been using git show to look at the file in HEAD, what does `git ls-tree HEAD` say about the object in that commit? – ComputerDruid Oct 29 '14 at 17:35
  • I ran into the same problem after rsync a git repository converted from svn. Then I tried transferring tar.gz over HTTP and still the same problem – sdaffa23fdsf Dec 25 '15 at 19:16
  • I used git checkout on every file showing modified. Even after that a few files still showed modified. So I kept doing git checkout (4 times on some files) to finally get a clean repository. Why? – sdaffa23fdsf Dec 25 '15 at 19:25
  • For the test in Update 2, I cannot reproduce it with a small repository https://github.com/Cyan4973/lz4 – sdaffa23fdsf Dec 25 '15 at 19:54

1 Answer

This can be caused by a .gitattributes file indicating to git that it should do EOL normalization but the repository containing non-normalized line endings.

The simple fix is to remove the relevant line from .gitattributes. This could be

* text=auto

or

*.cs text
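Before removing anything, it can help to confirm that an attribute really applies to the file and that the committed blob really contains carriage returns. The following is a minimal sketch, not part of the original answer: `eol_report` is a hypothetical helper name, and it assumes the path is given relative to the repository root (which is how `HEAD:<path>` is resolved).

```shell
# Hypothetical helper: show the text/eol attributes git resolves for a path,
# then count the lines of the committed blob that contain a carriage return.
eol_report() {
    path=$1
    # Prints lines such as "<path>: text: set" or "<path>: text: unspecified".
    git check-attr text eol -- "$path"
    # cat-file prints the blob exactly as stored, without any EOL conversion.
    # grep -c exits non-zero on a count of 0, hence the "|| true".
    crs=$(git cat-file -p "HEAD:$path" | grep -c "$(printf '\r')" || true)
    echo "$path: $crs line(s) with CR in HEAD"
}

# Usage (File.cs is the file from the question):
# eol_report File.cs
```

If `text` is reported as `set` or `auto` and the CR count is non-zero, the file is in the mismatched state described here.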

A quick example of how this could happen goes like this:

$ echo "Hello World" > example.txt
$ unix2dos example.txt #Make sure it uses CRLF
$ git add example.txt
$ git commit -m "commit 1"
$ #Instruct git that all .txt files should be normalized
$ echo '*.txt text' >> .gitattributes 
$ git add .gitattributes
$ git commit -m "commit 2"

Now the repository is in a strange state, because .gitattributes claims the file should be normalized before adding it to the index, but the current committed version is not normalized.

However, at this point, git status doesn't notice that, because the file itself has not changed in size or mtime since it was added to the index, so the index is considered to be up to date:

$ git status
On branch master
nothing to commit, working directory clean

But anything which invalidates the index will cause git to consider the file to be dirty:

$ touch example.txt
$ git status
On branch master
Changes not staged for commit:

        modified:   example.txt

no changes added to commit (use "git add" and/or "git commit -a")

Neither git reset --hard nor any other attempt to reset the file to the state it's supposed to be in will fix this. The file cannot be added to the index in the form in which it is stored in the repository: git has been instructed to normalize that file, and normalization can never produce the object that is currently committed.
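The mismatch can be made visible by comparing object hashes in a throwaway repository. This is a sketch under assumptions rather than part of the original answer: it recreates the example with `printf` instead of `unix2dos`, sets `core.autocrlf` to false so that only `.gitattributes` is in play, and uses placeholder identity values. It relies on `git hash-object` applying the attribute-selected filters by default and skipping them with `--no-filters`.

```shell
#!/bin/sh
# Sketch: rebuild the example in a temporary repository and compare hashes.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"
git config core.autocrlf false            # assumption: isolate .gitattributes

printf 'Hello World\r\n' > example.txt     # one line with a CRLF ending
git add example.txt
git commit -q -m "commit 1"

echo '*.txt text' > .gitattributes
git add .gitattributes
git commit -q -m "commit 2"

committed=$(git rev-parse HEAD:example.txt)        # blob stored in the commit
normalized=$(git hash-object example.txt)          # hashed after EOL normalization
raw=$(git hash-object --no-filters example.txt)    # hashed byte-for-byte

echo "committed:  $committed"
echo "normalized: $normalized"
echo "raw:        $raw"
```

In this setup the raw hash matches the committed blob while the normalized hash does not, which is why git can never re-add the file in its committed form.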

This is why the gitattributes(5) man page recommends explicitly invalidating the entire index when introducing line-ending normalization, like so:

$ echo "* text=auto" >>.gitattributes
$ rm .git/index     # Remove the index to force Git to
$ git reset         # re-scan the working directory
$ git status        # Show files that will be normalized
$ git add -u
$ git add .gitattributes
$ git commit -m "Introduce end-of-line normalization"

Read the section on "End-of-line conversion" in the gitattributes man page for more details.

Instead of taking the quick fix of removing that line from .gitattributes, you may want to keep the line-ending normalization rules and normalize the files now. That basically means committing the 20+ changes that won't go away. You can do so methodically by following the instructions above for introducing line-ending normalization (minus editing .gitattributes). After that you can be confident it won't happen again, because all the files are committed with normalized endings and any files you add in the future will be normalized too. It's mostly personal preference.
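On newer Git versions (2.16 and later, so not the 2.1.1 from the question), `git add --renormalize` re-applies the clean conversion to tracked files, which avoids removing the index by hand. A minimal sketch in a throwaway repository, with the same assumptions as the example setup above (`printf` in place of `unix2dos`, `core.autocrlf` set to false, placeholder identity):

```shell
#!/bin/sh
# Sketch: normalize an already-committed CRLF file with git add --renormalize.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"
git config core.autocrlf false            # assumption: isolate .gitattributes

printf 'Hello World\r\n' > example.txt
git add example.txt
git commit -q -m "commit 1"

echo '*.txt text' > .gitattributes
git add --renormalize .                   # re-run EOL conversion on tracked files
git add .gitattributes
git commit -q -m "Introduce end-of-line normalization"

# Count CR-containing lines in the committed blob, then check for leftovers.
crs=$(git cat-file -p HEAD:example.txt | grep -c "$(printf '\r')" || true)
dirty=$(git status --porcelain)
echo "lines with CR in HEAD: $crs"
echo "status: ${dirty:-clean}"
```

After the normalizing commit the stored blob no longer contains carriage returns, so the file stops showing up as modified.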

ComputerDruid