
I'm on Windows and have systemwide core.autocrlf=true.

For a specific repository, I've overridden it locally to false.

But that didn't convert line endings in checked-out files. How do I do that?

  • If I convert the files manually with e.g. dos2unix, they show as altered.
  • Also tried git checkout --force HEAD, it had no effect.

The only working way I have found is to delete all the files and then run git reset --hard. That is rather awkward: there's no simple and reliable command to do it, and it does lots of unnecessary work, since everything is recreated from scratch rather than only the files that need converting being overwritten.

ivan_pozdeev

2 Answers


TL;DR

These are three possible solutions (not necessarily the only three).

  1. Use:

    git add --renormalize .
    

(run once, at the top level of the repository). This requires Git 2.16 or newer, but it is the simplest method.

Note: it's not at all clear to me whether this affects the work-tree versions; you might still need git checkout -- . to re-copy from index to work-tree.

  2. For each file that git status is complaining about: rm file; git checkout -- file. The rm removes the work-tree copy so that git checkout must actually re-extract the file according to the new line-ending rules.

You can simplify this somewhat with git rm -r .; git checkout HEAD -- . (just two commands), but this has the side effect of touching every file in the work-tree, even files that need no changes (files that have no carriage returns in them).

  3. Use dos2unix as you have been, then run git add on the files (or on .). Despite appearances, this should leave the index unchanged.

In all cases, afterward, git status should say nothing to commit, working tree clean.
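If you would rather not re-extract every file, the "rm file; git checkout -- file" option can be limited to the tracked files that still contain carriage returns. The following is a minimal sketch, assuming a POSIX shell, Git on the PATH, and that it is run from the top level of the work-tree. `reextract_crlf_files` and `count_cr` are hypothetical helper names, and the throwaway repository at the end only simulates the situation in the question. Because git checkout -- file copies from the index, any unstaged edits in a re-extracted file are lost.

```shell
#!/bin/sh
set -eu

# Count carriage-return bytes in a file (prints a bare integer).
count_cr() { tr -dc '\r' < "$1" | wc -c | tr -d ' '; }

# Hypothetical helper: for each tracked file whose work-tree copy contains a CR,
# delete it and let Git re-extract it from the index under the current EOL settings.
# Caveats: unstaged edits in those files are discarded; names that `git ls-files`
# quotes (unusual characters) fail the -f test and are skipped.
reextract_crlf_files() {
    git ls-files | while IFS= read -r f; do
        [ -f "$f" ] || continue
        if [ "$(count_cr "$f")" -gt 0 ]; then
            rm -- "$f"
            git checkout -- "$f"
        fi
    done
}

# --- Demo in a throwaway repository (simulates the question's setup) ---
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git config core.autocrlf false
printf 'line1\nline2\n' > README.txt        # committed with LF-only endings
git add README.txt
git commit -q -m 'LF-only README'

git config core.autocrlf true                # the old system-wide setting...
rm README.txt
git checkout -- README.txt                   # ...left a CRLF copy in the work-tree
git config core.autocrlf false               # the new per-repository override

reextract_crlf_files
echo "CRs left in README.txt: $(count_cr README.txt)"
git status --short                           # expected to print nothing
```

Files whose committed copy genuinely contains CRs would also be re-extracted by this loop; that is harmless (they come back unchanged) but is wasted work.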

Long

This is not quite a duplicate of "Git: how to renormalize line endings in all files in all revisions?", since you don't want to re-copy a bunch of existing commits. However, the git add --renormalize answer there should work.

Or, if that fails or if your Git is too old to have the --renormalize option:

"If I convert the files manually with e.g. dos2unix, they show as altered."

You can convert the files manually, then git add ., or remove the work-tree copies and git checkout them again. The git checkout --force HEAD failed because Git was too smart for its own good: it saw (incorrectly) that the work-tree copy was already correct and avoided doing work on it.

What's going on here

There are, at all times, three active copies of each file. Let's say you have a README.txt and a prog.cc, both of which have CRLF endings in your work-tree, but LF-only line endings in the repository.

   HEAD          index       work-tree
----------    ----------    ----------
README.txt    README.txt    README.txt
prog.cc       prog.cc       prog.cc
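To see the three copies for yourself, you can count the carriage returns in each one. git cat-file -p HEAD:README.txt prints the committed copy and git cat-file -p :README.txt the index copy, both as the raw stored bytes with no end-of-line conversion. This is a minimal sketch, assuming a POSIX shell and Git; the throwaway repository simply recreates the README.txt example with an LF-only commit and a CRLF work-tree copy.

```shell
#!/bin/sh
set -eu

# Count carriage-return bytes read from stdin.
count_cr() { tr -dc '\r' | wc -c | tr -d ' '; }

# Throwaway repository: README.txt committed with LF-only endings.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git config core.autocrlf false
printf 'line1\nline2\n' > README.txt
git add README.txt
git commit -q -m 'LF-only README'

# Re-extract it with autocrlf=true so the work-tree copy gets CRLF endings.
git config core.autocrlf true
rm README.txt
git checkout -- README.txt

# HEAD and index hold the same blob; only the work-tree copy was converted.
git rev-parse HEAD:README.txt :README.txt    # the same blob ID, printed twice
echo "HEAD:      $(git cat-file -p HEAD:README.txt | count_cr) CRs"
echo "index:     $(git cat-file -p :README.txt | count_cr) CRs"
echo "work-tree: $(count_cr < README.txt) CRs"
```

With the two-line file above, this should report 0, 0 and 2 carriage returns respectively.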

The copy in the commit is sacrosanct, inviolable, frozen forever (or as long as that commit exists) in whatever form it has there. (I'm assuming for now that each of these files has LF-style line endings.) It's compressed, too.

The copy in the index is writable, but initially matches the copy in the commit, so it also has LF-only line endings. It's compressed as well (at first it's actually just a reference to the committed copy).

The copy in the work-tree is uncompressed and has the line endings you told Git to use through your .gitattributes file (none) and your core.autocrlf and core.eol and so on. You had them set to change LF to CRLF, so the copies in your work-tree have CRLF endings at the moment.

Now, after the checkout, you change your settings so that files that get checked out will have LF-only line endings, or will preserve what's in the index. Unfortunately, each index entry also caches information about the work-tree copy (stat data such as its size and modification time), recorded when the file was last checked out or added. As long as that cached data still matches the file, Git assumes that the work-tree copy is the same as the index copy.

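Clearly, since the work-tree copy has CRLF endings while the index copy has LF-only endings, the two are different. But under your old end-of-line settings they count as the same (converting the CRLF work-tree copy back would reproduce the index copy exactly), so git status was required to report the file as unchanged, and the cached data is how it does that without re-reading the file.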

If you hadn't changed the EOL settings, git status would say nothing and this would bother no one, because if you ran git add on, say, README.txt, that would copy the work-tree copy back into the index. Along the way this would turn CRLF line endings into LF-only line endings, and re-compress the file. The resulting file would match the HEAD copy, and git status would have to say nothing.

But you did change the EOL settings, so if you ran git add now, Git should copy the CRLF endings into the index. Essentially, git status has been fooled: the index says (on purpose!) that the work-tree copy matches, even though it doesn't, and running git add while the work-tree copy has CRLF line endings would change the index copy.

If you use dos2unix on the file to change the work-tree copy, Git now sees that the work-tree copy's statistics don't match the index's saved "this file is clean" statistics. That is, git status remains fooled but now says that the work-tree copy is different! If you git add the file now, Git will keep the LF-only line endings while updating the index copy. The end result will be that the index copy matches the HEAD copy after all, and that Git updates the cached work-tree statistics about the file so that it knows that the index copy matches the work-tree copy.
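The dos2unix-then-git add sequence can be tried out in a throwaway repository. This is a minimal sketch, assuming a POSIX shell and Git; tr -d '\r' stands in for dos2unix (which may not be installed), and the repository only simulates the question's setup.

```shell
#!/bin/sh
set -eu

count_cr() { tr -dc '\r' < "$1" | wc -c | tr -d ' '; }

# Throwaway repository: LF-only commit, CRLF work-tree copy, then autocrlf=false.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git config core.autocrlf false
printf 'line1\nline2\n' > README.txt
git add README.txt
git commit -q -m 'LF-only README'
git config core.autocrlf true
rm README.txt
git checkout -- README.txt
git config core.autocrlf false

# Stand-in for dos2unix: strip the CRs from the work-tree copy.
tr -d '\r' < README.txt > README.txt.tmp
mv README.txt.tmp README.txt
git status --short     # should list README.txt as modified: its cached size no longer matches

# Copy work-tree -> index. The content equals the stored blob, so the index copy
# stays the same and only the cached stat data is refreshed.
git add README.txt
git diff --cached --quiet && echo "index still matches HEAD"
echo "CRs in README.txt: $(count_cr README.txt)"
git status --short     # expected to print nothing
```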

Essentially, after changing line-ending settings (in .gitattributes and/or core.* variables), you must have Git fix the index's "clean/dirty" cache data. Before git add --renormalize existed, the only way to do that was to force Git to copy from index to work-tree:

rm worktreefile
git checkout -- worktreefile

or force Git to copy from work-tree to index:

git add worktreefile

both of which fix up the index's cache data, but obviously do a bit of additional violence in the process.

Note that if the committed HEAD copy has CRLF endings, things change

Suppose that the committed copy of README.txt has CRLF endings. Then, initially:

  • the index copy matches the HEAD copy as usual, so it has CRLF endings;
  • with CRLF endings in the work-tree, all three copies match;
  • but if you select LF-only endings in the work-tree, and make that happen, the work-tree copy differs from both HEAD and index.

This is true regardless of whether git status is fooled.

Once you copy the work-tree's LF-only line endings into the index such that the index also has LF-only line endings, now the index copy ("staged for commit") differs from the HEAD copy. At this point, if you make a new commit, that commit will have LF-only line endings, and you'll be in the state we described earlier.
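The CRLF-in-HEAD case can be reproduced the same way. This is a minimal sketch, assuming a POSIX shell and Git; committing with core.autocrlf=false is just a convenient way to get CRLF endings into the throwaway commit, and tr -d '\r' again stands in for dos2unix.

```shell
#!/bin/sh
set -eu

# Throwaway repository whose committed README.txt really contains CRLF endings.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git config core.autocrlf false          # no conversion: CRLF is stored as-is
printf 'line1\r\nline2\r\n' > README.txt
git add README.txt
git commit -q -m 'CRLF README'

# Make the work-tree copy LF-only, then copy it into the index.
tr -d '\r' < README.txt > README.txt.tmp
mv README.txt.tmp README.txt
git add README.txt

# The index copy (LF-only) now differs from HEAD (CRLF): a change staged for commit.
git diff --cached --name-only           # prints README.txt
```

Committing at this point would record the LF-only version.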

torek
  • *That is, git status remains fooled but now says that the work-tree copy is different!* Does not `git status` automatically rematch index and work tree files in that case? – user4003407 Dec 02 '18 at 21:14
  • @PetSerAl: unfortunately, no: `git status` doesn't notice that the filters and/or end-of-line controls have changed. This general rule applies to all filters, not just built-in EOL settings. For instance if you add an `ident` filter in `.gitattributes`, the work-tree and index file will deliberately differ (in a controlled way) once the filters run, but until the filters run, `git status` is still set on the *old* contents. That's usually OK since usually we'll just `rm` the file and use `git checkout --` to re-create it with the ident expansion. – torek Dec 03 '18 at 00:40
  • I am not talking about filters and end-of-line controls, but about this: *If you use dos2unix on the file to change the work-tree copy, Git now sees that the work-tree copy's statistics don't match the index's saved "this file is clean" statistics.* In general, if cached stat info does not match work tree stat info, then does `git status` blindly report file as modified and not check actual file content? – user4003407 Dec 03 '18 at 03:12
  • @PetSerAl: yes. If the size differs from the cached size, Git assumes the file content differs. The rules for mtime are a little different since some file systems report full POSIX tv_nsec values and some don't, and Git will at least sometimes compare contents (and sometimes update cached stat data as well), but if the *size* changed from what Git expected, the file must by definition be different, and there's no need to read the actual data. – torek Dec 03 '18 at 05:40
  • Please include the critical parts of the answer here, even if they are already in another post on SO. Otherwise, this is an NAA (not an answer): a link-only answer that doesn't answer the question but only gives some tangential info. – ivan_pozdeev Dec 03 '18 at 17:45
  • I'm also suspicious about anything that starts with `git add`. Wouldn't this affect a further commit? I don't want to touch changes staged for commit, I only need to update the worktree! – ivan_pozdeev Dec 03 '18 at 17:47
  • `git add` copies the files back into the index, from the work-tree. This applies all the filters. The filters *used* to be: *change CRLF to LF-only*, and are now *do nothing*. The copied-back-into-index files will therefore match the files *already in the index*. Copying data that matches previous data leaves the previous data unchanged, but has the side effect of updating the cache info that's *also* in the index. – torek Dec 03 '18 at 18:03
  • I just found out that `git add --renormalize .` doesn't work as advertised. After changing `autocrlf` `true`->`false`, it doesn't convert files but marks them as modified, so it takes a `git reset --hard` to actually convert. After changing `false`->`true`, it has no effect whatsoever. – ivan_pozdeev May 06 '19 at 04:42
  • @ivan_pozdeev: remember that `git add` in general is about copying *to* the index (apparently, even if renormalizing). I guess what this means is that it doesn't copy back *from* the index to the work-tree afterwards, so you'll need a `git checkout` from index to work-tree to update the work-tree version. – torek May 06 '19 at 05:15
  • Ultimately, nothing involving `git add` proved to work; see my answer. Reading the docs more carefully, `git add --renormalize .` followed by `git commit` is intended to fix CRLFs that have been erroneously saved into the repo. My task is different. – ivan_pozdeev Mar 18 '23 at 01:48
git read-tree --empty
git reset --hard

Note: this will overwrite all the files (and, like any git reset --hard, discard uncommitted changes). On the upside, it is easier than removing everything except .git before a git reset and dealing with the recycle bin.
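The two commands can be checked in a throwaway repository that reproduces the question's state. This is a minimal sketch, assuming a POSIX shell and Git; the repository, file name and identity are hypothetical. With the index emptied by git read-tree --empty, every file in HEAD looks new to git reset --hard, which writes them all out again under the current EOL settings.

```shell
#!/bin/sh
set -eu

count_cr() { tr -dc '\r' < "$1" | wc -c | tr -d ' '; }

# Throwaway repository: LF-only commit, CRLF work-tree copy left over from
# autocrlf=true, then a per-repository override to false.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
git config core.autocrlf false
printf 'line1\nline2\n' > README.txt
git add README.txt
git commit -q -m 'LF-only README'
git config core.autocrlf true
rm README.txt
git checkout -- README.txt
git config core.autocrlf false

# The fix from this answer. Warning: reset --hard discards uncommitted changes.
git read-tree --empty
git reset -q --hard
echo "CRs left in README.txt: $(count_cr README.txt)"
git status --short            # expected to print nothing
```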

Note: if you're in Windows or are using files produced in Windows and the project is not Windows-exclusive, set core.autocrlf to input rather than false. This will prevent you from accidentally committing CRLFs into the repo proper.
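What core.autocrlf=input does on git add can be seen with a CRLF file in a throwaway repository. This is a minimal sketch, assuming a POSIX shell and Git; notes.txt is a hypothetical file, and core.safecrlf=false is set only so that a global safecrlf=true cannot abort the demo.

```shell
#!/bin/sh
set -eu

# Count carriage-return bytes read from stdin.
count_cr() { tr -dc '\r' | wc -c | tr -d ' '; }

repo=$(mktemp -d)
cd "$repo"
git init -q
git config core.autocrlf input      # CRLF -> LF on add, no conversion on checkout
git config core.safecrlf false      # demo only: skip the irreversible-conversion check

printf 'line1\r\nline2\r\n' > notes.txt   # a file "produced in Windows"
git add notes.txt

# The work-tree copy keeps its CRLFs; the copy staged in the index is LF-only.
echo "work-tree: $(count_cr < notes.txt) CRs"
echo "index:     $(git cat-file -p :notes.txt | count_cr) CRs"
```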

I couldn't find a way to update the index from the tree ('tree' in Git means a revision saved in the repo) and/or the worktree from the index selectively. Neither git read-tree HEAD nor git checkout-index seem to recognize differences from EOL conversion.

ivan_pozdeev