
For a large existing repository that contains inconsistent line endings and a mix of ASCII and UTF-8 (with BOM) file encodings...

The key thing is that the current set of files is fairly inconsistent. They vary in encoding. (Let's ignore UTF-16 for now, although I do have a few of those, too.) They vary in line endings from file to file, and also within individual files, although I suspect that most of them are stored with CRLF line endings in git.
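To make "inconsistent" concrete, here is a minimal sketch of how a single file's raw bytes could be audited for a UTF-8 BOM and mixed line endings. The function name `audit_bytes` and the shape of its result are made up for illustration; for files already tracked by git, `git ls-files --eol` reports similar information without any scripting.

```python
import codecs


def audit_bytes(data: bytes) -> dict:
    """Summarise the UTF-8 BOM and line-ending styles found in raw file content.

    Hypothetical helper for illustration only; it does not try to detect
    UTF-16 or any other encoding.
    """
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    styles_present = sum(1 for n in (crlf, lf, cr) if n)
    return {
        "utf8_bom": data.startswith(codecs.BOM_UTF8),
        "crlf": crlf,
        "lf": lf,
        "cr": cr,
        "mixed": styles_present > 1,
    }


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + b"first\r\nsecond\nthird\r\n"
    print(audit_bytes(sample))
```

Running something like this over every file (read in binary mode) would give a rough inventory of which files need attention before any normalization is attempted.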

There are two main issues here:

1) Different people using the same repository can look at the same commit and see a different set of changes. Sometimes the "whole file" appears changed because of normalised line endings; sometimes only part of the file does. This seems to depend mostly on whether core.autocrlf is set to true or false, and also seems to be influenced by the use of a .gitattributes file.

2) I want everyone to be able to submit files to the git repository without having to pay painful attention to whether their particular git configuration does CRLF conversion, or to how their text editor, IDE, or other tools behave. (As broken as this behaviour can be on Windows, we need to live with it...)


The main question is this: how can I be sure that the output shown by 'gitk', 'git diff', 'git show', and the like is absolutely consistent with respect to the changes shown? I am less concerned about line endings here, and more about ensuring that the 'change' for a given commit is the same change as viewed by all developers. I do not want one person to look at a change and see "all the lines have changed" (that is, the line endings changed), while another person looks at the same change and says "three lines have changed".

  • Note: Some people use GitHub to view changes.

That said, I want to be confident about how the line endings are handled, so I am ultimately asking how to know what happens with them. If, e.g., I specify "eol=crlf" for a given file in .gitattributes, does that mean that the file is committed to git with that setting? And what happens if I check out an earlier version of that file that was committed before the .gitattributes file was set up?

Arafangion

2 Answers


Ok, here’s what’s happening:

First: diffs between commits always look the same and do not depend on local git configuration. You can try that: git diff HEAD^ HEAD will look the same on all your machines (assuming they have the same HEAD).

But why do the diffs look different on your machines then? Assume you have a file in your repo that looks exactly like this:

two \r\n lines

Once checked out, it will look like this on every machine. But on check-in there are two options:

  1. Line ending normalization is on. The file will now be checked in as:

    two \n lines
    

    and git diff will report that there is going to be a change

  2. Line ending normalization is off. The file will be checked in as:

    two \r\n lines
    

    and git diff will not report any changes.
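The two cases above can be modelled in a few lines. This is a deliberately simplified sketch, not git's actual implementation: `normalize_for_checkin` and `reports_change` are made-up names, the model only handles CRLF to LF conversion, and newer git versions apply extra rules (for example around files already committed with CRLF) that it ignores.

```python
def normalize_for_checkin(worktree: bytes) -> bytes:
    """Simplified model of check-in normalization: every CRLF becomes LF."""
    return worktree.replace(b"\r\n", b"\n")


def reports_change(stored_blob: bytes, worktree: bytes, normalize: bool) -> bool:
    """Return True if what would be checked in differs from the stored blob."""
    candidate = normalize_for_checkin(worktree) if normalize else worktree
    return candidate != stored_blob


if __name__ == "__main__":
    stored = b"two \r\n lines"   # blob as it sits in the repository
    checked_out = stored         # untouched working copy

    # Option 1: normalization on, so the file shows up as modified.
    print(reports_change(stored, checked_out, normalize=True))
    # Option 2: normalization off, so nothing is reported.
    print(reports_change(stored, checked_out, normalize=False))
```

The point of the model is that the working file is byte-for-byte identical in both cases; only the local conversion setting decides whether a change is reported.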


Now, how can you make sure that everybody sees the same changes? I would recommend enabling line ending normalization for everyone. To do that, create a .gitattributes file in the root of your repo with this content:

*   text=auto

And commit this file to every branch. Once everybody has pulled this commit, the diffs will look the same everywhere.


Final note: core.eol has no effect on this whatsoever. It only changes the line endings in the working directory. git diff does not diff the raw working directory against the index; it diffs what would be committed against the index.
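A rough way to see why the working-directory line ending does not matter once normalization is on: in the sketch below, two developers check out the same LF-only blob with different eol settings, yet both would check in identical content. `to_worktree` and `to_index` are hypothetical helpers, and the model assumes the blob is already normalized to LF.

```python
def to_worktree(blob: bytes, eol: str) -> bytes:
    """Model of checkout: write LF-only content using the requested line ending.

    Assumes `blob` contains no CR bytes (already normalized).
    """
    return blob.replace(b"\n", b"\r\n") if eol == "crlf" else blob


def to_index(worktree: bytes) -> bytes:
    """Model of check-in with normalization on: CRLF becomes LF."""
    return worktree.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    blob = b"first\nsecond\n"
    windows_copy = to_worktree(blob, "crlf")
    unix_copy = to_worktree(blob, "lf")

    # The working files differ on disk...
    print(windows_copy != unix_copy)
    # ...but what would be committed is identical for both developers.
    print(to_index(windows_copy) == to_index(unix_copy) == blob)
```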

Chronial

I assume you'll google "git line endings" to see how to do basic repo setup.

You can't influence anything already committed at all. The only thing you can do is make new commits with any fixed-up file contents you like.

From your comment below, what you're after is being able to completely ignore line-ending differences. See here and here for the best previous Stack Overflow answers I could find.

jthill
  • jthill: What I don't want is to have files that were previously treated as 'autocrlf=true' in the repository to suddenly be regarded as having had every line changed in commits that have already been made. It seems that whenever a file is stored in git with CRLFs (with autocrlf=false), then the diff reported can vary with .gitattributes changes (when the files are now eol=crlf vs eol=lf). (But I might be misinterpreting the signs) – Arafangion Feb 11 '13 at 03:46
  • The 'git reset' command there is used because it is mentioned in http://www.kernel.org/pub/software/scm/git/docs/gitattributes.html – Arafangion Feb 11 '13 at 03:57
  • Finally, when I say: "influence", I mean, "influence the *reporting* of the changes". I know I can't change the actual commits, because that would result in changes to the sha1. – Arafangion Feb 11 '13 at 04:10
  • I (plainly) didn't know that about reset, I took that line out of my answer. But it seems you've already done your research on how to deal with the sloppy commits, do you want git to completely ignore any line-end inconsistencies? – jthill Feb 11 '13 at 04:27
  • Ideally yes, however my first concern is to understand why different users see different output for the apparently set-in-stone historical commits depending on how their autocrlf has been set, and then to fix it for all future commits. It would be a bonus if historical commits also had end-of-line inconsistencies ignored or at the very least, canonicalised. Do note that UTF-8 files with BOM are a complication. – Arafangion Feb 11 '13 at 05:12
  • This is a completely different question than what you started with. If you have files the autodetect is miscategorizing, then tell git explicitly in your .gitattributes. Getting the different commands to treat the specific flavors of newline and other encoding damage in your repo is going to be a matter of taking it case by case, command by command. Have a look at merge's `-Xrenormalize` option, that will help for some of what you mention. – jthill Feb 11 '13 at 05:37
  • Let me reword my question, then - I've since been able to understand a bit more of what must have been happening, so I should be able to clarify the question. When I wrote the question, I didn't really know what 'eol=crlf' meant, but I knew that I wanted *all* the "text" file changes to be consistently viewed by any person regardless of their local git configuration. – Arafangion Feb 11 '13 at 05:52
  • I think I will offer a bounty to recognise your input. :) (In a few days...) – Arafangion Feb 11 '13 at 06:29
  • Bounty offered (Note: I'm the only one who has upvoted so far). :) – Arafangion Feb 13 '13 at 06:39