(You can tell what's in the commits, but not the way you're going about it. You'll have to look directly into the commits, using low level tools. In general—but not always—what's in the commits is LF-only.)
You're mixing together some concepts that you need to keep separate. These concepts are commits, which is what Git is really for, and the work-tree and the index, which is how you go about having Git make commits. I'm going to go through all of these pretty fast, because we have to have a lot of shared terminology and understanding before we can get into the details of how CRLF vs LF-only line endings really work.
Commits, branches like master, and remote-tracking names like origin/master
Remember that Git is all about commits. Each commit has its own unique hash ID. That hash ID is, in effect, the true name of the commit. The commit itself represents a permanent and immutable¹ snapshot of a set of files, along with some metadata, such as the name and email address of whoever made the commit, the reason they made it (their log message), and the raw hash ID of the commit's parent commit.
Because each commit records the hash ID of its parent, we can, from any commit, work backwards to its parent. We say that this commit points to its parent. We can draw this situation. If we let a single uppercase letter stand in for a real hash ID (because real hash IDs are too big and ugly for humans to remember and use), we can draw a small simple three-commit repository like this:
A <-B <-C
Here commit C is the last commit we made. It records the hash ID of its parent commit B, so that C points to B. That allows Git to use the hash ID to find the actual commit B itself, and B contains the hash ID of (or points to) commit A. That allows Git to extract A. A is a special case: it's the very first commit, so it has no parent. This lets Git stop working backwards from commit to commit.
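To make the backward walk concrete, here is a minimal Python model. It assumes, purely for illustration, that the repository is a dictionary mapping each single-letter stand-in ID to its parent's ID, with None for the root commit. Real Git stores the parent ID inside each commit object instead. The name walk_back is hypothetical. For a simple chain like this one, the walk is roughly what git log does when it lists history.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a repository: each single-letter "hash ID"
# maps to its parent's ID. A is the root commit, so its parent is None.
PARENTS: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B"}


def walk_back(start: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Follow parent pointers from `start` until reaching a root commit."""
    history: List[str] = []
    current: Optional[str] = start
    while current is not None:
        history.append(current)
        current = parents[current]  # step backwards to the parent
    return history


print(walk_back("C", PARENTS))  # ['C', 'B', 'A']
```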
Note, though, that we need to save the actual hash ID of C somewhere. We don't need to save the hash ID of B because C is saving it for us, but we have to find C. Actual hash IDs seem random (even though they're not), so we have to write the hash ID of C somewhere. We could jot it down on paper, or on a whiteboard, but that's silly: why not have Git save it for us? So that's just what we do. That's what a branch name is: it's a place to save one (1) hash ID.
When we save C's hash ID in the name master, we say that master points to C:
A <-B <-C <-- master (HEAD)
We can share these commits with another Git. Our Git and their Git will always use the same hash IDs (see footnote 1), so they have the exact same three commits. But they have their own branch names. Their master is theirs. At the moment, theirs also points to (shared) commit C:
A--B--C <-- master (HEAD) [in their Git]
Our Git calls up their Git and has a conversation. Our Git and their Git realize we both have the same three commits. Then our Git reads their name master and saves it in our own Git repository, but renames it to origin/master so that it doesn't interfere with our master:
A--B--C <-- master (HEAD), origin/master
Now let's make a new commit in our own repository. The new commit gets some big ugly hash ID, which is unique to our new commit; we'll call this D. The special thing about branch names is that when we make a new commit while on some branch, Git writes the new commit's hash ID into the branch name, so that the branch name automatically points to the new commit:
A--B--C <-- origin/master
\
D <-- master (HEAD)
(This HEAD that I'm drawing in is how Git knows which branch name to update. As long as we only have one branch, we don't really need it, but as soon as we have more than one branch, we will need it.)
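This rule can be sketched in a few lines of Python. The model is hypothetical: a dictionary from commit to parent, a dictionary from name to commit, and a variable head that holds the *name* of the current branch. A new commit takes the current tip as its parent, and only the name HEAD is attached to moves. The remote-tracking name stays put.

```python
from typing import Dict, Optional

# Hypothetical model: commit -> parent, name -> commit, and HEAD holding
# the *name* of the current branch (that is, HEAD is attached to master).
parents: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B"}
refs: Dict[str, str] = {"master": "C", "origin/master": "C"}
head = "master"


def make_commit(new_id: str) -> None:
    """Create a commit whose parent is the current branch tip, then move
    the branch that HEAD is attached to so that it points at the new commit."""
    parents[new_id] = refs[head]
    refs[head] = new_id


make_commit("D")
print(refs)  # {'master': 'D', 'origin/master': 'C'}
```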
Now suppose that someone controlling the other Git repository adds a new commit to their master. This new commit will have a different hash ID from every other commit, so we'll call it E. Their master will now point to their E:
A--B--C--E <-- master (HEAD) [in their Git]
Now we'll have our Git call up their Git (this is what git fetch does) and obtain any commits they have that we don't, which in this case is just commit E. Our Git then updates our origin/master, which it uses to remember their master, to point to E:
A--B--C--E <-- origin/master
\
D <-- master (HEAD)
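The fetch step can be modeled with plain Python dictionaries standing in for the two repositories. This is a hypothetical sketch: real Git negotiates which objects to send rather than comparing dictionary keys, and the names below are made up. The model updates only origin/master, and our own master stays where it was.

```python
from typing import Dict, Optional

# Hypothetical models of the two repositories after we made D and they made E.
our_parents: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B", "D": "C"}
our_refs: Dict[str, str] = {"master": "D", "origin/master": "C"}
their_parents: Dict[str, Optional[str]] = {"A": None, "B": "A", "C": "B", "E": "C"}
their_refs: Dict[str, str] = {"master": "E"}


def fetch(remote: str = "origin") -> None:
    """Copy the commits we lack, then update our remote-tracking names.
    Our own branch names are never touched."""
    for commit_id, parent in their_parents.items():
        # Both sides use the same hash IDs, so a missing key means a missing commit.
        if commit_id not in our_parents:
            our_parents[commit_id] = parent
    for name, commit_id in their_refs.items():
        our_refs[f"{remote}/{name}"] = commit_id


fetch()
print(our_refs)  # {'master': 'D', 'origin/master': 'E'}
```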
Let's make two more commits in our own repository now and call them F and G:
A--B--C--E <-- origin/master
\
D--F--G <-- master (HEAD)
When git status tells you that your branch is ahead 3, this is what it means: we have three commits on our master that they don't have on their master (which we're remembering as our origin/master). When git status tells you that your branch is behind 1, this is what it means too: they have one commit on their master (our origin/master) that we don't have on our master. (When both are true at once, as in this drawing, git status reports that your branch and origin/master have diverged, and have 3 and 1 different commits each.)
This is all that git status means by ahead or behind: that we have commits that they don't, or vice versa, or both.
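Counting ahead and behind amounts to a set difference over reachable commits. The sketch below hard-codes the drawing above as a hypothetical single-parent dictionary. Merge commits, which have several parents, would need a walk over all parents. On a real repository, git rev-list --left-right --count master...origin/master prints the same pair of counts.

```python
from typing import Dict, Optional, Set, Tuple

# The drawing above as a hypothetical single-parent graph:
# A-B-C-E is their side, D-F-G is ours.
PARENTS: Dict[str, Optional[str]] = {
    "A": None, "B": "A", "C": "B", "E": "C",
    "D": "C", "F": "D", "G": "F",
}


def reachable(tip: str) -> Set[str]:
    """Every commit reachable from `tip` by following parent pointers."""
    seen: Set[str] = set()
    current: Optional[str] = tip
    while current is not None:
        seen.add(current)
        current = PARENTS[current]
    return seen


def ahead_behind(ours: str, theirs: str) -> Tuple[int, int]:
    """Return (commits only we have, commits only they have)."""
    mine, yours = reachable(ours), reachable(theirs)
    return len(mine - yours), len(yours - mine)


print(ahead_behind("G", "E"))  # (3, 1): master is ahead 3, behind 1
```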
¹ Commits can, in some cases, be forgotten, and eventually they will go away and that hash ID will no longer have any meaning. But until they do go away, the commit is effectively permanent. It's entirely immutable, for the simple reason that the hash ID is a cryptographic checksum of the contents of that commit. If you attempt to change anything, even a single bit, what you get is a new, different commit with a different hash ID. The original commit remains unchanged. So all commits are quite literally immutable.
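You can try out the claim that the hash ID is a checksum of the contents using Python's hashlib. A file's contents are stored as a blob. In a classic SHA-1 repository, the blob's ID is the SHA-1 of a short header, blob, the size, and a NUL byte, followed by the file's bytes. Commits are hashed the same way with a commit header. The sketch assumes the SHA-1 object format rather than the newer SHA-256 one, and git_blob_id is a hypothetical name. It also shows why an LF-only and a CRLF version of the same text are two different blobs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID a SHA-1 Git repository assigns to a file's bytes:
    SHA-1 over the header b"blob <size>\\0" followed by the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
lf_only = git_blob_id(b"foo\n")
crlf = git_blob_id(b"foo\r\n")
print(lf_only == crlf)  # False: one extra CR byte yields an unrelated ID
```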
The index and the work-tree
Commits are immutable. They're frozen forever in time: the snapshots inside each commit can never be changed, not one bit. They're also stored in a special compressed Git-only form, sort of freeze-dried as it were, so as to take less space. That's fine for archiving—it lets you go back and see what you had yesterday, or last week, or whenever—but it's of no use at all in getting any new work done. If you can't change any files, what good is Git? Moreover, if they're all Git-only, how will you ever use them?
Of course, Git lets you make new commits—but to make new commits, you still need to change some files. Well, that, or remove some, or add some new ones, or any combination of these. So Git has to have a way to let you take an existing commit and rehydrate it, getting all its files out into useful form where you can see them and work on them.
The place where you can see and work on your files is the work-tree. When you run git checkout master, you're telling Git: Get all the files out of the commit to which the name master points. (This also attaches HEAD to the name master, so that Git knows which name to update when you make new commits.) The extracted files go into your work-tree, where you can see them, use them, change them, and so on.
Git could stop here, and other systems do stop here. The current commit and the work-tree are all you really need. But Git doesn't quite stop here. Instead, in between the current commit, which is read-only and has freeze-dried Git-only files in it, and the work-tree, Git inserts a sort of halfway point that Git calls, variously, the index, or the staging area, or the cache. All three names mean the same thing. Which name gets used depends on who or which part of Git is doing the calling.
What's in the index is, at least initially, all the files from the commit. That is, Git effectively copies the freeze-dried files from the commit into the index first. Then it rehydrates them, copying from the index to your work-tree.
If you have modified the work-tree copy of a file, you must copy it back into the index in order to commit the result. You do this with git add, which dehydrates (compresses and Git-ifies) that file and overwrites the previous index copy. When you later run git commit, Git takes whatever is in the index at that time and puts that into the new commit.
Again, this is all critically important: Git extracts any existing commit into the index and builds a new commit from the index. Git does not build the commit from what's in your work-tree: the work-tree is for you, not for Git. The committed copies of files are in the special Git-only format: freeze-dried, as it were. The index copies of files are also in this special Git-only format. (This is what makes git commit so fast: it doesn't have to freeze-dry every file; every file is already freeze-dried, ready to go!) The work-tree copies ... well, this is where CRLF and LF-only line endings come in!
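The three copies can be modeled in a few lines of Python. This is a hypothetical sketch: each copy is a dictionary from path to bytes, whereas real Git keeps compressed blobs in the index. The point it shows is the one above. An edit that is never copied into the index with git add does not reach the new snapshot.

```python
from typing import Dict

# Hypothetical model of the three copies of each file. Real Git keeps
# compressed blobs in the index; plain bytes are used here for clarity.
head_commit: Dict[str, bytes] = {"A.txt": b"one\n"}
index: Dict[str, bytes] = {}
work_tree: Dict[str, bytes] = {}


def checkout() -> None:
    """Copy commit -> index, then index -> work-tree."""
    index.clear()
    index.update(head_commit)
    work_tree.clear()
    work_tree.update(index)


def add(path: str) -> None:
    """git add: overwrite the index copy with the work-tree copy."""
    index[path] = work_tree[path]


def commit() -> Dict[str, bytes]:
    """git commit: the new snapshot comes from the index, not the work-tree."""
    return dict(index)


checkout()
work_tree["A.txt"] = b"two\n"   # edit the file but forget to git add it
print(commit())                 # {'A.txt': b'one\n'}: the edit is missing
add("A.txt")
print(commit())                 # {'A.txt': b'two\n'}
```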
We finally get to talk about line endings
Because internal (committed and index) files are in a different format, Git has an opportunity to make special changes. Whenever Git is copying a file from the index to the work-tree, Git can replace the LF-only line endings that Linux prefers with the CRLF line endings that Windows prefers. Whenever Git is copying a file from the work-tree to the index, it can do the reverse. This is precisely how it all works. Nothing happens to any committed file. Nothing can happen to such a file, because commits are immutable. But by changing the conversion settings, you can make what goes into the index, or what comes out of the index, be or look different from what you get to see and work with in your work-tree.
Telling Git "File A.txt should have CRLF endings in the work-tree" tells Git to change LF-only to CRLF on the way out of the index, and CRLF to LF-only on the way from the work-tree into the index. So when git checkout copies the file to the work-tree (from the index), LF becomes CRLF, and when git add copies the file from the work-tree (to the index), CRLF becomes LF.
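Here is a minimal sketch of the two conversions for such a file. It assumes the index copy is LF-only, as it normally is in this setup. Real Git applies extra safety checks that this sketch omits, and the function names are hypothetical. In a .gitattributes file, this kind of setting is typically written as a line such as A.txt text eol=crlf.

```python
def to_work_tree(index_copy: bytes) -> bytes:
    """Index -> work-tree for an eol=crlf file: LF-only becomes CRLF.
    Assumes the index copy is LF-only, so no CRLF is already present."""
    return index_copy.replace(b"\n", b"\r\n")


def to_index(work_tree_copy: bytes) -> bytes:
    """Work-tree -> index: CRLF becomes LF-only."""
    return work_tree_copy.replace(b"\r\n", b"\n")


stored = b"line 1\nline 2\n"        # as it sits in the index (and commit)
on_disk = to_work_tree(stored)      # what git checkout writes out
print(on_disk)                      # b'line 1\r\nline 2\r\n'
print(to_index(on_disk) == stored)  # True: git add restores the LF-only form
```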
You can tell Git: "Don't change A.txt when copying from index to work-tree, but when copying from work-tree to index, do replace CRLF with LF-only." This is the mode called input. When git checkout does the index -> work-tree conversion, it doesn't do anything special, but when git add does the work-tree -> index conversion, it replaces CRLF with LF-only.
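The input mode (set via core.autocrlf=input) is asymmetric, and a short Python sketch makes that visible. The function names are hypothetical. A CRLF file that goes into the index comes back out LF-only, because nothing converts it on the way out.

```python
def checkout_input(index_copy: bytes) -> bytes:
    """Index -> work-tree in "input" mode: copied unchanged."""
    return index_copy


def add_input(work_tree_copy: bytes) -> bytes:
    """Work-tree -> index in "input" mode: CRLF becomes LF-only."""
    return work_tree_copy.replace(b"\r\n", b"\n")


edited = b"fixed\r\n"             # e.g. saved by an editor that writes CRLF
staged = add_input(edited)        # b'fixed\n' goes into the index
print(checkout_input(staged))     # b'fixed\n': it comes back out LF-only
```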
There's a hitch
There is one big problem with this technique. It does work, and that really is how Git does things. But Git was originally built for Linux, where you never want any of this fiddling. Your files are all just data; Git has no business changing them; and Git was designed to work this way. The part of git status that tells you:
Changes not staged for commit
works by comparing what's in the index with what's in the work-tree. If you're having Git fuss with line endings, those copies won't match up. Git has to pretend that they do match up, as long as it was Git that did the line-ending fiddling and that's still the only actual difference.
Hence, git status deliberately lies. If Git made the index and work-tree different due to line-ending settings, git status will try to tell you that the index and work-tree are the same. This automatic lying does not work in every case. In particular, if you change the conversion settings, Git may, or may not, notice.² If you change other things, including some of the system time data of the files, Git will think that the files are changed.
In this case, you're seeing the latter effect. You have touched the files in some way, so that Git doesn't just lie and say they are the same. Then you run:
git add .
Git carefully copies the work-tree files back into the index, doing the CRLF-to-LF-only conversion if required. The result is a freeze-dried index copy that matches the HEAD copy. Git now updates the cached system data (stat data, as in footnote 2) in the index, so that git status knows to print the correct lies or, if the work-tree copies really are LF-only now, the truth: that the HEAD copy, the index copy, and the work-tree copy of the file all match.
² The details depend on how the index works in its cache aspect: it saves the stat data from each file in the index, and if the stat data is unchanged since the last index update, Git assumes the file is unchanged from the way Git set it up.
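Here is a simplified Python model of that check. The IndexEntry fields and the looks_modified function are hypothetical. Real Git caches more stat fields (ctime, inode number and others) and has extra handling for files modified at nearly the same moment the index was written. The model shows both behaviors described above. If the stat data matches, a change of conversion settings goes unnoticed. Once the file is touched, the file is re-read with the current settings.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class IndexEntry:
    blob_id: str   # ID of the freeze-dried index copy
    size: int      # cached stat data: size of the work-tree file
    mtime_ns: int  # cached stat data: modification time of the work-tree file


def blob_id(content: bytes) -> str:
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def looks_modified(entry: IndexEntry, data: bytes, mtime_ns: int,
                   convert_crlf: bool) -> bool:
    """Simplified status check for one work-tree file."""
    # Fast path: stat data unchanged since the last index update, so the
    # file is assumed to be exactly as Git set it up and is never read.
    if entry.size == len(data) and entry.mtime_ns == mtime_ns:
        return False
    # Slow path: apply the *current* work-tree -> index conversion, then compare.
    cleaned = data.replace(b"\r\n", b"\n") if convert_crlf else data
    return blob_id(cleaned) != entry.blob_id


on_disk = b"hi\r\n"                                       # CRLF in the work-tree
entry = IndexEntry(blob_id(b"hi\n"), len(on_disk), 1000)  # LF-only in the index

# Conversion turned off, file untouched: the fast path hides the change.
print(looks_modified(entry, on_disk, 1000, convert_crlf=False))  # False
# Same settings, file touched (new mtime): the slow path sees CRLF vs LF.
print(looks_modified(entry, on_disk, 2000, convert_crlf=False))  # True
```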
How can you see what's really in the commit?
There are several ways to see the original data unmolested by any LF-to-CRLF transformation. The most direct is to use git cat-file -p, which prints the stored contents of a file exactly as Git keeps them internally (this works for an index copy too, which you can name as :A.txt). For instance:
git cat-file -p HEAD:A.txt
extracts what's really in A.txt in the current commit.
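Once you have the raw bytes, a small Python function can report which line endings they contain without any display program getting in the way. The name count_line_endings is hypothetical. This is a sketch that simply scans for CRLF, LF-only and lone CR.

```python
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Tally CRLF, LF-only and lone-CR line endings in raw bytes."""
    counts: Counter = Counter()
    i = 0
    while i < len(data):
        if data[i:i + 2] == b"\r\n":
            counts["crlf"] += 1
            i += 2
            continue
        if data[i:i + 1] == b"\n":
            counts["lf"] += 1
        elif data[i:i + 1] == b"\r":
            counts["cr"] += 1
        i += 1
    return counts


print(dict(count_line_endings(b"one\r\ntwo\nthree\n")))  # {'crlf': 1, 'lf': 2}
```

To check a committed file, you can pass it the bytes from subprocess.run(["git", "cat-file", "-p", "HEAD:A.txt"], capture_output=True, check=True).stdout. Those bytes never pass through a terminal or editor first. Git 2.8 and later can also report line-ending information for the index and work-tree copies itself, via git ls-files --eol.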
Note, however, that even the programs on your own computer that transcribe this data into a window, so that you can see it, may modify what you see. (In a similar vein, on a Linux system, vim hides the fact that a file has CRLF line endings from the user. You won't see them, but they'll still be there when you write the file out again!)
You may need a special viewing program that deliberately doesn't make the data "more user friendly", but instead makes it programmer-friendly. For instance, Linux has hexdump -C:
$ echo foo | hexdump -C
00000000  66 6f 6f 0a                                       |foo.|
00000004
Running the output of git cat-file -p on a Git internal blob (blobs are how Git freeze-dries files) through hexdump -C can be useful here. What the Windows equivalent of hexdump -C might be, I have no idea.
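Python runs on Windows as well as Linux, so a short script is one possible stand-in. The sketch below assumes the hexdump -C layout shown above rather than reproducing every hexdump option. Unlike hexdump, it does not collapse repeated lines into "*". As a design choice, the script runs git cat-file itself through subprocess, so the raw bytes never pass through a shell pipe that might re-encode them. The file name hexdump.py and the function hexdump_c are hypothetical.

```python
import subprocess
import sys


def hexdump_c(data: bytes) -> str:
    """Format bytes in the layout of hexdump -C: offset, 16 hex bytes split
    into two groups of 8, then printable ASCII (other bytes shown as ".").
    Repeated lines are not collapsed into "*"."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{text}|")
    if data:
        lines.append(f"{len(data):08x}")  # final line: total byte count
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python hexdump.py HEAD:A.txt
    raw = subprocess.run(["git", "cat-file", "-p", sys.argv[1]],
                         capture_output=True, check=True).stdout
    print(hexdump_c(raw))
```

With a CRLF file, the 0d 0a pairs and the ".." in the ASCII column make the carriage returns impossible to miss.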