Edit: the remaining problem is that file modes are apparently not stored properly on Windows systems (see also What is git's "filemode"?). To save and restore them, one will need a script, plus the original data:
git ls-files --stage > /tmp/original
To recover the modes, this rather crude pipeline should work:
< /tmp/original \
awk -F$'\t' '/^100755 / { print "git update-index --chmod=+x \"" $2 "\"" }' |
sh
This will attempt to `chmod +x` files that have been removed by the sequence below, so you can expect some error messages if there are any such files. (It also assumes no files have double quotes in their names.)
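A slightly more defensive variant of the same idea is sketched below. It is only a sketch: the function names `list_exec_paths` and `restore_exec_bits` are made up for illustration, it assumes `/tmp/original` was saved as shown above and that you run it from the same directory, and, like the pipeline, it does not cope with paths that `git ls-files` printed in quoted form.

```shell
#!/bin/sh
# Print the path of every entry recorded with mode 100755 in a saved
# `git ls-files --stage` listing (format: "<mode> <hash> <stage>TAB<path>").
list_exec_paths() {
    awk -F '\t' '/^100755 / { print $2 }' "$1"
}

# Re-apply the executable bit to each listed path that is still tracked.
# restore_exec_bits is a hypothetical helper name, not a Git command.
restore_exec_bits() {
    list_exec_paths "$1" | while IFS= read -r path; do
        # Skip files that are no longer in the index, avoiding error messages.
        if git ls-files --error-unmatch -- "$path" >/dev/null 2>&1; then
            git update-index --chmod=+x -- "$path"
        fi
    done
}

# Usage, from the top of the work-tree:
#   restore_exec_bits /tmp/original
```

Passing each path as an argument, rather than generating shell text and piping it to `sh`, avoids most quoting trouble with spaces in file names.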
Assuming you do not already have a `.gitattributes` file, here is a six-step process that should work:
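- Create that `.gitattributes` file just as you did
- Run `rm .git/index`
- Run `git checkout HEAD -- .`
- Run `git rm -r --cached .`
- Run `git add .`
- Run `git rm .gitattributes` (you can leave this until after verifying that it all worked; because the file exists only in the index at this point, Git may refuse and ask for `git rm -f .gitattributes` instead). Run `git commit` afterward.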
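The first five steps could be wrapped up roughly as follows. This is an untested-on-Windows sketch, not a polished tool: `renormalize_index` is a name invented here, it assumes you run it from the top level of the work-tree, that `.git` is an ordinary directory, and that `* -text` is the only line you want in `.gitattributes`. Step 6 and the commit are deliberately left manual so you can inspect the result first.

```shell
#!/bin/sh
# Hypothetical helper wrapping steps 1-5; stops at the first failing step.
renormalize_index() {
    # The procedure assumes there is no .gitattributes file yet.
    if [ -e .gitattributes ]; then
        echo "refusing to run: .gitattributes already exists" >&2
        return 1
    fi
    printf '* -text\n' > .gitattributes &&  # step 1: "do no conversions"
    rm .git/index &&                        # step 2: discard the index
    git checkout HEAD -- . &&               # step 3: re-extract everything from HEAD
    git rm -r -q --cached . &&              # step 4: empty the index
    git add .                               # step 5: re-add, honoring .gitignore
    # Step 6, by hand, once the result looks right:
    #   git rm .gitattributes   (or git rm -f, if Git insists)
    #   git commit
}
```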
I do not have (nor use) Windows so cannot test this, but here's the theory behind why it should work, and hence why there are these steps.
Git's actual data storage format is a special, Git-only, compressed (sometimes highly compressed) format. Files stored in this format are mainly useful only to Git itself. This format stores a raw, uninterpreted byte stream: files do not have to be separated into "text" and "data" and so on; they are just raw byte streams (hence treated as "data" / "non-text"). The data, once stored, are read-only and get assigned a hash ID (currently SHA-1, though a future Git may use SHA-256). Git calls a file stored this way a blob, which is a term stolen from the database world.
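To make "raw byte stream with a hash ID" a little more concrete, here is a sketch that computes a blob's ID without using Git at all. It assumes the default SHA-1 object format, in which Git hashes the header `blob <size>`, a NUL byte, and then the file's bytes exactly as they are. The name `blob_id` is made up, and `sha1sum` (from GNU coreutils) is assumed to be installed; the result should agree with `git hash-object <file>`.

```shell
#!/bin/sh
# Compute the SHA-1 blob ID of a file the way Git does for SHA-1 repositories:
# hash "blob <size>\0" followed by the file's bytes, with no conversion.
# blob_id is a hypothetical helper name.
blob_id() {
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

# Usage:
#   blob_id README.md
#   git hash-object README.md    # should print the same ID
```

Because the ID depends on every byte, a file stored with CRLF line endings and the same file stored with LF endings are different blobs, which is why the conversions discussed below matter.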
Your computer's useful-file-storage format is of course different, and may (and does on Windows) make a distinction between "text" and "data". Text may have encodings (such as ISO-8859-1, UTF-8, UTF-16, and so on). These files are generally both readable and writable and anything on your computer can deal with them (to some degree anyway, depending on encoding).
Git has to extract files from commits, turning them from blobs into files that you can work with. These files live in your work-tree. You work with them, and then `git add` them to give Git a chance to re-blob-ize them.
In between these special Git-only blobs and the work-tree, Git needs a place to store the blobbed data that, unlike a commit, is writable, but that, like a commit, holds each file in the special Git-only format. This "in between" place is Git's index. Various bits of Git documentation sometimes call this the staging area or the cache.
Git uses the index copy of each file (or blob, really) to make new commits. When you run `git add`, Git reads the work-tree file, encodes it down into the blob form, and saves it (well, its hash ID, really) in the index. When you run `git commit`, Git simply freezes the index copies into committed copies.
When you run `git checkout` to switch to some commit, Git extracts the commit into the index (filling in all the blob hash IDs), and also extracts the blobs into the work-tree so that they are in useful format and you can work on them. When you run `git add`, Git compresses the work-tree file into its blob format and replaces the index entry for the file.
Transforming a blob into a work-tree file, or vice versa, is the ideal place where Git will do any conversions you need, such as turning newlines into CRLF line endings. So that's where Git does it: `git checkout` fills the index and expands-and-converts into the work-tree, and `git add` compresses-and-un-converts from the work-tree into the index, ready for the next `git commit`. (Any files you don't touch, stay compressed and ready to go, safely tucked away in the index.)
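The two directions of conversion can be imitated with a pair of tiny filters. To be clear, this is a rough simulation for illustration and not Git's actual code: the names `to_worktree` and `to_blob` are invented, only line endings are handled, and, unlike Git, `awk` will add a final newline to input that lacks one.

```shell
#!/bin/sh
# Roughly what "expand-and-convert" does to a text file when CRLF
# conversion is in effect: LF in the blob becomes CRLF in the work-tree.
to_worktree() {
    awk '{ printf "%s\r\n", $0 }'
}

# Roughly the reverse "un-convert" applied on the way back into the index:
# CRLF in the work-tree becomes LF in the blob.
to_blob() {
    awk '{ sub(/\r$/, ""); print }'
}

# Usage:
#   to_worktree < blob-contents > work-tree-file
#   to_blob < work-tree-file > blob-contents
```

A round trip through both filters gives back the original bytes only if the blob had LF endings to begin with; a blob that was committed with CRLF endings comes back changed, which is the kind of surprise the `.gitattributes` step is meant to avoid.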
You already know that a tracked file is one that is in the index, and an untracked file is one that is in the work-tree but not in the index. Your goal is to use the existing `.gitignore` to make files that are currently in the index go away from the index if they would be `.gitignore`-ed. The process you are using is:
- `git rm -r --cached .`: remove everything from the index, so that the entire work-tree is untracked
- `git add .`: produce all new blobs in the index from whatever is in the work-tree, while ignoring any file that is listed in `.gitignore`.
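Before running that pair of commands, it can be reassuring to preview which tracked files the ignore rules would drop. A minimal sketch, assuming the standard ignore sources (`.gitignore`, `.git/info/exclude`, and the global excludes file) are the ones you care about; `tracked_but_ignored` is a name invented for this example:

```shell
#!/bin/sh
# List files that are in the index (tracked) but match an ignore rule.
# These are the entries that "git rm -r --cached ." followed by
# "git add ." would leave out of the index.
tracked_but_ignored() {
    git ls-files --cached --ignored --exclude-standard
}

# Usage, from the top of the work-tree:
#   tracked_but_ignored
```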
The issue here is that what's in the work-tree has been converted by the "blob to work-tree" conversions, and will be "un-converted" by the "work-tree to blob" conversions. Creating a `.gitattributes` file with `* -text` tells Git: "The conversions to do are no conversions at all."
Unfortunately, it's too late: the `git checkout` you ran earlier, to get this commit into the work-tree, already did some conversions.
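Whether the `* -text` line is actually being picked up for a given path can be checked with `git check-attr`. The wrapper name `text_attr` below is made up, and the sketch assumes it is run inside the repository after `.gitattributes` has been created:

```shell
#!/bin/sh
# Report how Git resolves the "text" attribute for one path.
# The path does not have to exist; only the attribute rules are consulted.
text_attr() {
    git check-attr text -- "$1"
}

# Usage:
#   text_attr some/file.txt
```

With `* -text` in effect the attribute should be reported as `unset` for every path; a report of `unspecified` would suggest the file is not being read.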
So here, we use step 1 to create a `.gitattributes` file that says do no conversions. Step 2, `rm .git/index`, removes the index entirely. Git now has no idea what's actually in the work-tree. This step may be unnecessary but I use it to force Git to act in step 3, which tells Git: extract every file from the `HEAD` commit into the index and the work-tree. This re-creates the index, and re-fills the work-tree, this time doing no conversions.
Steps 4 and 5 are just as before, but this time, the work-tree files all match the blobs in the `HEAD` commit since step 3 operated with the `.gitattributes` directive in place. Step 6 is to make sure you do not commit the "do no conversions" directive.
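One way to do the "verify that it all worked" check after step 5 is to look at what is staged relative to `HEAD`. If the theory holds, the only staged changes should be deletions of ignored files plus the added `.gitattributes`; any modified entry would mean some file's content or mode changed on the way through. This is a sketch under that assumption, and `modified_in_index` is a hypothetical helper name:

```shell
#!/bin/sh
# Print paths whose staged content or mode differs from HEAD (status "M").
# Deleted ("D") and added ("A") entries are expected here and are skipped.
modified_in_index() {
    git diff --cached --name-status | awk -F '\t' '$1 == "M" { print $2 }'
}

# Usage, after step 5:
#   modified_in_index    # ideally prints nothing
```

On Windows, entries that show up here only because of a lost executable bit are the ones the mode-restoring pipeline at the top is meant to repair.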