Usually when I rename or move a file in git, Git Extensions shows it as a rename once the file is staged. Other Stack Overflow questions indicate that this should happen automatically, regardless of whether I use git mv or a regular move. However, today I did a large-scale reorganization in preparation for a git subtree split (basically, I split a Src folder into Core and Main; for the record, I moved the folders with Windows Explorer, not git mv). When I staged the changes (only a few project files changed, none of the source code), almost all of them appeared as add+remove rather than renames. I hoped this was just a glitch in Git Extensions, but when I pushed to GitHub it wasn't any better. You can see the mess here:

https://github.com/qwertie/Loyc/commit/3eb4bd9dbe3d0023858659cb96e860921f0819e3

After I performed a local git subtree split --prefix=Core -b core and pushed to another local repo, it appeared that the history of all files in the Core folder had been lost.

What's going on? Is there a way to preserve the history of all these moved files?

> git version
git version 1.8.3.msysgit.0
Qwertie

1 Answer

Given that you're using Windows, my first suspicion is that something (it does not matter what, except for trying to stop it) turned newlines into CRLFs or vice versa, so that every line really IS different. I cloned the repository and, sure enough, something stripped the CRs from the CRLFs:

$ git show HEAD^ | vis | head -40
[snip]
diff --git a/Src/Baadia/ArrowheadControl.cs b/Baadia/ArrowheadControl.cs
similarity index 94%
rename from Src/Baadia/ArrowheadControl.cs
rename to Baadia/ArrowheadControl.cs
index 2e7aa5c..cdb374f 100644
--- a/Src/Baadia/ArrowheadControl.cs
+++ b/Baadia/ArrowheadControl.cs
@@ -1,18 +1,18 @@
-\M-o\M-;\M-?using System;\^M
-using System.Collections.Generic;\^M
-using System.ComponentModel;\^M
[snip]
+\M-o\M-;\M-?using System;
+using System.Collections.Generic;
[snip]

(The \M-o\M-;\M-? sequence is the UTF-8 byte order mark, which is unchanged. It's the removal of the \^Ms, the carriage returns at the end of each line, that tells git this is not just a simple rename: the content of every line has changed.)
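One way to check this without relying on any GUI is to count the CR-terminated lines in the blobs git actually stored, using git show <commit>:<path>. The sketch below assumes bash, git and GNU sed (BSD sed does not understand \r), and builds a throwaway repository with hypothetical contents that mimic the pattern above: the first commit stores CRLF, the second moves the file and stores LF. The count_cr helper is an illustrative name, not part of git.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash, git and GNU sed): rebuild the "move + CRLF -> LF"
# pattern in a throwaway repository and count CR-terminated lines stored
# in each commit. Repository, paths and contents are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
git config core.autocrlf false        # store bytes exactly as written

# First commit: the file is stored with CRLF line endings
mkdir Src
printf 'using System;\r\nusing System.Collections.Generic;\r\nusing System.ComponentModel;\r\n' \
  > Src/ArrowheadControl.cs
git add -A
git commit -qm 'CRLF line endings'

# Second commit: move the file and, as in the question, drop the CRs
mkdir Baadia
git mv Src/ArrowheadControl.cs Baadia/ArrowheadControl.cs
sed -i 's/\r$//' Baadia/ArrowheadControl.cs
git add -A
git commit -qm 'move + CRLF -> LF'

# count_cr COMMIT PATH: number of lines ending in CR in the stored blob
# (grep -c prints 0 but exits 1 on no match, hence "|| true")
count_cr() { git show "$1:$2" | grep -c $'\r$' || true; }

before=$(count_cr HEAD^ Src/ArrowheadControl.cs)
after=$(count_cr HEAD Baadia/ArrowheadControl.cs)
echo "CR-terminated lines: before=$before after=$after"
```

Pointed at a real clone, the same helper would be run against the commits on either side of the reorganization and the paths of one moved file.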


Edit: now that the source of the CRLF-vs-LF change has been (sort of) tracked down, and you want to "insert" a commit that does nothing but remove the CRs, here is how to fake it.

Let's say (based on what I saw when I cloned) you have this sequence of three commits:

... - A - B - C   <-- branch

where A is the commit that has CRLFs, B is the commit that has all the files renamed and also the CRLF-to-LF transition, and C is the tip of the branch named branch.

First, you want to extract commit A, and get into "detached HEAD" mode. That's easy:

git checkout branch~2  # branch = C, branch~1 = B, branch~2 = A

Next, you want to clean up all the files to remove the CRs while leaving the LFs. On a Unix-like system you might use dos2unix or similar; for this outline, let's say a (made-up) zoop command does it recursively with -R:

zoop -R .    # I'm assuming you're at the top of your work tree
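Since zoop is made up, here is one concrete stand-in, assuming GNU find and GNU sed. The hypothetical strip_cr helper removes a trailing CR from every line of the .cs and .les files under a directory while skipping .git; the extension list is borrowed from Qwertie's dos2unix command in the comments below and should be adjusted so that binary files are never touched. The demo runs it on a scratch directory rather than a real work tree.

```shell
#!/usr/bin/env bash
# Sketch of a stand-in for the made-up "zoop -R": strip the CR from CRLF
# line endings in text files under a directory (assumes GNU find and sed).
# strip_cr and the extension list (*.cs, *.les) are illustrative choices.
set -euo pipefail

strip_cr() {
  local top=${1:-.}
  # Only the named text extensions are rewritten, and anything inside
  # a .git directory (git's own object store) is skipped.
  find "$top" -type f \( -name '*.cs' -o -name '*.les' \) \
       ! -path '*/.git/*' -exec sed -i 's/\r$//' {} +
}

# Demo on a scratch directory instead of a real work tree
demo=$(mktemp -d)
mkdir -p "$demo/Src/Core"
printf 'class A {}\r\n// done\r\n' > "$demo/Src/Core/A.cs"
printf 'binary\r\n' > "$demo/Src/Core/logo.png"   # must be left alone

strip_cr "$demo"
```

Run from the top of the work tree, strip_cr with no argument would play the role of zoop -R . in the next step.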

Now commit the result:

git commit -am 'CRLF -> LF only' # or whatever message

The commit graph now looks like this:

... - A - B - C   <-- branch
        \
          A2      <-- HEAD

Now you just want to make the work tree and index look like commit B, which we can do with two commands:

git rm -rf .; git checkout branch~1 -- .

The first git command empties the tree and index completely, and the second re-populates the index and tree from commit branch~1, which is to say commit B. (Note that this form of git checkout does not change branches; it merely extracts files. Since we are at the top of the repository, extracting the path . recursively extracts all files.) Commit the result using the log message from B:

git commit -C branch~1

giving:

... - A - B - C     <-- branch
        \
          A2 - B'   <-- HEAD

The tree for commit B' matches that of B, as does the message; only the parent ID (and some timestamps) differ.

Repeat the steps you used for B for commit C, this time naming it as branch instead of branch~1.

When you are done, move the branch named branch to point to commit C', which is now HEAD:

git update-ref -m "move to rewritten history" refs/heads/branch HEAD

or:

git branch -f branch HEAD

(this form won't let you specify a custom reflog message). Then run git checkout branch to get back on it; the old commit C is abandoned and survives only in the reflogs.
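Put together, the rewrite can be sketched as a single script. It assumes bash, git and GNU sed, and runs against a throwaway repository holding three hypothetical commits (A with CRLF, B that moves the file and drops the CRs, C a later change) on a branch literally named branch; sed stands in for the made-up zoop. The loop applies the "remove everything, check out the old commit's tree, commit -C" step to B and then to C.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash, git and GNU sed): the A2 / B' / C' rewrite,
# run against a throwaway repository whose commits are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/branch    # name the branch "branch"
git config user.name "Demo User"
git config user.email demo@example.com
git config core.autocrlf false

# A: CRLF file under Src/
mkdir Src
printf 'line one\r\nline two\r\n' > Src/Foo.cs
git add -A && git commit -qm 'A'
# B: move to Core/ and drop the CRs in the same commit
mkdir Core
git mv Src/Foo.cs Core/Foo.cs
sed -i 's/\r$//' Core/Foo.cs
git add -A && git commit -qm 'B: reorganize'
# C: an ordinary follow-up change
echo 'line three' >> Core/Foo.cs
git commit -qam 'C: more work'
old_c=$(git rev-parse branch)

# A2: check out A (detached HEAD), strip the CRs, commit
git checkout -q branch~2
sed -i 's/\r$//' Src/Foo.cs
git commit -qam 'CRLF -> LF only'

# B' then C': copy each original commit's tree and message onto HEAD
for rev in branch~1 branch; do
  git rm -rfq .
  git checkout "$rev" -- .
  git commit -q -C "$rev"
done

# Move the branch to C' and get back on it
git update-ref -m "move to rewritten history" refs/heads/branch HEAD
git checkout -q branch

# A2 -> B' is now an exact rename (identical blob)
renames=$(git diff --name-status -M HEAD~2 HEAD~1)
echo "$renames"
```

The final git diff reports A2 to B' as R100, an exact rename, because the moved blob is byte-for-byte identical once A2 has already dropped the CRs; C' ends up with the same tree as the original C. On a real repository, publishing the result still needs a force-push.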

(You might be able to use git cherry-pick to copy commits B and C, but that will probably be slower, and it may fail if the CRLF -> LF change confuses it.)

When it comes time to git push the result, you will have to force-push, since updating branch on GitHub will not be a fast-forward (it abandons the old B and C commits).

torek
  • Strange: GitHub seems to recognize the rename; it puts `Src/Baadia/ArrowheadControl.cs → Baadia/ArrowheadControl.cs`, etc. in the file headers. – Willem Van Onsem Aug 11 '14 at 00:34
  • That particular file is detected as a rename, with 94% similarity. But every line is changed. Other files might fall below the default 50% similarity index and not get recognized. – torek Aug 11 '14 at 00:40
  • Interesting find! I don't believe the files changed in reality; they had `\r\n` line endings before and still have them now. Somewhere there must be a setting that changes the line endings to `\n`, but in a screwy way so that git sometimes sees one kind of line ending and sometimes sees another... – Qwertie Aug 11 '14 at 03:54
  • Now it becomes a matter of figuring out what's doing this (msysgit itself?). This reminds me of why I get annoyed when vim auto-adjusts CRLF-vs-LF ... if I wanted that done I'd do it myself :-) – torek Aug 11 '14 at 04:09
  • I'm pretty baffled. On both of my main PCs here, the `autocrlf = true` option is set in the global config file (with no local `autocrlf` setting). This should mean LF->CRLF on checkout, CRLF->LF on checkin. So it is not surprising that the CRLFs were replaced with LFs. What *is* surprising is that this usually *did not happen*. Using a special clone of the repo with `autocrlf = true`, I learned that a commit from *yesterday* on *this PC* has CRLF line endings stored in the repo. Which makes no sense, because with `autocrlf = true` the CRLFs should have been converted to LFs. Argh! Well... now what? – Qwertie Aug 11 '14 at 04:54
  • (I meant to say: "Using a special clone of the repo with `autocrlf = false`..."; I used `git config core.autocrlf false`, `git rm --cached -r .`, `git reset --hard` to ensure that I was looking at the files "the way they really are".) – Qwertie Aug 11 '14 at 05:02
  • OK since (A) I don't want to repeat the reorganization work I already did and (B) I actually prefer storing LF line endings in the repo, I believe what I want to do is to insert a "fake" commit JUST BEFORE the reorganization commit. This fake commit will change all line endings from CRLF to plain LF, which in turn will allow git to understand that the moved files are really moved (not rm+add). http://stackoverflow.com/questions/1510798 tells me how to normalize the line endings, but I'm not sure how to insert a fake commit. Thx for the help btw! – Qwertie Aug 11 '14 at 05:50
  • You can't really insert a commit like that but you can fake it (I guess that's appropriate). You'll be doing the dreaded "rewrite history" thing. I'll outline... – torek Aug 11 '14 at 05:55
  • I used `find . -type f \( -name "*.cs" -or -name "*.les" \) \! -path \*/\.git/\* -exec dos2unix --verbose --d2u {} \;` to change line endings (Git comes with bash + unix utils though they run very slowly wtf). Your instructions worked (result: http://loyc.net/res/rewrote-history.png) but sadly the `subtree split` **still** failed to preserve history (https://github.com/qwertie/Loyc/blob/gh-pages/res/split-lost-history.png). I am giving up now. Who needs history anyway right?.... but thanks very much for the help! – Qwertie Aug 11 '14 at 18:20
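torek's point about the default 50% similarity threshold can be tried out with git diff's -M option, which also accepts a lower threshold such as -M30%. The sketch below assumes bash and git, and uses a hypothetical ten-line file whose lines all have the same length: it is renamed while six of its lines are rewritten, so roughly 40% of its content survives. At the default threshold git reports a delete plus an add; with -M30% the same pair is reported as a rename.

```shell
#!/usr/bin/env bash
# Sketch (assumes bash and git): show how the rename-similarity threshold
# decides between "rename" and "delete + add". File names are hypothetical.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

# Ten distinct lines of equal length
for i in 0 1 2 3 4 5 6 7 8 9; do echo "original line $i"; done > old.txt
git add old.txt
git commit -qm 'before'

# Move the file, keeping lines 0-3 and rewriting lines 4-9 (same length)
git mv old.txt new.txt
{ for i in 0 1 2 3; do echo "original line $i"; done
  for i in 4 5 6 7 8 9; do echo "modified line $i"; done; } > new.txt
git add -A
git commit -qm 'move and rewrite most lines'

# Default threshold (50%): about 40% similar, so reported as D + A
default=$(git diff --name-status -M HEAD^ HEAD)
# Lower threshold (30%): the same pair is reported as a rename
lowered=$(git diff --name-status -M30% HEAD^ HEAD)
printf '%s\n---\n%s\n' "$default" "$lowered"
```

Lowering the threshold only changes how git diff and git log pair up deletions and additions when reporting history; it does not alter any stored commit.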