
I'm cleaning up a git repository that was taken over from another development team. The new development team has no need for the original commit hashes (as we all know, after a commit is amended in an interactive rebase, every subsequent commit gets a different hash), so I can rebase/cherry-pick/graft/filter-branch... as much as I want until the repository is "clean".

The other team used a mix of Windows and Linux line endings. Starting from master, I created a new cleanup branch at the root commit, added a sensible .gitattributes file, and amended the root commit. Now my repository has 2 root commits, as intended.
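
The exact contents of that .gitattributes file are not shown here. As a rough sketch of what a "sensible" one might look like, the helper below writes a minimal file into a directory. The function name write_gitattributes and the specific patterns are assumptions for illustration only, not taken from the repository in question.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal .gitattributes into the given directory.
# The patterns below are illustrative assumptions, not the real project's file.
write_gitattributes() {
    target_dir=${1:-.}
    cat > "$target_dir/.gitattributes" <<'EOF'
# Let git detect text files and store them with LF in the repository
* text=auto
# Files that must keep a specific line ending in the working tree
*.sh text eol=lf
*.bat text eol=crlf
# Never convert binary files
*.png binary
*.jpg binary
EOF
}
```

With the root commit checked out on the cleanup branch, running write_gitattributes . followed by git add .gitattributes and git commit --amend --no-edit would fold the file into that commit.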

Next I want to cherry-pick the master branch onto the root commit of the cleanup branch (which will later become the new master branch). I used this command (14c4a129 is obviously the root commit of the master branch):

git cherry-pick 14c4a129..master -Xignore-space-change

However, about a quarter of the way through, the cherry-pick aborts with:

error: Your local changes to the following files would be overwritten by merge:
        (list of files)
Please commit your changes or stash them before you merge.
Aborting

The changes in these files are only whitespace, because git diff -w turns up blank. Another clue: find src -type f -print0 | xargs -0 file | sort | grep 'CRLF' gives me exactly the same list of files as git status.
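
For reference, the same check can be sketched without the file utility. This is a minimal sketch, assuming a grep that supports -r and -I (GNU and BSD grep both do); the function name list_crlf_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list text files under a directory that contain
# at least one line ending in CRLF.
list_crlf_files() {
    dir=${1:-src}
    cr=$(printf '\r')   # a literal carriage return character
    # -r: recurse, -l: print only file names, -I: treat binary files as non-matching
    grep -rlI "${cr}\$" "$dir" | sort
}
```

The output of list_crlf_files src can then be compared against the files reported by git status.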

This occurs when a new developer, who uses different line endings, appears in the git log. To be precise, the cherry-pick breaks on his second commit. What I assume happened is that his first commit got normalized line endings, but his next commit then brought back the original CRLF line endings, which conflict with the committed files before the next commit can be cherry-picked.

However, I was expecting that -Xignore-space-change would solve this for me, so I wouldn't need to do any manual intervention at all. Apparently I misunderstood.

I also tried with --strategy=recursive --strategy-option=renormalize, same result.

I also tried an interactive rebase of the master branch: edit (amend) the root commit to add .gitattributes and keep all subsequent commits. This is essentially the same as the cherry-picking, except that it happens in another branch.

What would be the correct way to solve this with a hypothetical "overwrite-whatever-is-here-dammit" option? I'm also open to answers that involve an interactive rebase of the master branch where I amend only the root commit, because the end result would be identical to a cherry-pick.

  • Answers that do not require any manual intervention after amending the root commit will be upvoted.
  • Answers that result in manually fixing individual commits will be downvoted.
  • Comments that result in me further clarifying my question will be upvoted.
Amedee Van Gasse
  • I don't really care that you downvote me, but I think it's important to ask why this is important: "There is no need for the new development team to have the original SHA hashes". Even after doing what you propose (cherry picking), there will be tons of objects that have the same hashes. – jbu Jan 16 '17 at 09:53
  • Objects, yes. Commits, no. As soon as you amend a commit, every subsequent commit has a different hash. Thanks for asking, I will update my question to clarify. – Amedee Van Gasse Jan 16 '17 at 09:55

2 Answers


I don't think you can use interactive mode at all. From the git rebase documentation:

--ignore-whitespace, --whitespace=<option>: These flag are passed to the git apply program (see git-apply[1]) that applies the patch. Incompatible with the --interactive option.

I only tried this on a simple example, so I'm not sure it will work for you. Create your cleanup branch (just 1 or 2 commits), switch to your master branch, and run:

git rebase cleanup --ignore-whitespace

Then resolve any merge issues that occur during the process.

jbu
  • Unfortunately this also leads to `error: Your local changes to the following files would be overwritten by merge`, on exactly the same commit. If I resolve that issue, then every commit after that one also needs manual intervention. So it's the same result as I tried before. – Amedee Van Gasse Jan 16 '17 at 10:41
  • good luck, btw, I am wondering if instead of "--ignore-whitespace" you would use the "--whitespace=fix" option. – jbu Jan 16 '17 at 10:47
  • I am now experimenting with a totally different approach: `git filter-branch --tree-filter 'git ls-files -z | xargs -0 dos2unix'`. Still needs some fine tuning because I don't want `dos2unix` to touch binary files like png files. – Amedee Van Gasse Jan 16 '17 at 10:53
  • --strategy-option=ours also looks interesting. – jbu Jan 16 '17 at 10:57

Inspired by this Super User question and this Stack Exchange question, I took a totally different approach. Instead of rebase/cherry-pick, I worked on the git objects directly with git filter-branch.

Step 1: filter all commits with dos2unix

I used this command:

time git filter-branch -f --tree-filter 'git ls-files -z | xargs -0 file | grep -e ":.*text" | grep "CRLF" | cut -d":" -f1 | tr "\n" "\0" | xargs -0 dos2unix' --prune-empty -- master

git filter-branch goes over all commits in a given branch and applies a filter to each commit. Filters can be shell commands or even external shell scripts. In fact, if you want to do this more than once, I strongly recommend that you do NOT use my one-liner but move the tree-filter into its own script, or into a function in a script.

To explain the tree-filter:

  • git ls-files: outputs all the filenames of the commit that is currently in the tree-filter
  • -z: null line termination. I had to use this because there were files with spaces and special characters.
  • xargs: build and execute command lines from standard input (piped from the standard output of git ls-files)
  • -0: input is terminated by a null character
  • file: determine file type
  • grep -e ":.*text": print all lines containing the string "text" in the second column, this means all files that are some sort of text file (.html, .css, .js, .java,...). Actually this grep isn't needed because of the next one, but I keep it in there as an extra sanity check.
  • grep CRLF: I only want to work with text files with a CRLF (Windows) line ending. Without this, the result would be the same, but it would be a lot slower because dos2unix would have to work on files that already have Unix line ending.
  • cut -d":" -f1: use the first field, delimited on the : character (which came from the file output; fortunately I had no files with a : character in their file name)
  • tr "\n" "\0": replace newlines with null, because again we might have file names with spaces in them
  • dos2unix: DOS/Windows to Unix text file format converter
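
As a rough idea of what such a stand-alone script could look like, here is a minimal sketch. It is not the one-liner above: it swaps file and dos2unix for grep -I (which treats binary files as non-matching) and sed, so it only needs standard tools. The function name normalize_crlf is hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-alone variant of the tree-filter.
# Assumption: grep -I and sed stand in for file and dos2unix.
normalize_crlf() {
    dir=${1:-.}
    # Skip any .git directory, then hand all regular files to a small inner script
    find "$dir" -type d -name .git -prune -o -type f -exec sh -c '
        cr=$(printf "\r")
        tmp=$(mktemp) || exit 1
        for f in "$@"; do
            # grep -I treats binary files as non-matching, so images are skipped
            if grep -qI "${cr}\$" "$f"; then
                # Strip the CR at each line end; writing back through a
                # redirection keeps the mode bits of the original file
                sed "s/${cr}\$//" "$f" > "$tmp" && cat "$tmp" > "$f"
            fi
        done
        rm -f "$tmp"
    ' sh {} +
}
```

Saved in a script that ends with normalize_crlf . (at some path of your choosing), it could be passed to --tree-filter instead of the one-liner, since filter-branch runs the filter with the checked-out tree as the working directory. File names with spaces are handled by find itself, so no null-termination tricks are needed.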

This command took a while to complete:

662,44s user 18,82s system 96% cpu 11:47,10 total

Step 2: rebase master branch onto cleanup branch

As described in the question, there is already a cleanup branch, with 1 commit, that contains a .gitattributes file. The cleaned master branch can now be rebased onto it. --ignore-whitespace is no longer required because all whitespace issues were solved in step 1. Just doing this is enough:

git rebase cleanup

No merge errors occurred.

Step 3: a bit of housekeeping

Clean up the junk you don't need any more (we made a backup after all, didn't we? Uhmmm... yes?)

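# Delete the cleanup branch; its commit is now part of master's history
git branch -D cleanup
# Reset the committer date, email and name of every commit to the author's values
git filter-branch -f --env-filter 'export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE" && export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL" && export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"' -- master
# Delete the backup refs that filter-branch left under refs/original/
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
# Expire all reflog entries immediately
git reflog expire --expire=now --all
# Repack the repository and prune unreachable objects right away
git gc --aggressive --prune=now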

Step 4: push to remote

git remote add origin $REPOSITORY
git push -u origin master
Amedee Van Gasse