I have a git repository that has a UTF-16 file in it. It's only UTF-16 by accident; the file could be encoded in 7-bit ASCII without any loss of data. I'd like to use something like reposurgeon to convert the file to UTF-8 so that git diff will work with older revisions of the file and I don't have to resort to git difftool. Is this possible?

Justin Dearing

1 Answer

Why don't you just convert the file to UTF-8 and commit it, e.g. with:

iconv -f UTF-16 -t UTF-8 input-file.txt > input-file.txt.fixed
# Check here that the conversion worked OK
mv -i input-file.txt.fixed input-file.txt
git commit -m 'Convert input-file.txt from UTF-16 to UTF-8' input-file.txt
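One way to fill in the "check here that the conversion worked OK" step is sketched below. It assumes a GNU- or libiconv-style iconv, and the tiny sample file is a made-up stand-in for the real input-file.txt. It checks that the output is non-empty, contains no leftover NUL bytes, and converts cleanly to ASCII, which also tests the claim that nothing outside 7-bit ASCII is in the file.

```shell
#!/bin/sh
# Sketch of the "check here that the conversion worked OK" step.
# Assumption: the sample below is a hypothetical stand-in for input-file.txt.
set -eu
cd "$(mktemp -d)"

# Sample UTF-16LE file with a byte-order mark (bytes FF FE), as Windows tools usually write it.
printf '\377\376h\000i\000\n\000' > input-file.txt

iconv -f UTF-16 -t UTF-8 input-file.txt > input-file.txt.fixed

# The converted file must not be empty...
[ -s input-file.txt.fixed ]
# ...must not contain NUL bytes (a sign the input was not really UTF-16)...
if ! tr -d '\000' < input-file.txt.fixed | cmp -s - input-file.txt.fixed; then
    echo "NUL bytes left over" >&2
    exit 1
fi
# ...and must be pure 7-bit ASCII: iconv exits non-zero on any other character.
iconv -f UTF-8 -t ASCII input-file.txt.fixed > /dev/null
echo "conversion looks OK"
```

Only if all three checks pass is it worth running the `mv` and `git commit` lines.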

Update after a clarifying comment:

If you want to rewrite that file in every commit reachable from HEAD, you can use git filter-branch with a tree filter, something like:

# Skip commits in which input-file.txt doesn't exist (yet)
git filter-branch --tree-filter '
    if [ -f input-file.txt ]; then
        iconv -f UTF-16 -t UTF-8 input-file.txt > input-file.txt.fixed &&
        mv input-file.txt.fixed input-file.txt
    fi' HEAD

Of course, rewriting history in this way may cause problems if you have already shared this repository with anyone. (I haven't tested that command, so use it with care, preferably only on a fresh clone of your repository.)
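Before pushing a rewritten history anywhere, it may help to walk every commit with `git rev-list HEAD` and confirm that each stored copy of the file is now plain ASCII. The script below is a sketch of that on a throwaway repository; the repository, commit messages, file contents and the `demo` identity are invented, and it assumes `git` and `iconv` are on the PATH. Its tree filter wraps the answer's iconv/mv pair in an `if [ -f input-file.txt ]` test, because the first commit deliberately lacks the file and an unguarded iconv would make filter-branch stop there.

```shell
#!/bin/sh
# Sketch: run a guarded tree filter on a throwaway repository, then check
# every rewritten commit. Assumptions: git and iconv are on PATH; the repo,
# contents and identity below are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer Git otherwise pauses with a warning

cd "$(mktemp -d)"
git init -q

# A first commit without input-file.txt, then two commits storing it as UTF-16LE with a BOM.
echo readme > README
git add README && git commit -q -m 'initial'
printf '\377\376a\000\n\000' > input-file.txt
git add input-file.txt && git commit -q -m 'add input-file.txt'
printf '\377\376b\000\n\000' > input-file.txt
git commit -q -am 'edit input-file.txt'

git filter-branch --tree-filter '
    if [ -f input-file.txt ]; then
        iconv -f UTF-16 -t UTF-8 input-file.txt > input-file.txt.fixed &&
        mv input-file.txt.fixed input-file.txt
    fi' HEAD > /dev/null

# Every commit that contains the file should now store it as plain ASCII;
# iconv exits non-zero (and set -e stops the script) if one does not.
for rev in $(git rev-list HEAD); do
    if git cat-file -e "$rev:input-file.txt" 2> /dev/null; then
        git show "$rev:input-file.txt" | iconv -f UTF-8 -t ASCII > /dev/null
        echo "$rev: OK"
    fi
done
```

On a real repository, only the final loop is needed, run inside the rewritten clone.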

Mark Longair
  • Because I want the history in UTF-8, not just the HEAD. – Justin Dearing Oct 16 '11 at 13:08
  • Mark, the short story is that it works. The long story is that if you want to do it from Windows, you need to launch it from Git Bash, not from PowerShell. I started mucking about with how to do it with PowerShell, including using [the PowerShell alternative to iconv](http://stackoverflow.com/questions/750449/converting-xml-from-utf-16-to-utf-8-using-powershell) here, but then I realized I could just launch Git Bash. – Justin Dearing Nov 18 '11 at 22:13