I have a git repository that has a UTF-16 file in it. It's only UTF-16 by accident; the file could be encoded in 7-bit ASCII without any loss of data. I'd like to use something like reposurgeon to convert the file to UTF-8 in the history, so that git diff will work with older revisions of the file and I don't have to resort to git difftool. Is this possible?
Why don't you just convert the file to UTF-8 and commit it, e.g. with:
```shell
iconv -f UTF-16 -t UTF-8 input-file.txt > input-file.txt.fixed
# Check here that the conversion worked OK
mv -i input-file.txt.fixed input-file.txt
git commit -m 'Convert input-file.txt from UTF-16 to UTF-8' input-file.txt
```
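The "check here" step can be made concrete. Since the file is really 7-bit ASCII, one option is to refuse the replacement unless the converted text is pure ASCII. This is a minimal sketch, assuming a POSIX shell and a GNU or BSD iconv; convert_if_ascii is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# Sketch: convert a UTF-16 file to UTF-8 in place, but only if the result
# is plain 7-bit ASCII, so nothing can be lost or silently mangled.
# convert_if_ascii is a hypothetical helper, not a standard command.
set -eu

convert_if_ascii() {
  src=$1
  tmp=$src.fixed
  # Decode UTF-16 (iconv uses a leading BOM to pick the byte order).
  if ! iconv -f UTF-16 -t UTF-8 "$src" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # Converting to ASCII fails on the first non-ASCII character, so success
  # means the UTF-8 output is also valid 7-bit ASCII.
  if ! iconv -f UTF-8 -t ASCII "$tmp" > /dev/null 2>&1; then
    echo "not pure ASCII, leaving $src unchanged" >&2
    rm -f "$tmp"
    return 1
  fi
  mv "$tmp" "$src"
}

# Demo on a throwaway UTF-16 file in a temporary directory.
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf 'hello\n' | iconv -f UTF-8 -t UTF-16 > "$demo_dir/input-file.txt"
convert_if_ascii "$demo_dir/input-file.txt"
cat "$demo_dir/input-file.txt"
```

Run as is, it converts the throwaway sample and prints hello. For the real file you would call convert_if_ascii input-file.txt and then run the git commit step.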
Update after a clarifying comment:
If you want to rewrite that file in every commit in the history of HEAD, you can use git filter-branch with something like:
```shell
git filter-branch --tree-filter '
  # Skip commits in which the file does not exist.
  if [ -f input-file.txt ]; then
    iconv -f UTF-16 -t UTF-8 input-file.txt > input-file.txt.fixed &&
    mv input-file.txt.fixed input-file.txt
  fi' HEAD
```
Of course, rewriting history in this way can cause problems if you have already shared this repository with anyone. (I haven't tested that command, so use it with care, probably only on a fresh clone of your repository.)
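Before pushing or sharing the rewritten branch, it may be worth confirming that no commit reachable from HEAD still carries a UTF-16 copy of the file. This is a minimal sketch, assuming a POSIX shell with git, iconv, tr and cmp available; check_history is a hypothetical helper name. It treats NUL bytes as a sign of leftover UTF-16 and any non-ASCII byte as a problem, which should also catch a commit where an already-UTF-8 file was run through iconv a second time.

```shell
#!/bin/sh
# Hypothetical helper: report every commit reachable from HEAD whose copy
# of the file contains NUL bytes (likely still UTF-16) or non-ASCII bytes.
# Usage, inside the repository: sh check-history.sh input-file.txt

check_history() {
  path=$1            # path relative to the repository root
  tmp=$(mktemp)
  status=0
  for rev in $(git rev-list HEAD); do
    # Skip commits in which the file does not exist.
    git cat-file -e "$rev:$path" 2>/dev/null || continue
    git cat-file blob "$rev:$path" > "$tmp"
    # ASCII text has no NUL bytes; UTF-16 text almost always does.
    if ! tr -d '\000' < "$tmp" | cmp -s - "$tmp"; then
      echo "$rev: $path contains NUL bytes (still UTF-16?)"
      status=1
    # iconv fails on the first byte that is not 7-bit ASCII.
    elif ! iconv -f UTF-8 -t ASCII "$tmp" > /dev/null 2>&1; then
      echo "$rev: $path is not plain ASCII"
      status=1
    fi
  done
  rm -f "$tmp"
  return "$status"
}

if [ "$#" -gt 0 ]; then
  check_history "$1"
fi
```

The script name is arbitrary. It prints one line per offending commit and exits non-zero if it found any, so a silent, successful run suggests the rewrite covered the whole history of HEAD.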

Mark Longair
- Because I want the history in UTF-8, not just the HEAD. – Justin Dearing Oct 16 '11 at 13:08
- Mark, the short story is that works. The long story is that if you want to do it from Windows, you need to launch it from Git Bash, not from PowerShell. I started mucking about with how to do it with PowerShell, including using [the PowerShell alternative to iconv](http://stackoverflow.com/questions/750449/converting-xml-from-utf-16-to-utf-8-using-powershell) here, but then I realized I could launch Git Bash. – Justin Dearing Nov 18 '11 at 22:13