
I have a file with Swedish characters in it (åäö), encoded as UTF-8.

If I `cat` the file, it displays fine, but with `git diff` the special characters are printed as, for example, `<F6>`.

Example git diff output:

-            name: 'Magler<F6>d, S<F6>der<E5>sen',

What I wanted to see:

-            name: 'Magleröd, Söderåsen',

I found another question about git and encoding problems, "git, msysgit, accents, utf-8, the definitive answers". It says all such problems should be fixed in git 1.7.10, but I have version 1.8.1.2.

What can I do to make git diff properly display åäö?

Asked by Tobbe
  • Are you sure your file is UTF-8? `0xF6` is the ISO-8859-1 code for `ö`, and `0xE5` is the code for `å`. – matt Oct 17 '13 at 19:29
  • `file -bi filename.txt` gives me `text/plain; charset=utf-8`. – Tobbe Oct 17 '13 at 19:44
  • @Tobbe I suspect `file` is simply noticing that it's not ASCII, and not doing any extensive testing to verify that it's a valid UTF-8 file (which it wouldn't be if the actual byte values are 0xf6 and 0xe5 as matt suggests, because the bytes immediately following do not have bit 7 set, which valid UTF-8 multi-byte sequences require). `file` may just "guess" at UTF-8; I'm sure it's not looping through all available encodings and testing... – twalberg Oct 17 '13 at 19:56
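To check the diagnosis in the comments yourself, here is a minimal sketch (POSIX shell, assuming GNU or macOS `iconv` and `od` are installed). It writes a hypothetical `sample.txt` containing "Söder" encoded as ISO-8859-1, dumps the raw bytes, and uses a UTF-8-to-UTF-8 `iconv` round trip as a validity check. This relies on `iconv` exiting with a non-zero status on an illegal input sequence, which GNU and macOS `iconv` do; it is one heuristic, not the only one.

```shell
#!/bin/sh
# sample.txt is a hypothetical file created only for illustration.
# \366 (octal) = 0xF6 = 'ö' in ISO-8859-1, stored as a single byte.
printf 'S\366der\n' > sample.txt

# Show the raw bytes: a lone f6, not the UTF-8 pair c3 b6.
od -An -tx1 sample.txt

# Decoding as UTF-8 and re-encoding as UTF-8 fails (non-zero exit status)
# if the input contains any invalid UTF-8 byte sequence.
if iconv -f UTF-8 -t UTF-8 sample.txt > /dev/null 2>&1; then
    echo "sample.txt: valid UTF-8"
else
    echo "sample.txt: NOT valid UTF-8"
fi
```

Run against a real file, an ISO-8859-1 `ö` shows up as the single byte `f6` in the `od` output, whereas a genuinely UTF-8 file would show `c3 b6` at that position.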

3 Answers


git is dumping out raw bytes. In this case, it doesn't care what your file's encoding is. The highlighted <F6> you're seeing is coming from less, which is presumably configured as your PAGER. Try setting:

LESSCHARSET=UTF-8
Answered by Edward Thomson
  • FYI, if you want to make the change permanent (instead of running `export LESSCHARSET=utf-8` every time you log into your machine), just add it to your `~/.bashrc` file. – DiegoDD Nov 24 '16 at 16:25
  • Another option is to set the environment variable in your `~/.gitconfig` or `.git/config`. In my case, I have `pager = LESSCHARSET=utf-8 less -R` in the `[core]` section. – dbort Aug 14 '17 at 21:25
  • For anyone running Docker: remember to use `ENV` inside the Dockerfile to set the environment variable system-wide, i.e. `ENV LESSCHARSET=utf-8`. – Fabien Snauwaert Aug 24 '17 at 07:22
  • Another possibility, if you are using PowerShell, is to set `[Console]::OutputEncoding = [System.Text.Encoding]::UTF8`, as explained here: https://stackoverflow.com/questions/52205297/the-output-of-git-diff-is-not-handled-correctly-in-powershell – Andreas Mar 17 '19 at 20:00

@matt and @twalberg were correct: the file wasn't actually UTF-8 encoded. Figuring this out wasn't made easier by the fact that my terminal (hterm) can't input åäö properly (though it can display them and copy/paste them)...

iconv -f ISO-8859-1 -t UTF-8 in.txt > out_utf-8.txt

solved my issue.
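The same conversion with a check afterwards might look like the sketch below. It assumes the file really is ISO-8859-1 (if it is, say, Windows-1252, the `-f` argument has to change) and uses the spelling `UTF-8`, since, as noted in a comment further down, some `iconv` builds reject `utf8`. `in.txt` and `out_utf-8.txt` are the names from the answer; the `printf` line only creates a stand-in `in.txt` for illustration.

```shell
#!/bin/sh
# Stand-in input for illustration: the line from the question, encoded as
# ISO-8859-1 (\366 = 0xF6 = 'ö', \345 = 0xE5 = 'å'). Use your real file instead.
printf "name: 'Magler\366d, S\366der\345sen',\n" > in.txt

# Convert from ISO-8859-1 to UTF-8 into a new file, leaving in.txt untouched.
iconv -f ISO-8859-1 -t UTF-8 in.txt > out_utf-8.txt

# The result should now decode cleanly as UTF-8.
iconv -f UTF-8 -t UTF-8 out_utf-8.txt > /dev/null && echo "out_utf-8.txt is valid UTF-8"

# 'ö' is now the two-byte sequence c3 b6, and 'å' is c3 a5.
od -An -tx1 out_utf-8.txt
```

Once the output looks right, `mv out_utf-8.txt in.txt` replaces the original; committing that change means later diffs compare UTF-8 with UTF-8.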

Answered by Tobbe
  • You can actually view the diff without writing an intermediate file, though the command line is a bit verbose: `git diff --color | iconv -f iso-8859-1 -t utf8 | less -r`. Here `--color` forces `git` to write ANSI color codes into the pipe, and `-r` makes `less` pass those escape sequences through to the terminal. – SnakE May 01 '16 at 12:07
  • In case this happens to someone else: when I tried the command, iconv told me it didn't know the `utf8` encoding. Listing the encodings with `iconv -l`, I found it under the name `utf-8`, so the command for me was `git diff --color | iconv -f iso-8859-1 -t utf-8 | less -r`. – FcoJavier99 Aug 16 '20 at 16:06

`git log` output is displayed by `less`, not `vi`, so it is `less` whose character set you need to set:

$ export LESSCHARSET=utf-8 && git log

Answered by kujiy