
I need to process the output of "git ls-files". If a file name contains a non-ASCII character, I get unusable output:

FRANZÖSISCH.txt -> FRANZ\303\226SISCH.txt
NIEDERLÄNDISCH.txt -> NIEDERL\303\204NDISCH.txt

No matter which encoding I try (I use C#), those values do not convert to the characters "Ö" or "Ä".

What encoding is used here, and how can I convert it back to the actual file names?

Dlanod Kcud

3 Answers


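So here's what I found out:

  1. Git prints each byte of a non-ASCII file name's UTF-8 encoding as an octal escape sequence: graphemica.com. For example, \303\226 is the two bytes 0xC3 0x96, which is "Ö" in UTF-8.
  2. The conversion is actually very easy: related answer on stackoverflow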
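A minimal sketch of that conversion, written in Python rather than C# (the byte-level logic carries over directly). It assumes Git's C-style path quoting: optional surrounding double quotes, three-digit octal escapes for raw bytes, and the single-letter escapes listed in the table. `unquote_git_path` is a hypothetical helper name, not part of any Git API.

```python
# Single-character escapes assumed to be used by Git's C-style path quoting.
_SIMPLE_ESCAPES = {
    "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A,
    "v": 0x0B, "f": 0x0C, "r": 0x0D, '"': 0x22, "\\": 0x5C,
}


def unquote_git_path(quoted: str) -> str:
    """Hypothetical helper: turn FRANZ\\303\\226SISCH.txt back into real text."""
    # Git wraps quoted paths in double quotes; tolerate their absence.
    if len(quoted) >= 2 and quoted[0] == '"' and quoted[-1] == '"':
        quoted = quoted[1:-1]
    raw = bytearray()
    i = 0
    while i < len(quoted):
        ch = quoted[i]
        if ch != "\\":
            # Plain characters are copied through as their UTF-8 bytes.
            raw.extend(ch.encode("utf-8"))
            i += 1
            continue
        nxt = quoted[i + 1]
        if nxt in "01234567":
            # Three octal digits encode one raw byte of the file name.
            raw.append(int(quoted[i + 1:i + 4], 8))
            i += 4
        else:
            raw.append(_SIMPLE_ESCAPES[nxt])
            i += 2
    # The collected bytes are the file name's UTF-8 encoding.
    return raw.decode("utf-8")


print(unquote_git_path(r"FRANZ\303\226SISCH.txt"))      # FRANZÖSISCH.txt
print(unquote_git_path(r'"NIEDERL\303\204NDISCH.txt"'))  # NIEDERLÄNDISCH.txt
```

The key point is to collect bytes first and decode UTF-8 only once at the end; decoding each escape on its own fails because "Ö" spans two bytes. Alternatively, setting the Git config option core.quotePath to false stops Git from escaping bytes above 0x80 in the first place.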
Dlanod Kcud

The most reliable way to integrate with Git is to read its output as binary data and do the encoding to and from UTF-8 yourself.

In your case, pass the -z option to git ls-files so that it writes file names literally as they are (they are stored as UTF-8 inside Git) instead of escaping them, and terminates each one with a NUL byte.

Then consume the output from a binary Stream: Console.OpenStandardInput provides one if you pipe git's output into your application, or check this answer on how to get the binary output of a child process if you run git ls-files from within your application.
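A minimal sketch of this approach in Python rather than C#: run `git ls-files -z`, keep stdout as raw bytes, split on NUL, and decode each entry as UTF-8. The function names are hypothetical, and it assumes the repository's file names really are UTF-8. In C# the equivalent step would be reading the Stream into a byte array and decoding each NUL-delimited slice with Encoding.UTF8.GetString.

```python
import subprocess


def split_nul_terminated(data: bytes) -> list[str]:
    """Split `git ls-files -z` style output into UTF-8 decoded file names."""
    # Each entry is terminated (not just separated) by NUL, so drop the empty tail.
    parts = data.split(b"\0")
    if parts and parts[-1] == b"":
        parts.pop()
    return [part.decode("utf-8") for part in parts]


def list_tracked_files(repo_dir: str = ".") -> list[str]:
    """Run `git ls-files -z` in repo_dir and return the tracked file names."""
    # Without text=True, result.stdout stays raw bytes, so nothing is
    # decoded with the console's code page behind our back.
    result = subprocess.run(
        ["git", "ls-files", "-z"],
        cwd=repo_dir,
        capture_output=True,
        check=True,
    )
    return split_nul_terminated(result.stdout)


sample = "FRANZÖSISCH.txt\0NIEDERLÄNDISCH.txt\0".encode("utf-8")
print(split_nul_terminated(sample))  # ['FRANZÖSISCH.txt', 'NIEDERLÄNDISCH.txt']
```

Splitting on NUL is safe because NUL cannot appear inside a path, whereas newlines can.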

max630
Thank you. What I don't like about Git is how hidden this information is! For example, the help for status says "-z Terminate entries with NUL, instead of LF. This implies the --porcelain output format if no other format is given.", and the help for ls-files says "-z \0 line termination on output." – Dlanod Kcud Dec 04 '16 at 08:18

The encoding for non-ASCII characters like Ö and Ä (and ç and ø and so on) is a bit tricky. Git tries to use UTF-8 here, but there are issues with combining characters. See Git and the Umlaut problem on Mac OS X for details and some workarounds. Given that you are using C#, you are probably on Windows; I'm not sure what Git has to do to keep Windows happy.
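To illustrate the combining-character issue, here is a small Python sketch (sample data only): the same visible name can be stored precomposed (NFC, "Ö" as U+00D6) or decomposed (NFD, "O" followed by U+0308 COMBINING DIAERESIS, the form macOS HFS+ traditionally used). The two produce different UTF-8 bytes, and therefore different octal escapes in Git's output.

```python
import unicodedata

name = "FRANZÖSISCH.txt"
nfc = unicodedata.normalize("NFC", name)  # precomposed: Ö is one code point, U+00D6
nfd = unicodedata.normalize("NFD", name)  # decomposed: O + U+0308

print(nfc == nfd)                          # False
print(nfc.encode("utf-8")[5:7])            # b'\xc3\x96'  (Git shows \303\226)
print(nfd.encode("utf-8")[5:8])            # b'O\xcc\x88' (Git shows O\314\210)
print(unicodedata.normalize("NFC", nfd) == nfc)  # True
```

Normalizing both sides to the same form before comparing file names is one way to make such names match.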

torek