See JWilliams' answer for where to report bugs to GitHub. [Edit: perhaps this should be an answer to your other question.]
For what it's worth, it's not a good idea to use anything other than UTF-8 for the author and committer names: the encoding
field of the header is too difficult to apply to the pre-body part of the header, since it comes last, after the author and committer lines:
>>> import subprocess
>>> p = subprocess.Popen(['git', 'cat-file', '-p', 'HEAD'], stdout=subprocess.PIPE)
>>> o = p.stdout.read()
>>> hdr, body = o.split('\n\n', 1)
>>> hdr = hdr.splitlines()
The header lines are long, even after splitting:
>>> import pprint
>>> pprint.pprint(hdr)
['tree 79036d838fc5ce13e849949d02e6048c2d33c561',
'author \xc5\x99\x89\x83@\xc8\x96\x97\x97\x85\x99 <\x88\x96\x97\x97\x85\x99|\x96\x94\x95\x89\x86\x81\x99\x89\x96\xa4\xa2K\x96\x99\x87> 1528844508 -0700',
'committer \xc5\x99\x89\x83@\xc8\x96\x97\x97\x85\x99 <\x88\x96\x97\x97\x85\x99|\x96\x94\x95\x89\x86\x81\x99\x89\x96\xa4\xa2K\x96\x99\x87> 1528844508 -0700',
'encoding cp037']
but we can see that the encoding comes last. If the encoding were something that had byte-codes that resembled newlines (cp037
doesn't, fortunately) we would not be able to parse the header itself.
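The ordering problem can be made concrete with a small sketch (Python 3 here, unlike the 2.7 session above). It assumes the raw commit bytes are already in hand, for instance from git cat-file; the helper names split_commit and declared_encoding are made up for illustration. Everything is split as bytes on the ASCII newline first, and only then can the encoding line be found:

```python
def split_commit(raw):
    """Split a raw commit object (bytes) into header fields and body."""
    head, _, body = raw.partition(b"\n\n")
    fields = []
    # Split on b"\n" only: this is exactly the step that would break if
    # the chosen encoding produced 0x0A bytes inside a name.
    for line in head.split(b"\n"):
        if line.startswith(b" ") and fields:
            # Continuation of a multi-line header such as gpgsig.
            key, value = fields[-1]
            fields[-1] = (key, value + b"\n" + line[1:])
        else:
            key, _, value = line.partition(b" ")
            fields.append((key, value))
    return fields, body


def declared_encoding(fields, default="utf-8"):
    """Return the value of the encoding header, or the default if absent."""
    for key, value in fields:
        if key == b"encoding":
            return value.decode("ascii")
    return default


# Sample built from the header shown above (committer line omitted for brevity).
raw = (
    b"tree 79036d838fc5ce13e849949d02e6048c2d33c561\n"
    b"author \xc5\x99\x89\x83@\xc8\x96\x97\x97\x85\x99 "
    b"<\x88\x96\x97\x97\x85\x99|\x96\x94\x95\x89\x86\x81\x99\x89\x96"
    b"\xa4\xa2K\x96\x99\x87> 1528844508 -0700\n"
    b"encoding cp037\n"
    b"\n"
) + "Well, this should be interesting.".encode("cp037") + b"\n"

fields, body = split_commit(raw)
encoding = declared_encoding(fields)
```

Note that the author line has to be carved out of the byte stream before the encoding that governs it is known, which is why a non-ASCII-compatible encoding is risky there.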
For the body, however, it's a good idea to use the header's encoding information. If we work in something that does have the encoding available, well:
>>> body.decode('cp037')
u'Well, this should be interesting.\x8e'
(Python 2.7 here of course).
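A hedged Python 3 sketch of the same idea, for cases where the codec may or may not be installed: check that Python knows the declared encoding before using it, and fall back to UTF-8 otherwise. The function name decode_body and the fallback policy are assumptions for illustration, not anything Git itself does:

```python
import codecs


def decode_body(body, encoding="utf-8"):
    """Decode a commit body with the declared encoding, if Python knows it."""
    try:
        codecs.lookup(encoding)
    except LookupError:
        # Unknown codec on this system: fall back to UTF-8, replacing
        # undecodable bytes rather than raising.
        return body.decode("utf-8", errors="replace")
    return body.decode(encoding, errors="replace")


# The body from the commit above: cp037 text followed by an ASCII newline.
sample = "Well, this should be interesting.".encode("cp037") + b"\n"
text = decode_body(sample, "cp037")
```

As in the 2.7 session, the trailing ASCII newline is not a newline in cp037, so it comes out as a control character rather than as a line ending.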
Obviously neither GitHub nor my Git on this machine can do this for cp037, but on this particular host, that's not surprising:
$ iconv -f cp037
iconv: conversion from cp037 unsupported
On another machine that has the character set installed, iconv does work. I did not try this commit in Git there, but I did feed a header-line byte string (s in the session below) through it; the result was:
>>> import subprocess
>>> p = subprocess.Popen(['iconv', '-f', 'cp037'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
>>> so, se = p.communicate(s)
>>> so
'Eric Hopper\xc2\x80\x14hopper@omnifarious.org\xc2\x9e'
As you can see, the angle brackets have been damaged in translation (because the parse here was overly simple; we'd have to carefully avoid translating them). Again, though, the hazards are obvious: what if the encoding produces >?
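One way to avoid translating the delimiters is to split the line on the ASCII space and angle brackets first and decode only the name and email pieces. The following is a minimal Python 3 sketch under that assumption; decode_ident is a hypothetical helper, and it narrows the hazard rather than removing it, since an encoding that emits those delimiter bytes inside the email would still confuse it:

```python
def decode_ident(value, encoding):
    """Decode the part of an author/committer line after the keyword.

    Splits on the ASCII delimiters as bytes, working from the right,
    and decodes only the name and email with the declared encoding.
    """
    # The timestamp and timezone are always ASCII and come last.
    ident, timestamp, tz = value.rsplit(b" ", 2)
    # Use the last " <" so stray 0x3C bytes inside the name do less harm.
    name, sep, email = ident.rpartition(b" <")
    if not sep or not email.endswith(b">"):
        raise ValueError("unexpected ident layout")
    email = email[:-1]  # drop the closing angle bracket
    return (
        name.decode(encoding),
        email.decode(encoding),
        int(timestamp),
        tz.decode("ascii"),
    )


# The author value from the header shown earlier.
value = (
    b"\xc5\x99\x89\x83@\xc8\x96\x97\x97\x85\x99 "
    b"<\x88\x96\x97\x97\x85\x99|\x96\x94\x95\x89\x86\x81\x99\x89\x96"
    b"\xa4\xa2K\x96\x99\x87> 1528844508 -0700"
)
name, email, timestamp, tz = decode_ident(value, "cp037")
```

Working from the right means the fixed ASCII tail is peeled off before anything in the encoded region is inspected.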