44

When creating a diff patch with Git Shell in Windows (when using GitHub for Windows), the character encoding of the patch will be UCS-2 Little Endian according to Notepad++ (see the screenshots below).

How can I change this behavior, and force git to create patches with ANSI or UTF-8 without BOM character encoding?

This causes a problem because UCS-2 Little Endian encoded patches cannot be applied; I have to manually convert them to ANSI. If I don't, I get a "fatal: unrecognized input" error.

Creating git patch

Notepad++ screenshot of the character encoding


Since then, I have also realized that I have to manually convert the EOL from Windows format (\r\n) to UNIX (\n) in Notepad++ (Edit > EOL Conversion > UNIX). If I don't do this, I get a "trailing whitespace" error (even if all trailing whitespace is trimmed: "TextFX" > "TextFX Edit" > "Trim Trailing Spaces").

So, the steps I need to do for the patch to be applied:

  1. create patch (here is the result)
  2. convert character encoding to ANSI
  3. EOL conversion to UNIX format
  4. apply patch
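Steps 2 and 3 could also be scripted instead of being done by hand in Notepad++. The following is only a minimal Python sketch, not something Git provides: `normalize_patch` is a hypothetical helper name, and it assumes the input is either UTF-16 with a BOM (what Notepad++ reports as UCS-2 Little Endian) or already UTF-8/ASCII.

```python
import codecs
from pathlib import Path


def normalize_patch(src: str, dst: str) -> None:
    """Rewrite a patch as UTF-8 without BOM and with LF line endings."""
    raw = Path(src).read_bytes()

    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        text = raw.decode("utf-16")
    else:
        # "utf-8-sig" strips a UTF-8 BOM if one is present.
        text = raw.decode("utf-8-sig")

    # Convert Windows line endings (CRLF) to UNIX ones (LF).
    text = text.replace("\r\n", "\n")

    # Write bytes directly so Python does not translate the newlines again.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Note that this strips every CRLF, so it is only appropriate when the files being patched use LF line endings themselves.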

Please, take a look at this screenshot:

Applying a patch in Windows Powershell with Git is problematic

Sk8erPeter
  • 1
    This is not a direct answer to this question; however, the canonical way to create a patch for application (and not only for human consumption) is not `git diff` but `git format-patch`. As this does not write to stdout by default, I guess you won't have a problem with mangled character encodings. – Lars Noschinski Dec 06 '12 at 13:12
  • @cebewee: thanks, and how should I use that in this case? So how should I define the output file's name? – Sk8erPeter Dec 06 '12 at 17:22
  • `git format-patch` takes a single commit X (meaning X..HEAD) or a commit range and produces a file for each of these commits, named NNNN-SUBJECT.patch, where NNNN is an increasing number and SUBJECT is a mangled version of the subject of the commit. – Lars Noschinski Dec 06 '12 at 19:21
  • Related posts: http://stackoverflow.com/questions/4481746/how-to-capture-binary-stdout-of-a-console-exe-run-from-powershell and http://superuser.com/questions/327492/default-powershell-to-emitting-utf-8-instead-of-utf-16 – Lars Noschinski Dec 06 '12 at 20:52
  • You may use custom attributes and a custom filter: http://git-scm.com/book/en/Customizing-Git-Git-Attributes – mbx Jun 12 '14 at 08:07

6 Answers

20

I'm not a Windows user, so take my answer with a grain of salt. According to the Windows PowerShell Cookbook, PowerShell preprocesses the output of git diff, splitting it into lines. The documentation of the Out-File cmdlet suggests that > is the same as | Out-File without parameters. We also find this comment in the PowerShell documentation:

The results of using the Out-File cmdlet may not be what you expect if you are used to traditional output redirection. To understand its behavior, you must be aware of the context in which the Out-File cmdlet operates.

By default, the Out-File cmdlet creates a Unicode file. This is the best default in the long run, but it means that tools that expect ASCII files will not work correctly with the default output format. You can change the default output format to ASCII by using the Encoding parameter:

[...]

Out-file formats file contents to look like console output. This causes the output to be truncated just as it is in a console window in most circumstances. [...]

To get output that does not force line wraps to match the screen width, you can use the Width parameter to specify line width.

So, apparently it is not Git which chooses the character encoding, but Out-File. This suggests a) that PowerShell redirection really should only be used for text and b) that

| Out-File -encoding ASCII -Width 2147483647 my.patch

will avoid the encoding problems. However, this still does not solve the problem of Windows vs. Unix line endings. There are cmdlets (see the PowerShell Community Extensions) that convert line endings.
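To see why the recoded file is rejected, it helps to look at the bytes. The Python snippet below is only an illustration: the header line is made up, and it assumes that the "Unicode" default of Out-File means UTF-16 Little Endian with a BOM, which matches what Notepad++ reports. Presumably git then fails with "unrecognized input" because it no longer finds a plain `diff --git` line at the start of the file.

```python
import codecs

# A made-up first line of a patch, as git itself would emit it.
header = "diff --git a/file.txt b/file.txt\n"

# What git writes: one byte per character, LF line ending.
as_ascii = header.encode("ascii")

# What ends up in the file after recoding: a BOM, two bytes per
# character, and CRLF line endings.
as_utf16 = codecs.BOM_UTF16_LE + header.replace("\n", "\r\n").encode("utf-16-le")

print(as_ascii[:12])
print(as_utf16[:12])

# The ASCII bytes start with the expected header; the UTF-16 bytes do not.
print(as_ascii.startswith(b"diff --git"))
print(as_utf16.startswith(b"diff --git"))
```

Every second byte of the UTF-16 version is a NUL, so the file is more than twice as large and looks like binary data to a tool that expects ASCII.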

However, all this recoding does not increase my confidence in a patch (which has no encoding itself, but is just a string of bytes). The aforementioned Cookbook contains a script, Invoke-BinaryProcess, which can be used to redirect the output of a command unmodified.
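The same idea, passing the bytes through untouched, can be sketched outside PowerShell as well. This is not the Cookbook script, just a hedged Python equivalent; `save_raw_output` is a hypothetical helper name, and it assumes the command is available on the PATH.

```python
import subprocess


def save_raw_output(command: list[str], out_path: str) -> None:
    """Run command and write its stdout to out_path byte for byte."""
    # Opening the file in binary mode and handing it to the child process
    # means nothing decodes, re-encodes, or re-wraps the output on the way.
    with open(out_path, "wb") as out_file:
        subprocess.run(command, stdout=out_file, check=True)
```

A call such as `save_raw_output(["git", "diff"], "my.patch")` should then produce a patch with exactly the bytes git emitted, including its line endings.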

To sidestep this whole issue, an alternative would be to use git format-patch instead of git diff. format-patch writes directly to a file (and not to stdout), so its output is not recoded. However, it can only create patches from commits, not arbitrary diffs.

format-patch takes a commit range (e.g. master~10..master~5) or a single commit (e.g. X, meaning X..HEAD) and creates patch files of the form NNNN-SUBJECT.patch, where NNNN is an increasing 4-digit number and SUBJECT is the (mangled) subject of the commit. An output directory can be specified with -o.

Ian Kemp
Lars Noschinski
  • The call for using Invoke-BinaryProcess should be along the lines of: `Invoke-BinaryProcess git -RedirectOutput diff | Out-File -encoding OEM my.patch` – Lars Noschinski Dec 06 '12 at 20:50
  • 1
    Thank you very much for this nicely detailed answer. _"However, it can only create patches from commits, not arbitrary diffs."_ Yes, unfortunately this means a problem when trying to create Drupal patches as suggestions for being committed. As you suggested, I tried Out-File pipe with defining the encoding as ASCII: http://i.imgur.com/2Nx9Z.png; here's the one with the default character encoding (using `>` for output redirection): http://i.imgur.com/QdyAN.png, and here's the one with your solution: http://i.imgur.com/7Fpz0.png, and yeah, the encoding is ANSI now. :) – Sk8erPeter Dec 09 '12 at 15:10
  • 1
    And yes, EOL is still a problem (CRLF vs. LF). I'll try `Invoke-BinaryProcess` as soon as I'll have time. I also found that I can mess with Git shell's `core.autocrlf` and `core.eol` settings: http://stackoverflow.com/a/1250133/517705, http://stackoverflow.com/a/10855862/517705, http://getmoai.com/news/normalized-line-endings-in-git.html, http://wiki.opf-labs.org/display/SP/Configuring+how+line-endings+are+handled+by+git, https://help.github.com/articles/dealing-with-line-endings . Anyway, thanks for your answer, yours is accepted, and you deserve the bounty for your efforts. ;) – Sk8erPeter Dec 09 '12 at 15:14
15

If you use PowerShell, you can also just do:

cmd /c "git diff > patch.diff"

This runs the command through CMD, which writes the output to the file as is.

ddiukariev
5

In case this helps anyone: using the good old Command Prompt instead of PowerShell works flawlessly; it doesn't seem to suffer from any of PowerShell's issues with character encoding and EOLs.


Daniel Liuzzi
  • I recommend `git-bash` as an alternative shell when using Git for Windows. In my experience git-bash causes fewer issues than PowerShell and does not have the downsides of cmd (e.g. history of commands). – mihca Aug 31 '21 at 09:54
2

Running dos2unix on the diff generated in PowerShell seems to do the trick for me. I was then able to apply the diff successfully.

dos2unix.exe diff_file
git apply diff_file
Vish
1
  1. Convert the output of the diff with iconv.
  2. For plain 7-bit patches (pure English) you can ignore Notepad++'s crazy encoding detection: the patch content doesn't contain any charset definition.
Lazy Badger
  • thanks for your answer,BUT 1.) why would `iconv`-ing be different than converting the character encoding manually in Notepad++? 2.) Why would ignoring the character encoding solve the problem that *these UCS2 Little Endian-encoded patches can NOT be applied?* Please, take a look at this screenshot: http://i.imgur.com/2698k.png. As you can see, I get the error message _"fatal: unrecognized input"_. I also recognized that I have to manually convert the EOL from Windows type to UNIX in Notepad++. As soon as I convert the patch to ANSI and convert EOL to UNIX, the problem goes away...interesting. – Sk8erPeter Dec 02 '12 at 11:47
  • Please take a look at my edited question, I put some more screenshots in! Thanks! – Sk8erPeter Dec 02 '12 at 11:55
  • Well, I have no idea in this case; you may ask the local Git fanboys. You could try the same operations with another Git client (they *can* operate differently) – Lazy Badger Dec 03 '12 at 06:37
0

As mentioned by Lars Noschinski, you need to fix the output of Out-File. You can set default parameter values for Out-File using the following commands.

$PSDefaultParameterValues['Out-File:Encoding'] = 'ASCII'
$PSDefaultParameterValues['Out-File:Width'] = '2147483647'

After setting the default parameters you can use > to export a patch file.

After adding those two lines to my profile file, everything works as expected.

λ git stash show -p > test3
C:\Users\..\Source\.. [master +1 ~0 -0 !]
λ git apply test3
C:\Users\..\Source\.. [master +1 ~2 -0 !]
quadroid