I have a batch file tool that was originally written in English, and I am having it translated into various other languages. The problem is that many languages use special characters. In my case, the language is German.

So I might have in the English one:

echo Administrative permissions required. Detecting permissions...

Then in the German one, I'd have:

Administratorrechte benötigt. Überprüfe Berechtigungen...

The German text uses characters that the English text does not. In my research, I found the Windows command chcp for changing code pages. What I'm trying to do is change the code page (or use any other method) so that these characters display correctly. My current code page is the one for US English, 437. For German, I believe I need to use 1141 (source). I have read that you can change the CMD settings or make more permanent changes via the registry. But I need this to work on demand when a random person runs this file, with minimal effort.

I have tried setting the code page to 1141 by adding chcp 1141 at the start of the batch file, but this causes errors: the batch file can no longer understand my commands.

Mofi
NCSGeek
  • 1141 is EBCDIC. You do not want EBCDIC, ever. – melpomene Feb 26 '18 at 04:19
  • German Windows versions seem to use 850 ("OEM Multilingual Latin 1; Western European (DOS)"). – melpomene Feb 26 '18 at 04:22
  • [better use Unicode instead of those ANSI codepages](https://superuser.com/q/1075297/241386) https://stackoverflow.com/a/40280988/995714 https://stackoverflow.com/q/18813495/995714 https://stackoverflow.com/q/28413489/995714 – phuclv Feb 26 '18 at 04:28
  • @melpomene Also, what is wrong with EBCDIC? I don't know much about codepages. But I can use 850 like you suggested. – NCSGeek Feb 26 '18 at 18:36
  • AFAIK, all single- and multi-byte Windows codepages are ASCII supersets, including UTF-8 (65001). EBCDIC is not an ASCII superset. – Eryk Sun Feb 27 '18 at 13:33

1 Answer

Windows installations with a German-speaking country configured in the Windows region and language settings use OEM code page 850, which is very similar to OEM code page 437. The characters ÄÖÜäöüß have the same byte values in both code pages.
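A minimal sketch of how this could be applied, assuming the batch file itself is saved in OEM code page 850 (the file encoding is an assumption here and has to be selected in the text editor; a file saved as ANSI or UTF-8 would not display correctly with this code page):

```bat
@echo off
rem Assumption: this file is saved in OEM code page 850 (not ANSI, not UTF-8).
rem Switch the console to the code page in which this file was saved.
%SystemRoot%\System32\chcp.com 850 >nul
echo Es werden Administratorrechte benötigt. Überprüfe Berechtigungen ...
```

For another language, the same pattern should carry over by saving the file in that language's OEM code page and changing the number passed to chcp accordingly.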

Using UTF-8 encoding without a BOM (code page 65001) is unfortunately not a real option on Windows versions prior to Windows 8, because the default console font is the raster font Terminal, which does not support Unicode.

A batch file encoded in UTF-8 without a byte order mark and containing the command lines

@echo off
%SystemRoot%\System32\chcp.com 65001 >nul
echo Es werden Administratorrechte benötigt. Überprüfe Berechtigungen ...

produces no output at all on Windows XP, and on Windows Vista and Windows 7 it displays only the error message:

The system cannot write to the specified device.

The UTF-8 encoded batch file works on Windows 8 / 8.1 / 10, which by default use the font Consolas, which supports Unicode. Thanks to eryksun for this additional information.

The Microsoft developers are aware of the issues caused by the console's incomplete Unicode support and are working on improvements to the Windows console; see the developer blog post Windows Command-Line: Unicode and UTF-8 Output Text Buffer, written by Rich Turner on December 10, 2018.

Mofi
  • It works fine for me in Windows 10. I'll check Windows 7, but this shouldn't be an issue. After changing the console codepage, CMD decodes the next line as UTF-8. After that it's all the same regardless of the input source. For writing to the console, CMD uses `WriteConsoleW`, as it has since NT 3.1 in 1993. – Eryk Sun Feb 26 '18 at 06:42
  • I'm currently using a German Windows XP with country Austria configured and with OEM 850 set by default for console as well as an English Windows 7 also with country Austria configured and with OEM 850 for console. All the suggestions on using UTF-8 never worked on my two Windows computers until I found out two years ago that as long as the default raster font is configured, usage of UTF-8 is not possible. If I would change once the font for command prompt windows to *Consolas* or *Lucida Console* the UTF-8 encoded batch file would work as expected, but not with default raster font. – Mofi Feb 26 '18 at 06:49
  • Just for verification what is the default font for console I renamed the registry key `HKCU\Console`, restarted Windows 7 Enterprise, clicked on __Start__ button, typed `cmd` and opened the command prompt window, clicked on icon on top left edge of console window, clicked on __Properties__ and switched to tab __Font__. The font selected is *Raster Fonts* which is *Terminal* with 8 screen pixels wide and 12 screen pixels high. The default value for registry value `FaceName` is an empty string resulting in using raster font *Terminal* at least for country Austria configured. – Mofi Feb 26 '18 at 07:01
  • Sorry, I must be misremembering the default console font in Windows 7. I haven't used it except for occasional testing for a few years now, and I haven't used Windows XP at all in many years. But at least in Vista+ you can configure Consolas as the font, which is a huge improvement over the legacy raster font, so I doubt anyone will complain. – Eryk Sun Feb 26 '18 at 07:14
  • @eryksun I agree that *Consolas* is a very good font for console. Unfortunately it is not the default font at least on Windows XP and Windows 7. So on writing a batch file or console application which should output non ASCII characters right for example in German, Austria and Switzerland, it can't be expected that each user has set once the font *Consolas* for console. My experience in my company is that vast majority of Windows users don't know that font and its size is customizable at all. But they all know who to phone or send an email if a batch file/console app does not work as expected. – Mofi Feb 26 '18 at 07:19
  • I see why `WriteConsoleW` is failing when a raster font is selected. It's misbehaving due to a bad assumption. Because the raster font only supports OEM characters, it tries to accommodate this in a way that no sane program should ever do. It transcodes the Unicode string to the console output codepage with a best-fit mapping via `WideCharToMultiByte` (and then back to Unicode via `RtlMultiByteToUnicodeN`), with a hard-coded assumption that the translation buffer for N Unicode characters is no more than N bytes, but UTF-8 is multibyte, so this fails for non-ASCII characters. – Eryk Sun Feb 26 '18 at 07:22
  • You could save the current codepage; switch to UTF-8 and load string literals in environment variables; and then switch back to the previous codepage. However, in this case, while the console's transcoding game when using a raster font won't *fail* per se, it's arguably worse -- data corruption (mojibake). It's also possible to set a custom font in the registry for a particular window title (i.e. in "HKCU\Console\\[Window Title]"), or in a .LNK shortcut, but that requires the command to be run specially (e.g. `start "Window Title" command`), and always with a new console. – Eryk Sun Feb 26 '18 at 12:32
  • @Mofi Thank you both very much for all of this, I'm trying this out for myself also – NCSGeek Feb 26 '18 at 18:39
  • @eryksun Thanks also (Can only tag one person per reply) – NCSGeek Feb 26 '18 at 18:40
  • @Mofi This was the perfect solution. Thank you very much! This even solves the future translations, as I can re-use that same line of code and just change the codepage number. Kudos to you, friend. – NCSGeek Feb 26 '18 at 18:46
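
The suggestion in the comments above (save the current code page, switch to UTF-8 to load the string literals into environment variables, then switch back) could be sketched roughly as follows. This is only an illustration under stated assumptions: the file is saved as UTF-8 without a byte order mark, the variable names `PreviousCodePage` and `MsgAdmin` are made up for this example, and the parsing of the chcp output relies on the code page number following a colon, which may not hold for every display language. As noted in the comments, with a raster font the restored output can still end up as mojibake.

```bat
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem Assumption: this file is saved as UTF-8 without a byte order mark.

rem Fallback value (an assumption) in case the chcp output cannot be parsed.
set "PreviousCodePage=850"

rem Read the active code page. "delims=:." also drops the trailing dot
rem that some localized chcp outputs append after the number.
for /F "tokens=2 delims=:." %%I in ('%SystemRoot%\System32\chcp.com') do set "PreviousCodePage=%%I"

rem Remove the space left over from the chcp output.
set "PreviousCodePage=%PreviousCodePage: =%"

rem Switch to UTF-8 so that the following lines are decoded as UTF-8.
%SystemRoot%\System32\chcp.com 65001 >nul
set "MsgAdmin=Es werden Administratorrechte benötigt. Überprüfe Berechtigungen ..."

rem Restore the code page the console used before and print the message.
%SystemRoot%\System32\chcp.com %PreviousCodePage% >nul
echo %MsgAdmin%
endlocal
```

Loading every translated string into a variable while code page 65001 is active keeps the UTF-8 section short, so the rest of the script runs under the user's original code page.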