1

What is the secret to Japanese characters in a Windows XP .bat file?

We have a script to open a file off disk in kiosk mode:

@ECHO OFF
"%ProgramFiles%\Internet Explorer\iexplore.exe" -k "%CD%\XYZ.htm"

It works fine when the OS is English, and it works fine on the Japanese OS when XYZ is made up of English characters. When XYZ is made up of Japanese characters, however, they are mangled into gibberish by the time IE tries to find the file.

If the batch file is saved as Unicode or Unicode big endian, the script won't even run.
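A quick way to see why the "Unicode" save option is a problem is to look at the bytes. This is only a sketch in Python (not part of the original setup), assuming that the editor's "Unicode" option means UTF-16 little endian with a byte-order mark, as it does in Notepad:

```python
import codecs

line = "@ECHO OFF\r\n"

# "Unicode" in Notepad: UTF-16 little endian, preceded by a byte-order mark.
utf16_bytes = codecs.BOM_UTF16_LE + line.encode("utf-16-le")

# The same line in a single-byte encoding, which is what cmd can read.
ansi_bytes = line.encode("ascii")

print(utf16_bytes[:8])  # starts with FF FE, then every character is followed by a NUL byte
print(ansi_bytes[:8])   # plain one-byte-per-character text
```

The leading FF FE and the interleaved NUL bytes are not something a byte-oriented batch parser expects at the start of a command.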

I have tried various ways of encoding the Japanese characters. Ampersand escape does not work (〹).

Percent escape (%xx%xx%xx) does not work either.

ABC works, but AB%43 becomes AB3 in the error message, so it looks like the percent sign triggers parameter substitution: %4 is an empty parameter and the 3 is left over. This is confirmed because %043 puts in the name of the script (%0) followed by 43!
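The substitution seen here can be modelled in a few lines. This is a deliberately simplified sketch in Python, not how cmd is actually implemented: it only handles %0 to %9 and ignores %NAME% variables. The function name and the script name kiosk.bat are made up for illustration.

```python
import re

def expand_batch_args(line, args):
    """Simplified model of %0..%9 expansion inside a batch file."""
    def substitute(match):
        index = int(match.group(1))
        # Parameters that were not supplied expand to an empty string.
        return args[index] if index < len(args) else ""
    # Only a single digit after the percent sign is consumed.
    return re.sub(r"%(\d)", substitute, line)

# args[0] plays the role of %0, the script name (hypothetical here).
args = ["kiosk.bat"]
print(expand_batch_args("AB%43", args))  # %4 is empty, so only the "3" survives
print(expand_batch_args("%043", args))   # %0 is the script name, followed by "43"
```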

One thing that does work is pasting the Japanese characters into a command prompt.

@ECHO OFF
CD "%ProgramFiles%\Internet Explorer\"
Set /p URL="file to open: "
start iexplore.exe -k %URL%

This tells me that iexplore.exe will accept and parse the parameter correctly when it contains Japanese characters, but not when they are written into the script.

So it would be nice to know what the secret may be to getting the parameter into IE successfully via the batch file, as opposed to via the clipboard and an environment variable.

Any suggestions greatly appreciated !

best regards

Richard Collins

P.S. Another post has made this suggestion, which I have yet to follow up:

You might have more luck in cmd.exe if you opened it in UNICODE mode. Use "cmd /U".

Batch renaming of files with international chars on Windows XP

I will need to find out if this can be done from inside the script.

Richard Collins

4 Answers

4

For the record, a simple answer has been found for this question.

If the batch file is saved as ANSI - it works !
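For anyone generating the batch file from code, here is a rough sketch of the difference, written in Python purely for illustration. It assumes the ANSI code page of the Japanese OS is 932 (Shift-JIS); the file name 日本語.htm and the helper name are hypothetical.

```python
import os
import tempfile

file_name = "日本語.htm"  # hypothetical Japanese file name
script = (
    "@ECHO OFF\r\n"
    '"%ProgramFiles%\\Internet Explorer\\iexplore.exe" -k "%CD%\\'
    + file_name + '"\r\n'
)

def write_batch(path, text, encoding):
    """Write the batch text as raw bytes in the given encoding, without a BOM."""
    with open(path, "wb") as handle:
        handle.write(text.encode(encoding))

# Assumption: code page 932 is the ANSI code page on the Japanese OS.
ansi_bytes = script.encode("cp932")
utf8_bytes = script.encode("utf-8")

# What a code page 932 reader makes of the UTF-8 bytes: the ASCII part
# survives, the Japanese file name does not.
mangled = utf8_bytes.decode("cp932", errors="replace")

path = os.path.join(tempfile.mkdtemp(), "kiosk.bat")
write_batch(path, script, "cp932")
print(mangled)
```

On Windows itself, Python's "mbcs" codec uses whatever the system ANSI code page is, which avoids hard-coding 932.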

Richard Collins
  • Erm, just one question: What were you trying to save it in before? – Joey Oct 02 '09 at 05:14
  • Our software wrote it as UTF-8 by default. .NET will write it out as ANSI if you pass System.Text.Encoding.Default to the stream reader and writer constructors. – Richard Collins Oct 02 '09 at 07:27
3

First of all: Batch files are pretty limited in their internationalization support. There is no direct way of telling cmd what codepage a batch file is in. UTF-16 is out anyway, since cmd won't even parse that.

I have detailed an option in my answer to the following question:

which might be helpful for your needs.

In principle it boils down to the following:

  • Use an encoding which has single-byte mappings for ASCII
  • Put a chcp ... at the start of the batch file
  • Use the set codepage for the rest of the file

You can use codepage 65001, which is UTF-8, but make sure that your file doesn't include the U+FEFF character at the start (used as a byte-order mark in UTF-16 and UTF-32 and sometimes as a marker for UTF-8 files as well). Otherwise the first command in the file will produce an error message.

So just use the following:

@echo off
chcp 65001
"%ProgramFiles%\Internet Explorer\iexplore.exe" -k "%CD%\XYZ.htm"

and save it as UTF-8 without BOM (Note: Notepad won't allow you to do that) and it should work.
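If your editor insists on writing the marker, it can be stripped afterwards. A minimal sketch in Python (my choice for illustration, any language will do); the helper name is made up:

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 byte-order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

script = "@echo off\r\nchcp 65001\r\n"

with_bom = script.encode("utf-8-sig")  # what a BOM-writing editor produces
without_bom = script.encode("utf-8")   # what the batch file should contain

print(with_bom[:3])                    # the three marker bytes EF BB BF
print(strip_utf8_bom(with_bom) == without_bom)
```

The "utf-8" codec never writes the marker, so writing the file with it directly avoids the problem in the first place.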


cmd /u won't do anything here; that advice is pretty much bogus. The /U switch only specifies that Unicode will be used for redirection of input and output (and piping). It has nothing to do with the encoding the console uses for output or for reading batch files.


URL encoding won't help you either. cmd is hardly a web browser and outside of HTTP and the web URL encoding isn't exactly widespread (hence the name). cmd uses percent signs for environment variables and arguments to batch files and subroutines.

"Ampersand escapes", better known as character entities from HTML and XML, won't work either, because cmd is not an HTML or XML parser. In cmd the ampersand is used to execute multiple commands on a single line.

Joey
  • My example of ampersand escape did not make it through Markdown. I meant "ampersand hash nnnnn semicolon", which probably has a better name than ampersand escape. And yes, it was wishful thinking that either this or URL encoding would work in a batch file. – Richard Collins Sep 29 '09 at 04:33
  • No, it doesn't :-) Most escaping technologies don't go beyond what they were designed for and each technology has different methods. `cmd` uses the circumflex accent (`^`) as escape character but doesn't provide any way of inserting arbitrary characters. It can deal with Unicode fine, but usually not from inside batch files themselves. – Joey Sep 29 '09 at 06:45
  • Thanks for your input Johannes, but I could not get the above suggestions to work on the Japanese OS. – Richard Collins Oct 02 '09 at 02:48
  • Alternatively try one of the Japanese code pages instead. They don't work on my machine, though, since I have an English version of Windows and therefore little need for handling that. – Joey Oct 02 '09 at 05:15
  • Saving as UTF8 & prepending **chcp 65001** did the trick for me, thanks! – Nyaarium May 15 '16 at 23:08
1

I too suffered this frustrating problem in batch/cmd files. However, so far as I can see, no one yet has stated the reason why this problem occurs, here or in other, similar posts at StackOverflow. The nearest statement addressing this was:

“First of all: Batch files are pretty limited in their internationalization support. There is no direct way of telling cmd what codepage a batch file is in.”

Here is the basic problem. Cmd files are the Windows-2000+ successor to MS-DOS and IBM-DOS bat(ch) files. MS and IBM DOS (1984 vintage) were written in the IBM-PC character set (code page 437). There, the 8th-bit codes were assigned (or “clothed” with) characters different from those assigned to the corresponding codes of Windows, ANSI, or Unicode. The presumption of CP437 encoding is unalterable (except, as previously noted, through cmd.exe /u). Where the characters of the IBM-PC set have exact counterparts in the Unicode set, Windows Explorer remaps them to the Unicode counterparts. Alas, even Windows-1252 characters like š and ¾ have no counterpart in code page 437.

Here is another way to see the problem. Try opening your batch/cmd script using the Windows Edit.com program (at C:\Windows\system32\Edit.com). The Windows-1252 character 0146 ’ (Unicode 8217) instead appears as IBM-PC 146 Æ. A batch command to rename Mary'sFile.txt as Mary’sFile.txt fails, as it is interpreted as MaryÆsFile.txt.
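The same mismatch can be checked without Edit.com. A small sketch in Python (used only for illustration), decoding the two Windows-1252 curly-quote bytes, 145 and 146, under both code pages:

```python
# Bytes 145 (0x91) and 146 (0x92) are the curly single quotes in Windows-1252.
for byte in (0x91, 0x92):
    raw = bytes([byte])
    as_windows = raw.decode("cp1252")
    as_ibm_pc = raw.decode("cp437")
    print(byte, repr(as_windows), ord(as_windows), repr(as_ibm_pc))

# A file name saved as Windows-1252 but read as code page 437.
name = "Mary\u2019sFile.txt"
seen_by_dos = name.encode("cp1252").decode("cp437")
print(seen_by_dos)
```

Byte 145 is ‘ (Unicode 8216) in Windows-1252 and æ in code page 437; byte 146 is ’ (Unicode 8217) and Æ respectively.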

This problem can be avoided in the case of copying a file named Mary’sFile.txt: cite it as Mary?sFile.txt, e.g.:

xCopy Mary?sFile.txt Mary?sLastFile.txt

You will see a similar treatment (substitution of question marks) in a DIR list of files having Unicode characters.

Obviously, this is useless unless an extant file has the Unicode characters. This solution’s range is paltry and inadequate, but please make what use of it you can.

mmmmmm
0

You can try to use Shift-JIS encoding.

mbinette