
We have a Japanese client that has source code in COBOL on a mainframe. He claims the code on the mainframe is represented in Shift-JIS2 (and we think we understand that pretty well). When that code is transferred to a PC, what is the most common encoding used? We've sent him a program to process that COBOL code and it seems to choke. The customer won't give us the code directly, so experiments are hard. His experiments seem to indicate UTF-8; I assume the Japanese characters encodable in Shift-JIS2 are correspondingly converted to Unicode equivalents. Anybody have any experience here?

EDIT: I think we solved our mystery. The client is (duh!) using CP-932 ("ShiftJIS") on the PC, but his COBOL program has Japanese characters in the identifiers, and that's why our tool is choking.
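
A quick way to confirm this kind of diagnosis without seeing the client's code is to have the client run a small script that decodes the file as CP-932 and lists which lines contain non-ASCII characters. The sketch below assumes Python 3.9+ and Python's built-in `cp932` codec; the function name `find_non_ascii_lines` and the sample line are hypothetical.

```python
def find_non_ascii_lines(raw: bytes, encoding: str = "cp932") -> list[tuple[int, str]]:
    """Decode source bytes and report (line number, non-ASCII characters) pairs.

    errors="strict" makes a wrong encoding guess raise UnicodeDecodeError
    instead of silently producing mojibake.
    """
    text = raw.decode(encoding, errors="strict")
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Collect every character outside 7-bit ASCII on this line.
        extra = "".join(ch for ch in line if ord(ch) > 0x7F)
        if extra:
            hits.append((lineno, extra))
    return hits


if __name__ == "__main__":
    # Hypothetical two-line sample with a Japanese data name on line 1.
    sample = "       01  顧客-名  PIC X(20).\n       01  WS-COUNT PIC 9(4).\n"
    for lineno, chars in find_non_ascii_lines(sample.encode("cp932")):
        print(lineno, chars)
```

A clean run with `cp932` plus a list of lines holding Japanese identifiers would point to the same explanation; a `UnicodeDecodeError` suggests the file is in some other encoding.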

EDIT: Followup: A bit more of a surprise. Shift-JIS often encodes what we think of as ASCII text as so-called "FULLWIDTH" characters, which take the same screen space as an East Asian ideograph; conventional ASCII characters act as half-width. So there's a FULLWIDTH "A", "B", ... "Z" as well as a FULLWIDTH "-". Apparently, to process Japanese COBOL, our COBOL parser has to accept not only Western ASCII but also the FULLWIDTH equivalents, especially the FULLWIDTH letters and, surprisingly, a FULLWIDTH HYPHEN used to separate "letters" in a COBOL identifier.
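
One way to let an existing ASCII-oriented lexer recognize these characters is to fold the fullwidth forms back to ASCII before matching keywords and identifier syntax. The fullwidth ASCII variants U+FF01..U+FF5E map one-to-one onto U+0021..U+007E by a fixed offset. A minimal sketch, assuming the text has already been decoded from CP-932 (in Python's `cp932` codec the full-width minus at 0x817C decodes to U+FF0D FULLWIDTH HYPHEN-MINUS); `fold_fullwidth` is a hypothetical helper name:

```python
# Fullwidth ASCII variants U+FF01..U+FF5E sit a fixed distance above U+0021..U+007E.
FULLWIDTH_OFFSET = 0xFEE0


def fold_fullwidth(text: str) -> str:
    """Map fullwidth ASCII look-alikes (and the ideographic space) to plain ASCII.

    All other characters, including kanji and kana, pass through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xFF01 <= cp <= 0xFF5E:
            out.append(chr(cp - FULLWIDTH_OFFSET))
        elif cp == 0x3000:  # IDEOGRAPHIC SPACE -> ordinary space
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(fold_fullwidth("ＷＳ－ＣＯＵＮＴ"))  # fullwidth W, S, hyphen-minus, C, O, U, N, T
```

A lexer would typically use the folded text only for matching and keep the original spelling for output, so identifiers round-trip unchanged. `unicodedata.normalize("NFKC", ...)` performs a similar fold but also rewrites half-width katakana and other compatibility characters, which may not be wanted here.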

EDIT: IBM Enterprise COBOL allows DBCS characters in identifiers. Yikes!
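
If the lexer also has to accept DBCS names, its identifier character class needs Japanese ranges in addition to ASCII and fullwidth letters. The sketch below is a rough approximation and does not attempt to reproduce IBM's actual rules for DBCS user-defined words; the chosen ranges (the iteration mark 々, Hiragana, Katakana, and the main CJK Unified Ideographs block) and the names `IDENT`, `ASCII_WORD`, `FULLWIDTH_WORD`, and `JAPANESE` are assumptions for illustration.

```python
import re

# Assumed identifier alphabet for illustration only (not IBM's DBCS rules).
ASCII_WORD = r"A-Za-z0-9\-"
# Fullwidth digits, upper- and lower-case letters, and U+FF0D FULLWIDTH HYPHEN-MINUS.
FULLWIDTH_WORD = r"\uFF10-\uFF19\uFF21-\uFF3A\uFF41-\uFF5A\uFF0D"
# Iteration mark, Hiragana, Katakana (incl. prolonged sound mark), CJK ideographs.
JAPANESE = r"\u3005\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF"

# One or more characters from any of the classes above form a candidate word.
IDENT = re.compile(f"[{ASCII_WORD}{FULLWIDTH_WORD}{JAPANESE}]+")

if __name__ == "__main__":
    line = "MOVE 顧客－名 TO ＷＳ－ＮＡＭＥ."
    print(IDENT.findall(line))
```

This only splits text into candidate words; it does not enforce COBOL's other constraints (for example, that a name must not begin or end with a hyphen), and a real lexer would handle PICTURE strings and literals separately.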

Ira Baxter
  • Some FTP tools, like FFFTP, will perform encoding conversion for you, so make sure you specify the transfer method. http://www.forest.impress.co.jp/lib/inet/servernt/ftp/ffftp.html – monkut Aug 21 '09 at 04:16
  • And if it did, what would its default be? – Ira Baxter Aug 21 '09 at 04:28
  • Note CP-932 is an extension to Shift-JIS (often used on Windows). Don't use Shift-JIS when you mean CP-932, because some characters will not be encoded correctly. – Gavin Brock Aug 11 '10 at 01:36
  • This is nearly seven years later. Still an issue? Without access to the source, you're shooting in the dark. Contracting a Japanese COBOL programmer for a short time may be like wearing night-vision goggles, because if the client is doing things in a "normal" way for Japan (and the COBOL guy knows the normal way), you get lucky. Even the verbs don't have to be original, although I'm not sure if they can be DBCS. I suspect with some tweaking they could, but I've not tried. – Bill Woodger Mar 16 '16 at 19:23
  • IBM and Microsoft put effort into providing "Windows DBCS" which is equivalent to that of IBM. Each of the relevant "IBM" codepages has an equivalent for Windows. As far as I am aware, it is "normal" for Japanese COBOL programs to use the verbs in English and all the identifiers and literals, where possible, in Japanese. It does seem to include "wide" Latin characters. – Bill Woodger Mar 17 '16 at 08:10
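
The Shift-JIS versus CP-932 difference raised in the comments is easy to see with Python's built-in codecs: `shift_jis` covers the JIS X 0208 repertoire, while `cp932` adds Microsoft/NEC extensions such as circled digits. A small check, where the helper name `encodes_in` is hypothetical:

```python
def encodes_in(text: str, codec: str) -> bool:
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # A common kanji and an NEC-extension circled digit.
    for ch in ["顧", "①"]:
        print(ch, "shift_jis:", encodes_in(ch, "shift_jis"), "cp932:", encodes_in(ch, "cp932"))
```

This matters when configuring transfer tools or editors: choosing plain Shift-JIS for data that is really CP-932 can turn characters like ① into encoding errors or replacement characters.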

1 Answer


There are three encodings that are all still very much in use in Japan: EUC-JP, ISO-2022-JP, and Shift-JIS.

ISO-2022-JP is usually used for e-mail, while you'll see EUC-JP on Unix machines. I personally haven't dealt with anything other than Shift-JIS, though (nor with mainframes).
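
When the encoding is unknown, a crude first pass is to try strict decoding with each candidate in a fixed order and keep the first one that succeeds. This is a heuristic sketch, not a reliable detector: the candidate order and the name `guess_japanese_encoding` are assumptions, and short or mostly-ASCII inputs can decode validly under several encodings.

```python
from typing import Optional

# Order matters:
# - ISO-2022-JP is pure 7-bit, so it would also pass as UTF-8; try it first.
# - UTF-8 has strict byte structure, so false positives are rare.
# - EUC-JP before CP-932: CP-932's single-byte half-width katakana (0xA1-0xDF)
#   let many EUC-JP byte strings decode as CP-932 too.
CANDIDATES = ("iso2022_jp", "utf-8", "euc_jp", "cp932")


def guess_japanese_encoding(raw: bytes) -> Optional[str]:
    """Return the first candidate codec that decodes raw without errors, else None."""
    for codec in CANDIDATES:
        try:
            raw.decode(codec)  # strict by default
        except UnicodeDecodeError:
            continue
        return codec
    return None


if __name__ == "__main__":
    for enc in CANDIDATES:
        print(enc, "->", guess_japanese_encoding("テスト".encode(enc)))
```

Pure ASCII input is reported as `iso2022_jp`, which is harmless because it decodes to the same text under all four candidates.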

wm_eddie