I need to convert a text file from UTF-8 to CP1251, and I can't use any third-party software. Is there a routine written in COBOL for that? I'm using Micro Focus COBOL on Windows.
-
It's a simple read/write program. Then look to the various documents on the Micro Focus website for how to do Unicode to a Code Page. – Bill Woodger Sep 26 '16 at 16:36
-
"I can't use any third party software?" Yet you are proposing to write yet-another-tool which looks to me an awful lot like third party software. – Ira Baxter Sep 26 '16 at 19:39
-
Although it may count as third-party, open the file in a text editor, like TextPad or NotePad++ or Crimson Editor, or..., and save to a different encoding. If you need to tell management "it was written in COBOL", then `CALL "SYSTEM" USING "a-batchfile-invoking-a-scriptable-editor-with-this-filename"` – Brian Tiffin Sep 27 '16 at 03:47
2 Answers
Answer: there are lots of COBOL routines written for that...
I don't know of any free implementation (open source, with the freedom to actually use it), but you can easily write it yourself. Just go through the source and move each character to the target; if a character is not available in CP1251, use a '?' or whatever. The only real work here: you need to look up the 128 characters from x'80' and above...
Alternatively, check whether Micro Focus has specific extensions for this; otherwise, write it yourself. There is no "please code this for me" at SO, so you should show what you've tried already.
To give you an idea, have a look at this conversion of a JavaScript sample; it should be something like this (untested code):

    77 utf-8-field  PIC X(5000).
    77 new-char     PIC X.
    77 cp1251-field PIC X(5000).
    77 utf-8-pos    PIC 9(04) COMP-5.
    77 cp1251-pos   PIC 9(04) COMP-5.
    77 utf-8-end    PIC 9(04) COMP-5.

    MOVE FUNCTION LENGTH ( FUNCTION TRIM (utf-8-field TRAILING) )
      TO utf-8-end
    MOVE 1 TO cp1251-pos
    *> utf-8-pos is also advanced inside the loop body
    PERFORM VARYING utf-8-pos FROM 1 BY 1
            UNTIL utf-8-pos > utf-8-end
       EVALUATE TRUE
          *> normal ASCII character
          WHEN utf-8-field (utf-8-pos:1) < x'80'
             MOVE utf-8-field (utf-8-pos:1) TO new-char
          *> UTF-8 in CP1251 range: lead byte x'D0' or x'D1'
          *> (code points U+0400 .. U+047F)
          WHEN utf-8-field (utf-8-pos:1) = x'D0'
            OR utf-8-field (utf-8-pos:1) = x'D1'
             *> skip the first byte
             ADD 1 TO utf-8-pos
             *> reduce the second byte to the low byte of the code
             *> point: x'D0' gives x'00'..x'3F', x'D1' gives
             *> x'40'..x'7F'; x'FF' marks a missing or invalid byte
             MOVE x'FF' TO new-char
             IF utf-8-pos <= utf-8-end
                IF  utf-8-field (utf-8-pos:1) >= x'80'
                AND utf-8-field (utf-8-pos:1) <= x'BF'
                   IF utf-8-field (utf-8-pos - 1:1) = x'D0'
                      MOVE FUNCTION CHAR (FUNCTION ORD
                           (utf-8-field (utf-8-pos:1)) - 128)
                        TO new-char
                   ELSE
                      MOVE FUNCTION CHAR (FUNCTION ORD
                           (utf-8-field (utf-8-pos:1)) - 64)
                        TO new-char
                   END-IF
                END-IF
             END-IF
             EVALUATE TRUE
                WHEN new-char = x'51'
                   MOVE x'B8' TO new-char
                WHEN new-char > x'4F' OR new-char = x'00'
                   MOVE '?' TO new-char
                *> alternative: use alphabet conversion here
                WHEN new-char = x'01'
                   MOVE x'A8' TO new-char
                WHEN OTHER
                   INSPECT new-char CONVERTING x'0203 ...
                                            TO x'B2B3 ...
             END-EVALUATE
          *> UTF-8 with no CP1251 char
          *> Todo: check for other multibyte headers and add the
          *>       correct number of characters to utf-8-pos
          *> WHEN ...
          WHEN OTHER
             MOVE '?' TO new-char
       END-EVALUATE
       STRING new-char
              DELIMITED BY SIZE
         INTO cp1251-field
         WITH POINTER cp1251-pos
       END-STRING
    END-PERFORM

You may want to define an ALPHABET for the `CONVERTING x'0203 ... TO x'B2B3 ...` part:

    SPECIAL-NAMES.
        ALPHABET UTF8-PART-2 IS x'01', x'02' THRU x'4F', x'51'
        ALPHABET CP1251 IS x'A8', x'B2' THRU x'FF', x'B8'.

The INSPECT in the inner EVALUATE then becomes:

    INSPECT new-char CONVERTING UTF8-PART-2 TO CP1251
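
For the common case of Russian text, the lookup can also be reduced to arithmetic on the two UTF-8 bytes. What follows is a minimal, untested sketch under stated assumptions, not a definitive implementation: the program name `U8TO1251`, the paragraphs `map-cyrillic` and `emit-byte`, the numeric work fields, and the hard-coded sample bytes are all made up for illustration, and it assumes a single-byte ASCII native character set, so that `FUNCTION ORD` returns the byte value plus one. Only Ё, ё, and the letters U+0410 through U+044F are mapped; every other non-ASCII character becomes a single `?`.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. U8TO1251.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  utf-8-field     PIC X(80) VALUE SPACES.
       01  cp1251-field    PIC X(80) VALUE SPACES.
       01  utf-8-pos       PIC 9(04) COMP-5.
       01  utf-8-end       PIC 9(04) COMP-5.
       01  cp1251-pos      PIC 9(04) COMP-5 VALUE 0.
      *> numeric byte values (hypothetical helper fields)
       01  byte-1          PIC 9(03) COMP-5.
       01  byte-2          PIC 9(03) COMP-5.
       01  out-val         PIC 9(03) COMP-5.
       PROCEDURE DIVISION.
       main-para.
      *> Made-up sample input: the six letters U+041F U+0440 U+0438
      *> U+0432 U+0435 U+0442 encoded as 12 UTF-8 bytes.
           MOVE x'D09FD180D0B8D0B2D0B5D182' TO utf-8-field
           MOVE 12 TO utf-8-end
           PERFORM VARYING utf-8-pos FROM 1 BY 1
                   UNTIL utf-8-pos > utf-8-end
              COMPUTE byte-1 =
                 FUNCTION ORD (utf-8-field (utf-8-pos:1)) - 1
              EVALUATE TRUE
      *>         7-bit ASCII is the same in both encodings
                 WHEN byte-1 < 128
                    MOVE byte-1 TO out-val
                    PERFORM emit-byte
      *>         continuation byte of an unmapped sequence: skip it
                 WHEN byte-1 < 192
                    CONTINUE
      *>         lead bytes x'D0' and x'D1' cover U+0400..U+047F
                 WHEN (byte-1 = 208 OR byte-1 = 209)
                  AND utf-8-pos < utf-8-end
                    ADD 1 TO utf-8-pos
                    COMPUTE byte-2 =
                       FUNCTION ORD (utf-8-field (utf-8-pos:1)) - 1
                    PERFORM map-cyrillic
                    PERFORM emit-byte
      *>            not a continuation byte: look at it again
                    IF byte-2 < 128 OR byte-2 > 191
                       SUBTRACT 1 FROM utf-8-pos
                    END-IF
      *>         any other lead byte has no mapping here
                 WHEN OTHER
                    MOVE 63 TO out-val
                    PERFORM emit-byte
              END-EVALUATE
           END-PERFORM
           IF cp1251-field (1:cp1251-pos) = x'CFF0E8E2E5F2'
              DISPLAY "sample converted as expected"
           ELSE
              DISPLAY "unexpected result"
           END-IF
           STOP RUN.

       map-cyrillic.
      *> default: '?' for pairs without a mapping in this sketch
           MOVE 63 TO out-val
           EVALUATE TRUE
      *>      U+0401 (x'D081') -> x'A8'
              WHEN byte-1 = 208 AND byte-2 = 129
                 MOVE 168 TO out-val
      *>      U+0410..U+043F (x'D090'..x'D0BF') -> x'C0'..x'EF'
              WHEN byte-1 = 208 AND byte-2 >= 144 AND byte-2 <= 191
                 COMPUTE out-val = byte-2 + 48
      *>      U+0440..U+044F (x'D180'..x'D18F') -> x'F0'..x'FF'
              WHEN byte-1 = 209 AND byte-2 >= 128 AND byte-2 <= 143
                 COMPUTE out-val = byte-2 + 112
      *>      U+0451 (x'D191') -> x'B8'
              WHEN byte-1 = 209 AND byte-2 = 145
                 MOVE 184 TO out-val
           END-EVALUATE.

       emit-byte.
      *> FUNCTION CHAR takes an ordinal position, i.e. value + 1
           ADD 1 TO cp1251-pos
           MOVE FUNCTION CHAR (out-val + 1)
             TO cp1251-field (cp1251-pos:1).
```

For real use, the sample MOVE would be replaced by reading each record of the input file, and `cp1251-field (1:cp1251-pos)` would be written to the output file. The remaining CP1251 characters in x'80' through x'BF' (Ukrainian, Belarusian, and Serbian letters, typographic quotes, and so on) still need the explicit lookup described above.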
