How to convert mainframe binary file to readable form

Question

We are receiving an EBCDIC mainframe file over XCOM in binary format. Currently, a legacy C-based application converts it to readable ASCII. This is what the file looks like now:

As part of a migration, we have to port the legacy application to Java. Can you please suggest an approach, or share a link explaining how to convert that binary file to a readable format in Java?

What is XCOM? Wikipedia says it's a video game. I don't think that's what you're referring to. — k314159, Mar 29 '22 at 08:18
Also "binary format" is a very generic term. What do you mean by it? If you mean it's in EBCDIC format, that's a text format, not binary. — k314159, Mar 29 '22 at 08:20
OK, let me check. I am not very familiar with EBCDIC, and the source application team told me that it's a binary file. So, according to you, my problem statement should be conversion from EBCDIC to ASCII. @k314159 — Manish, Mar 29 '22 at 10:25
Yes, for example the "@" you can see in your screenshot is ASCII 40 hex. In EBCDIC, the code 40 hex is the space character. Every character has a different code in EBCDIC than it does in ASCII or Unicode, that's why it looks garbled. However, it might be binary if the creators of the data say it is. In that case, they'll have to tell you exactly what is in that data. There's no single binary format. It's whatever the creator of the data makes it. — k314159, Mar 29 '22 at 10:33
You need to find out the actual format: is it EBCDIC text (if so, which EBCDIC code page), EBCDIC plus binary fields, or just binary fields? Presumably there is some schema/field mapping for the file. Is there a COBOL copybook for the file? (If so, JRecord could be useful.) — Bruce Martin, Mar 29 '22 at 11:28
"Binary format" is a meaningless term. It also doesn't matter how that data is transferred. The only format is the actual specific format/file type your data has. That is where all your research has to begin. Seriously. Don't even think about EBCDIC or ASCII or anything else as long as you do not UNDERSTAND what exactly you want to read. — GhostCat, Mar 29 '22 at 11:58
You know, it is like: the side **producing** that data probably knows what it is doing. The C application knows what it is doing. The people here on the internet have 0 knowledge of any of that. — GhostCat, Mar 29 '22 at 11:59
[This answer](https://stackoverflow.com/questions/71514337/how-can-i-convert-a-packed-decimal-format-s370fpd5-in-r/71516980#71516980) might be of help in further endeavors such as this. There is a recurring theme of not getting the mainframe people involved in these sorts of questions, and it makes the task significantly more difficult. — cschneid, Mar 29 '22 at 13:46
@cschneid I even put the mainframe tag in the post so that mainframe people get involved. — Manish, Mar 29 '22 at 13:51
What I meant, @Manish, was to get the mainframe people _who created the file in the first place_ involved. — cschneid, Mar 29 '22 at 15:01
@cschneid OK, actually they couldn't help much. They only told me that it's transported as binary via XCOM. That's it. They didn't have many suggestions on the decoding part. That's why I posted it here. — Manish, Mar 29 '22 at 15:52
`iconv` is a general-purpose CLI tool for converting text from one code page to another. — Hogstrom, Mar 31 '22 at 15:46
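The comments above raise the possibility that the file holds binary fields rather than pure EBCDIC text. A common example is packed decimal (COBOL COMP-3), where each byte holds two decimal digits and the last low nibble is the sign; running such bytes through a charset decoder produces garbage. If the mainframe team confirms such fields (with their byte lengths and implied decimal places, typically from a copybook), a hand-rolled decoder might look like this minimal sketch. The class name, method name and sample field are hypothetical; with a copybook, a library such as JRecord is likely the better route.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class PackedDecimal {
    // Decodes an IBM packed-decimal (COBOL COMP-3) field: two BCD digits per byte,
    // except the last byte, whose low nibble holds the sign.
    // scale = number of implied decimal places, e.g. 2 for PIC S9(5)V99 COMP-3.
    static BigDecimal decode(byte[] field, int scale) {
        if (field.length == 0) {
            throw new IllegalArgumentException("Empty packed-decimal field");
        }
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < field.length; i++) {
            appendDigit(digits, (field[i] >> 4) & 0x0F);
            if (i < field.length - 1) {
                appendDigit(digits, field[i] & 0x0F);
            }
        }
        int sign = field[field.length - 1] & 0x0F;
        if (sign < 0x0A) {
            throw new IllegalArgumentException("Invalid sign nibble: " + Integer.toHexString(sign));
        }
        BigInteger unscaled = new BigInteger(digits.toString());
        // 0xD and 0xB mean negative; 0xA, 0xC, 0xE and 0xF mean positive
        if (sign == 0x0D || sign == 0x0B) {
            unscaled = unscaled.negate();
        }
        return new BigDecimal(unscaled, scale);
    }

    private static void appendDigit(StringBuilder digits, int nibble) {
        if (nibble > 9) {
            throw new IllegalArgumentException("Invalid digit nibble: " + Integer.toHexString(nibble));
        }
        digits.append((char) ('0' + nibble));
    }

    public static void main(String[] args) {
        // Hypothetical 3-byte field holding +12345 with 2 implied decimal places
        byte[] field = {0x12, 0x34, 0x5C};
        System.out.println(decode(field, 2)); // prints 123.45
    }
}
```

Only the bytes of such fields should go through this decoder; text fields in the same record would still be decoded with the EBCDIC charset.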

Joop Eggen · Accepted Answer · 2022-03-29T12:49:21.077


EBCDIC, like ASCII or Latin-1, is a text encoding. You can try one of "Cp037", "Cp500" or "Cp1047". As there is more than one EBCDIC variant, check Wikipedia or a similar source to find out which one applies. Unfortunately, not every charset is provided by Java SE. See "Convert String from ASCII to EBCDIC in Java?".
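To compare the candidate code pages side by side, a small probe can decode the same bytes with each one. This is a sketch that assumes the three Cp names above; the class name, helper name and sample bytes are illustrative. It checks `Charset.isSupported` first because not every JRE ships every EBCDIC charset. The sample includes 0x40, which is "@" in ASCII but a space in EBCDIC, as pointed out in the question comments.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EbcdicProbe {
    // Candidate EBCDIC code pages from the answer; availability depends on the JRE
    static final String[] CANDIDATES = {"Cp037", "Cp500", "Cp1047"};

    // Decodes the bytes with the named charset, or returns null if this JRE lacks it
    static String decodeWith(byte[] data, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return new String(data, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // EBCDIC "ABC 123": 0xC1..0xC3 = letters, 0x40 = space, 0xF1..0xF3 = digits
        byte[] sample = {(byte) 0xC1, (byte) 0xC2, (byte) 0xC3, 0x40,
                         (byte) 0xF1, (byte) 0xF2, (byte) 0xF3};
        // Bytes above 0x7F are not valid ASCII and show up as replacement characters
        System.out.println("US-ASCII: [" + new String(sample, StandardCharsets.US_ASCII) + "]");
        for (String name : CANDIDATES) {
            String decoded = decodeWith(sample, name);
            System.out.println(name + ": "
                    + (decoded == null ? "(not supported by this JRE)" : "[" + decoded + "]"));
        }
    }
}
```

Letters, digits and the space decode identically in all three code pages; they differ in some punctuation, such as the square brackets, so real data containing such characters is more useful for telling the variants apart.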

Since Java 11 you can use Files.readString/Files.writeString; on older versions you need Files.readAllBytes instead.

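```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path ebcdicPath = Paths.get("...");
Path utf8Path = ebcdicPath.resolveSibling("utf8.txt");  // output next to the input file
Charset ebcdic = Charset.forName("Cp1047");              // or Cp037, Cp500, ...
String content = Files.readString(ebcdicPath, ebcdic);   // Java 11+
Files.writeString(utf8Path, content, StandardCharsets.UTF_8);
```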
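For Java versions before 11, the same conversion can be sketched with Files.readAllBytes and Files.write, which exist since Java 7. This assumes Cp1047 like the snippet above; the class name, the `toUtf8` helper and taking the input path from `args[0]` are illustrative choices, not part of the original answer.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class EbcdicToUtf8Legacy {
    // Decodes the bytes with the EBCDIC charset, then re-encodes the text as UTF-8
    static byte[] toUtf8(byte[] ebcdicBytes, Charset ebcdic) {
        String content = new String(ebcdicBytes, ebcdic);
        return content.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Input path from the command line (illustrative)
        Path ebcdicPath = Paths.get(args[0]);
        Path utf8Path = ebcdicPath.resolveSibling("utf8.txt");
        byte[] raw = Files.readAllBytes(ebcdicPath);
        Files.write(utf8Path, toUtf8(raw, Charset.forName("Cp1047")));
    }
}
```

Keeping the conversion in a pure `byte[]` to `byte[]` helper makes it easy to check with a few known EBCDIC bytes before running it on the real file.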

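You might get problems with the line endings, as Unicode treats NEL (U+0085), which originates from EBCDIC, as a legal newline. Using Files.lines would strip line endings, but it only recognizes `\n`, `\r` and `\r\n` as line terminators, so NEL-separated text comes back as one single line.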
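If the decoded text does contain NEL characters, they can be counted and replaced before writing the UTF-8 file. This is a minimal sketch assuming Java 11+, Cp1047, and that records really are NEL-terminated (which the data may or may not confirm). It replaces NEL with `\n` so Unix tools such as `wc` and `vi` see normal lines; use `"\r\n"` instead for Windows. The class name, helper name and `args[0]` path are illustrative.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NelToNewline {
    // Replaces every NEL (U+0085) with the given line separator
    static String normalizeNel(String text, String lineSeparator) {
        return text.replace("\u0085", lineSeparator);
    }

    public static void main(String[] args) throws IOException {
        Path ebcdicPath = Paths.get(args[0]); // input path (illustrative)
        String content = Files.readString(ebcdicPath, Charset.forName("Cp1047"));
        // Diagnostic: zero NELs suggests the file is not NEL-delimited at all
        long nelCount = content.chars().filter(c -> c == '\u0085').count();
        System.out.println("NEL characters found: " + nelCount);
        Files.writeString(ebcdicPath.resolveSibling("utf8.txt"),
                normalizeNel(content, "\n"), StandardCharsets.UTF_8);
    }
}
```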

Code for a hex dump of some bytes:

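```java
Path path = Paths.get("...");
byte[] content = Files.readAllBytes(path);
// Print the first 16 bytes (fewer if the file is shorter) as two-digit hex
for (int i = 0; i < Math.min(16, content.length); ++i) {
    System.out.printf(" %02x", content[i] & 0xFF);
}
System.out.println();
```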

Decoding the 16 bytes from the dump in the comments with Cp1047:

```java
byte[] c = {(byte)0xf0, (byte)0xf0, (byte)0xf0, (byte)0xf0, (byte)0xf0, (byte)0xf9, (byte)0xf7, (byte)0xf7,
        (byte)0xf1, (byte)0xf2, (byte)0xf2, (byte)0xf0, (byte)0xf3, (byte)0xf2, (byte)0xf1, (byte)0xf0};
Charset ebcdic = Charset.forName("Cp1047");
System.out.println(new String(c, ebcdic));
```

prints:

0000097712203210
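To see which parts of a record are EBCDIC text and which might be binary fields, it can help to print the hex bytes next to their decoded text, in the style of a classic hex dump. This sketch extends the dump code above under the assumption that the file is Cp1047; the class name, the 64-byte limit and the `args[0]` path are hypothetical. Control characters are shown as `.` so they don't break the layout.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class EbcdicHexDump {
    // Formats up to maxBytes of data as rows of: 8-digit hex offset, 16 hex bytes, decoded text
    static String dump(byte[] data, int maxBytes, Charset ebcdic) {
        StringBuilder out = new StringBuilder();
        int limit = Math.min(maxBytes, data.length);
        for (int offset = 0; offset < limit; offset += 16) {
            int end = Math.min(offset + 16, limit);
            out.append(String.format("%08x ", offset));
            for (int i = offset; i < end; i++) {
                out.append(String.format(" %02x", data[i] & 0xFF));
            }
            // Pad a short final row so the text column stays aligned
            for (int i = end; i < offset + 16; i++) {
                out.append("   ");
            }
            out.append("  |");
            // Single-byte EBCDIC code pages give one char per byte;
            // ISO control characters (including NEL, U+0085) are shown as '.'
            for (char ch : new String(data, offset, end - offset, ebcdic).toCharArray()) {
                out.append(Character.isISOControl(ch) ? '.' : ch);
            }
            out.append("|\n");
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] content = Files.readAllBytes(Paths.get(args[0]));
        System.out.print(dump(content, 64, Charset.forName("Cp1047")));
    }
}
```

Runs of readable text in the right-hand column point to character fields; stretches of dots or odd symbols next to bytes like `12 34 5c` are candidates for binary or packed-decimal fields.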

edited Mar 29 '22 at 12:49

answered Mar 29 '22 at 08:00

Joop Eggen


Hi @Joop Eggen, we are not receiving it as EBCDIC text. When the file is sent over XCOM, it's converted to binary format. – Manish Mar 29 '22 at 08:17
Could you post a hex dump of the first 16 bytes or so? I added code for the dump to the answer. – Joop Eggen Mar 29 '22 at 11:48
I ran it and got this: `f0 f0 f0 f0 f0 f9 f7 f7 f1 f2 f2 f0 f3 f2 f1 f0` – Manish Mar 29 '22 at 12:38

That is text, encoded in EBCDIC, `"0000097712203210"` – Joop Eggen Mar 29 '22 at 12:47
Thanks, Joop. I generated the UTF-8 file using the snippet you shared. Let me check whether the generated file is as expected. – Manish Mar 29 '22 at 13:04
The file is generated, but it displays differently in vi (showing garbage characters) and in TextEdit. I also tried `Files.lines`: it gives only 1 line, while `wc` gives 0 lines. – Manish Mar 29 '22 at 13:33
`wc` counts the number of newlines in the file. If there is only 1 line that is not terminated by a newline, it considers the file as having 0 lines. – k314159 Mar 29 '22 at 13:43
@k314159 I understand that about `wc`, but the converted file is expected to have multiple lines. I think Joop already pointed out in his answer that there might be a problem with line endings, so I'm just trying to understand how to sort that out. – Manish Mar 29 '22 at 13:48

Sorry for the late reply. You could do a hex dump of the UTF-8 text to see whether a NEL (U+0085) appears, as the UTF-8 multibyte sequence `c2 85`. Or do `s = s.replace("\u0085", "\r\n");` – Joop Eggen Mar 29 '22 at 15:00

Most likely it is either a fixed-record-length file or a variable-record-length file with a record-length prefix before every record. Line breaks in a mainframe dataset should be a rare occurrence; I for one have never seen any. – piet.t Mar 31 '22 at 05:36
@piet.t then the legacy C application should already cope with the file. Another general remark: the cause of all this misunderstanding is the mention of _"binary"_ transport, which here means the same as _binary_ mode in (S)FTP transport: transmitting the file as-is, as opposed to _textual_ transport, where line endings are converted (Linux `\n`, Windows `\r\n`). – Joop Eggen Mar 31 '22 at 06:44
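If piet.t is right and the file has no line breaks at all, the records have to be split by length before decoding. This is a sketch of both layouts he describes, under several assumptions: Cp1047, records that are pure text, and for variable-length files that the 4-byte record descriptor word (RDW: 2-byte big-endian length including the RDW itself, then 2 reserved bytes) survived the transfer, with no block descriptor words. The record length of 80 in `main`, the class and method names and the `args[0]` path are hypothetical; the real layout has to come from the mainframe team.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class RecordSplitter {
    // Fixed-length records: each record is exactly recordLength bytes, no delimiters.
    // A shorter trailing chunk, if any, is returned as-is.
    static List<String> splitFixed(byte[] data, int recordLength, Charset ebcdic) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("recordLength must be positive");
        }
        List<String> records = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += recordLength) {
            int len = Math.min(recordLength, data.length - offset);
            records.add(new String(data, offset, len, ebcdic));
        }
        return records;
    }

    // Variable-length records, each prefixed by a 4-byte RDW whose first two bytes
    // hold the big-endian record length including the RDW itself
    static List<String> splitRdw(byte[] data, Charset ebcdic) {
        List<String> records = new ArrayList<>();
        int offset = 0;
        while (offset + 4 <= data.length) {
            int length = ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
            if (length < 4 || offset + length > data.length) {
                throw new IllegalArgumentException("Bad RDW at offset " + offset);
            }
            records.add(new String(data, offset + 4, length - 4, ebcdic));
            offset += length;
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        Charset ebcdic = Charset.forName("Cp1047");
        // Hypothetical record length; the real LRECL must come from the mainframe side
        for (String record : splitFixed(data, 80, ebcdic)) {
            System.out.println(record);
        }
    }
}
```

If a record also contains binary fields, decoding the whole record as text is wrong; the record would have to be cut into fields first, which is where a copybook and a tool such as JRecord help.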
