I need to convert the following EBCDIC files to UTF-8 (if not ASCII) using only Java. I don't want to use JTOpen, and my solution is inspired by this answer:

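import java.io.*;
import java.util.stream.Collectors;

var infile = new File("file_path/filename.ebcdic");
// Decode the bytes as CP037 (US/Canada EBCDIC) while reading, then print the text line by line.
try (var bufferedReader = new BufferedReader(new InputStreamReader(new FileInputStream(infile), "CP037"))) {
    System.out.println(bufferedReader.lines().collect(Collectors.joining("\n")));
}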

This works fine for sample-customer-data.ebcdic but not for English.ebcdic.
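The snippet above only prints the decoded text. A minimal sketch of writing UTF-8 instead, assuming the input really is plain CP037, could look like the following. The class and method names (`EbcdicToUtf8`, `toUtf8`, `convertFile`) are hypothetical and not from the original post, and the sketch has not been run against either sample file.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original question.
final class EbcdicToUtf8 {
    // Same charset name as in the question; charset lookups are case-insensitive.
    static final Charset EBCDIC = Charset.forName("CP037");

    /** Re-encodes a buffer of EBCDIC bytes as UTF-8 bytes. */
    static byte[] toUtf8(byte[] ebcdic) {
        return new String(ebcdic, EBCDIC).getBytes(StandardCharsets.UTF_8);
    }

    /** Streams an EBCDIC file into a UTF-8 file without loading it all into memory. */
    static void convertFile(Path in, Path out) throws IOException {
        try (var reader = Files.newBufferedReader(in, EBCDIC);
             var writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            reader.transferTo(writer); // Reader.transferTo needs Java 10 or later
        }
    }
}
```

Because the decoding happens in the reader, no manual byte-to-character table is needed; only the charset name would change for another EBCDIC code page.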

I also want to remove the formatting bytes. For example:

I need

Shah           Priya   Berlin              Berlin           MH00002
Schulz         Tomasz  Malmö               Scania           MH00001
Smith          Mike    Ames                Iowa             MH00011
Sanchez        Maria   Bogotá              D.C.             MH00041
Sasthi         Gayatri Bangalore           Karnataka        MH00045

instead of

CsBhBaBh           CpBrBiByBa   CbBeBrBlBiBn              CbBeBrBlBiBn           CmChC^C^C^C^C¥CsBcBhBuBlBz         CtBoBmBaBsBz  CmBaBlBmCð               CsBcBaBnBiBa           CmChC^C^C^C^C£CsBmBiBtBh          CmBiBkBe    CaBmBeBs                CiBoBwBa             CmChC^C^C^C£C£CsBaBnBcBhBeBz        CmBaBrBiBa   CbBoBgBoBtá              Cd.Cc.             CmChC^C^C^C©C£CsBaBsBtBhBi         CgBaByBaBtBrBi CbBaBnBgBaBlBoBrBe           CkBaBrBnBaBtBaBkBa        CmChC^C^C^C©C§
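One possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: in CP037 the letters `C` and `B` are the bytes X'C3' and X'C2', which are also the UTF-8 lead bytes for U+0080 to U+00FF. For instance `Cs` is X'C3A2', the UTF-8 form of U+00E2, and X'E2' is `S` in CP037; likewise `C^` is X'C3B0', giving X'F0', the digit `0`. That pattern would fit an EBCDIC file whose bytes were mistaken for ISO-8859-1 and then saved as UTF-8. Under that assumption, the extra bytes can be removed by undoing the UTF-8 layer before decoding. The names `DoubleEncodedEbcdic`, `looksDoubleEncoded`, and `repair` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes EBCDIC bytes were re-encoded as UTF-8 via ISO-8859-1.
final class DoubleEncodedEbcdic {
    static final Charset EBCDIC = Charset.forName("CP037");

    /** Heuristic: the bytes are strictly valid UTF-8 and every character fits in one byte. */
    static boolean looksDoubleEncoded(byte[] fileBytes) {
        try {
            // A fresh decoder reports malformed input instead of replacing it.
            CharBuffer decoded = StandardCharsets.UTF_8.newDecoder()
                    .decode(ByteBuffer.wrap(fileBytes));
            return decoded.chars().allMatch(c -> c <= 0xFF);
        } catch (CharacterCodingException e) {
            return false; // not valid UTF-8, so probably raw EBCDIC
        }
    }

    /** Undoes the assumed UTF-8 layer, then decodes the recovered bytes as CP037. */
    static String repair(byte[] fileBytes) {
        // Step 1: decode UTF-8; each char is expected to lie in U+0000..U+00FF.
        String latin1View = new String(fileBytes, StandardCharsets.UTF_8);
        // Step 2: map each char back to the single original byte.
        byte[] ebcdic = latin1View.getBytes(StandardCharsets.ISO_8859_1);
        // Step 3: decode the recovered bytes as EBCDIC.
        return new String(ebcdic, EBCDIC);
    }

    /** Picks the repair path only when the heuristic matches. */
    static String decode(byte[] fileBytes) {
        return looksDoubleEncoded(fileBytes)
                ? repair(fileBytes)
                : new String(fileBytes, EBCDIC);
    }
}
```

The check is only a heuristic and works on a whole buffer, so a file that mixes both forms would need to be split into records first; line terminators in the recovered data may also need separate handling.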

Edit 1:

Code updated as suggested in the comments.

Sahib Yar
  • Where is this data coming from? Any idea why it is encoded in such a strange way? This is a problem about this special format, not with EBCDIC itself. – piet.t Mar 21 '22 at 09:46
  • P.S.: Instead of reading bytes you could also use an `InputStreamReader` constructed with the appropriate codepage. – piet.t Mar 21 '22 at 09:48
  • I took these samples from the internet, but for our customers it is quite possible that a file could be encoded in this format, in proper EBCDIC format, or maybe a mix of both. – Sahib Yar Mar 21 '22 at 09:50
  • Thank you for the suggestion of `InputStreamReader`; I will update my code accordingly. – Sahib Yar Mar 21 '22 at 09:51
  • EBCDIC is a family of encodings; there are various country-specific encodings. CP037 (code page 37) is US/Canadian EBCDIC, for UK it is code page 23 or 24. See https://en.wikipedia.org/wiki/Code_page – Bruce Martin Mar 21 '22 at 12:03
  • What do you mean by "hex escaped EBCDIC"? The file English.ebcdic does not look proper. It contains X'0a25', which could be the result of an EBCDIC to ASCII translation during FTP download. Where and how did you get that file? If from the mainframe, try again using **binary** transfer. – phunsoft Mar 21 '22 at 18:02
  • @BruceMartin I know EBCDIC is a family of encodings, but CP037 serves more than 90% of our business requirements, and this code was just a sample. – Sahib Yar Mar 22 '22 at 05:41

0 Answers