How to convert custom encoded file to UTF-8 (in Java or with a dedicated tool)

Question

The legacy software I'm rewriting in Java stores its data in a custom encoding (similar to Win-1252). For the new system I'm building, I'd like to replace this with UTF-8.

So I need to convert those files to UTF-8 to feed my database. I know the character map used, but it's not any of the widely known ones. E.g. "A" is at position 0x0041 (as in Win-1252), but at 0x0042 there is a character which in Unicode sits at code point U+0102, and so on. Is there an easy way to decode and convert those files with Java?

I've read many posts already, but they all dealt with industry-standard encodings of some kind, not with custom ones. I'm expecting it's possible to create a custom java.nio.charset.CharsetDecoder or java.nio.charset.Charset and pass it to java.io.InputStreamReader, as described in the first answer here?

Any suggestions welcome.

score 9 · Accepted Answer · answered Jan 20 '11 at 08:14

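No need to make this complicated. Just build an array of 256 chars, indexed by byte value:

static char[] map = { ... 'A', '\u0102', ... };

Then, for each byte b read from the source:

    int index = b & 0xff; // mask so the signed byte becomes an unsigned index (0..255)
    char c = map[index];
    target.write(c);      // target should be a Writer that encodes as UTF-8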
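The steps above could be fleshed out roughly as follows. This is only a sketch: the class name CustomToUtf8 and the method convert are made up for illustration, and the table is filled with an identity (ISO-8859-1) placeholder plus the single 0x42 -> U+0102 entry mentioned in the question. The real 256 entries have to come from the legacy character map.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class CustomToUtf8 {
    // Lookup table: index = byte value in the legacy file, value = Unicode char.
    static final char[] MAP = new char[256];
    static {
        // ASSUMPTION: placeholder identity mapping; replace with the real table.
        for (int i = 0; i < 256; i++) {
            MAP[i] = (char) i;
        }
        // The one custom entry taken from the question: 0x42 maps to U+0102.
        MAP[0x42] = '\u0102';
    }

    // Reads legacy-encoded bytes from 'in' and writes them to 'out' as UTF-8.
    static void convert(InputStream in, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        byte[] bytes = new byte[8192];
        char[] chars = new char[8192];
        int n;
        while ((n = in.read(bytes)) != -1) {
            for (int i = 0; i < n; i++) {
                // Mask with 0xff so bytes >= 0x80 do not become negative indexes.
                chars[i] = MAP[bytes[i] & 0xff];
            }
            writer.write(chars, 0, n);
        }
        // Flush so everything buffered in the writer reaches 'out'.
        writer.flush();
    }
}
```

For files, something like CustomToUtf8.convert(Files.newInputStream(src), Files.newOutputStream(dst)) inside a try-with-resources block should work. Since every legacy byte maps to exactly one char, the table lookup is all the decoding that is needed; the OutputStreamWriter takes care of the UTF-8 encoding.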

irreputable
