
I'm receiving strings like the one below from an IBM mainframe (2-byte graphic characters):

"　;Ａ;Ｂ;Ｃ;Ｄ;Ｅ;Ｆ;Ｇ;Ｈ;Ｉ;Ｊ;Ｋ;Ｌ;Ｍ;Ｎ;Ｏ;Ｐ;Ｑ;Ｒ;Ｓ;Ｔ;Ｕ;Ｖ;Ｗ;Ｘ;Ｙ;Ｚ;ａ;ｂ;ｃ;ｄ;ｅ;ｆ;ｇ;ｈ;ｉ;ｊ;ｋ;ｌ;ｍ;ｎ;ｏ;ｐ;ｑ;ｒ;ｓ;ｔ;ｕ;ｖ;ｗ;ｘ;ｙ;ｚ;０;１;２;３;４;５;６;７;８;９;｀;－;＝;￦;～;！;＠;＃;＄;％;＾;＆;＊;（;）;＿;＋;｜;［;］;｛;｝;：;＂;＇;，;．;／;＜;＞;？;";

I want to convert these characters to their 1-byte ASCII equivalents.

How can I replace them in Java using java.util.regex.Matcher or String.replaceAll()?

Target characters:

;A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;R;S;T;U;V;W;X;Y;Z;a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u;v;w;x;y;z;0;1;2;3;4;5;6;7;8;9;`;-;=;\;~;!;@;#;$;%;^;&;*;(;);_;+;|;[;];{;};:;";';,;.;/;<;>;?;";

jensgram
JasonHong
  • Regexes shouldn't be used for character-encoding translation. See [Encoding conversion in java](http://stackoverflow.com/questions/229015/encoding-conversion-in-java). – outis Nov 17 '11 at 09:20
  • This is not a duplicate of that other question. The OP is talking about actual characters, mostly from the [Halfwidth and Fullwidth Forms block](http://www.fileformat.info/info/unicode/block/halfwidth_and_fullwidth_forms/index.htm), that need to be replaced with ASCII characters. – Alan Moore Nov 17 '11 at 14:35

2 Answers


This is not (as other responders are saying) a character-encoding issue, but regexes are still the wrong tool. If Java had an equivalent of Perl's tr/// operator, that would be the right tool, but you can hand-code it easily enough:

public static String convert(String oldString)
{
  // Full-width (2-byte) characters, starting with U+3000 IDEOGRAPHIC SPACE...
  String oldChars = "　ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ０１２３４５６７８９｀－＝￦～！＠＃＄％＾＆＊（）＿＋｜［］｛｝：＂＇，．／＜＞？";
  // ...and the ASCII replacement at the same index (backslash and double quote are escaped)
  String newChars = " ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789`-=\\~!@#$%^&*()_+|[]{}:\"',./<>?";

  StringBuilder sb = new StringBuilder(oldString.length());
  int len = oldString.length();
  for (int i = 0; i < len; i++)
  {
    char ch = oldString.charAt(i);
    int pos = oldChars.indexOf(ch); // -1 if ch has no replacement
    sb.append(pos < 0 ? ch : newChars.charAt(pos));
  }
  return sb.toString();
}

I'm assuming each character in the first string corresponds to the character at the same position in the second string, and that the first character (U+3000, 'IDEOGRAPHIC SPACE') should be converted to an ASCII space (U+0020).

Be sure to save the source file as UTF-8, and include the -encoding UTF-8 option when you compile it (or tell your IDE to do so).
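If you'd rather not depend on the source-file encoding at all, the lookup can be replaced by arithmetic: the Halfwidth and Fullwidth Forms characters U+FF01 to U+FF5E sit at a fixed offset of 0xFEE0 from ASCII U+0021 to U+007E. The sketch below keeps the assumptions stated above (U+3000 becomes a space) and additionally assumes the won sign in the input is U+FFE6 FULLWIDTH WON SIGN standing in for a backslash; the class and method names are illustrative only.

```java
public class FullwidthToAscii {

    // Distance between a full-width form (U+FF01..U+FF5E) and its ASCII counterpart (U+0021..U+007E).
    private static final int FULLWIDTH_OFFSET = 0xFEE0;

    /**
     * Converts full-width ASCII variants to plain ASCII using only escape
     * sequences, so the source compiles the same under any -encoding.
     * Characters outside the handled ranges are copied unchanged.
     */
    public static String toAscii(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch >= '\uFF01' && ch <= '\uFF5E') {
                sb.append((char) (ch - FULLWIDTH_OFFSET)); // e.g. U+FF21 becomes 'A'
            } else if (ch == '\u3000') {
                sb.append(' ');  // IDEOGRAPHIC SPACE becomes an ASCII space
            } else if (ch == '\uFFE6') {
                sb.append('\\'); // assumption: the full-width won sign stands for a backslash
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // full-width A, b, 1, ideographic space, exclamation mark, won sign
        String input = "\uFF21\uFF42\uFF11\u3000\uFF01\uFFE6";
        System.out.println(toAscii(input)); // prints Ab1 !\
    }
}
```

The range check covers every full-width ASCII variant in one test, so nothing needs to be listed by hand; the trade-off is that the won-sign mapping has to be special-cased.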

Alan Moore

I don't think this one is about regexes; it's about encoding. It should be possible to read the data into a String using the 2-byte encoding and then write it out in any other encoding. Look here for the supported encodings.
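A quick way to test the re-encoding idea, assuming the mainframe data has already been decoded into a Java String: once decoded, full-width 'Ａ' (U+FF21) is a different character from 'A', so encoding it as US-ASCII just produces the replacement byte '?'. For contrast, the sketch also shows Unicode compatibility normalization (java.text.Normalizer with Form.NFKC), a different technique from either answer, which does fold full-width forms and U+3000 to ASCII. Note that NFKC also rewrites other compatibility characters (ligatures, superscripts and so on) and maps U+FFE6 to U+20A9 WON SIGN rather than to a backslash. The class and helper names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class EncodingVsNormalization {

    // Re-encoding: characters with no US-ASCII mapping are replaced by '?'.
    public static String reencodeAsAscii(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // NFKC folds compatibility characters, including full-width forms and U+3000.
    public static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // full-width A, full-width b, ideographic space, full-width exclamation mark
        String sample = "\uFF21\uFF42\u3000\uFF01";
        System.out.println(reencodeAsAscii(sample)); // prints ????
        System.out.println(nfkc(sample));            // prints Ab !
        // NFKC turns U+FFE6 into U+20A9 (WON SIGN), not a backslash,
        // so that character would still need its own replacement.
        System.out.println(nfkc("\uFFE6").equals("\u20A9")); // prints true
    }
}
```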

Kai Huppmann