I am having trouble displaying names in my reports. My application uses several technologies: PHP, Perl, and Pentaho for BI.
We use MySQL as the database, and my table is defined with CHARSET=utf8.
The values are stored in the table's rows as shown below, which is wrong:
Row1 = Ãx—350
Row2 = Ñz–401
PHP and Perl use different built-in functions to convert these stored values, and the UI displays them correctly, as below:
Expected Row1 = Áx—350
Expected Row2 = Ñz–401
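One plausible cause (an assumption; the question does not say how the rows were written) is double encoding: a client sends UTF-8 bytes over a connection that MySQL treats as latin1, so each byte is stored as a separate character. The UTF-8 bytes of Á are C3 81; C3 becomes Ã, and 0x81, which has no printable character in Windows-1252, becomes the invisible control character U+0081. That would explain why Row1 looks like Ãx. The sketch below reproduces this with Java's ISO-8859-1 decoder, which maps each byte to the code point of the same value; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncodeDemo {
    // Encode as UTF-8, then decode those bytes as ISO-8859-1 (one char per byte).
    // This mimics UTF-8 data passing through a connection declared as latin1.
    static String doubleEncode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The literal is A-acute followed by x, written as escapes so the result
        // does not depend on the source file encoding used by javac.
        String stored = doubleEncode("\u00C1x");
        for (char c : stored.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // prints U+00C3, U+0081, U+0078
        }
    }
}
```

The hidden U+0081 is easy to miss when copying the value from a UI or terminal, which is why printing code points is more reliable than eyeballing the string.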
For the reports, which use Pentaho, I run an ETL transformation on the data before it is shown. To convert the stored values, I am trying the following expression in a Java step:
```java
new java.lang.String(new java.lang.String(CODE).getBytes("Windows-1252"), "UTF-8")
```
However, the values are not converted properly. Of the two wrong values above, only Row2 is converted correctly; Row1 is converted incorrectly, as shown below:
Converted Row1 = �?x—350
Converted Row2 = Ñz–401
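The Row1 result is consistent with a hidden U+0081 character. Java's windows-1252 charset has no mapping for U+0081, and `String.getBytes` silently substitutes `?` (0x3F) for unmappable characters, so C3 81 turns into C3 3F. C3 followed by 3F is not valid UTF-8, so decoding yields the replacement character U+FFFD followed by a literal `?`. If Row2 was double-encoded the same way, Ñ (C3 81 in Row1's place, C3 91 here) would have been stored as Ã followed by U+2018, which windows-1252 can encode as 0x91, so it round-trips. The sketch below, with a hypothetical `strictEncode` helper, uses a `CharsetEncoder` set to `CodingErrorAction.REPORT` so the loss surfaces as an exception instead of a silent `?`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictEncodeDemo {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Like getBytes(CP1252), but throws instead of silently writing '?'.
    static byte[] strictEncode(String s) throws CharacterCodingException {
        CharsetEncoder enc = CP1252.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = enc.encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String row1 = "\u00C3\u0081x"; // A-tilde, U+0081, x
        // Lenient path, as in the question: U+0081 becomes '?' (0x3F).
        String lenient = new String(row1.getBytes(CP1252), StandardCharsets.UTF_8);
        System.out.println(lenient.equals("\uFFFD?x")); // true
        try {
            strictEncode(row1);
        } catch (CharacterCodingException e) {
            System.out.println("Cannot encode: " + e);
        }
    }
}
```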
How can I convert the values so that, for example, Row1 becomes Áx—350?
I also wrote a small Java program to convert the string Ãx—350 to Áx—350:
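```java
public class EncodingTest {
    public static void main(String[] args) throws Exception {
        String input = "Ãx—350";
        byte[] b1 = input.getBytes("Windows-1252");
        // toString() on an array prints its type and hash code, not the bytes
        System.out.println("Input Get Bytes = " + b1.toString());
        String szUT8 = new String(b1, "UTF-8");
        System.out.println("Input Encoded = " + szUT8);
    }
}
```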
The output of this code is:
Input Get Bytes = [B@157ee3e5
Input Encoded = �?x—350-350—É1
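The `[B@157ee3e5` line is the array's type plus its identity hash code, because arrays do not override `toString()`; it says nothing about the bytes. Printing the bytes in hex shows what the encoder actually produced. A minimal sketch with a hypothetical `hex` helper, using `"\u00C3\u0081x"` to stand in for the start of the stored Row1 value:

```java
import java.nio.charset.Charset;

public class HexDumpDemo {
    // Format each byte as two uppercase hex digits separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] b1 = "\u00C3\u0081x".getBytes(Charset.forName("windows-1252"));
        System.out.println(b1.toString()); // type and hash code, like the question's output
        System.out.println(hex(b1));       // C3 3F 78: U+0081 was replaced by '?'
    }
}
```

`java.util.Arrays.toString(b1)` would also work, but it prints signed decimal values, which are harder to compare against UTF-8 byte tables.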
As the output shows, the string is wrong; the expected output is Áx—350.
To confirm the encoding/decoding schemes, I tested the string Ãx—350 with an online converter, and it produced the expected Áx—350.
Can anyone explain why the Java code does not convert the string properly even though I am using the correct encoding/decoding schemes? Am I missing something, or is my approach wrong?
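A possible fix, assuming the rows were double-encoded through a connection that MySQL treated as latin1: MySQL's latin1 is Windows-1252, except that the five bytes Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the control characters of the same value. Reversing exactly that mapping recovers the original UTF-8 bytes, whereas Java's windows-1252 drops those five characters and plain ISO-8859-1 cannot encode characters such as €. The hypothetical `undoDoubleEncoding` method below encodes each character with windows-1252 when it can, falls back to the raw byte for U+0000 to U+00FF otherwise, and then decodes the bytes as UTF-8. It also restores a double-encoded em dash, stored as the three characters â, € and the right double quotation mark (U+00E2, U+20AC, U+201D).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class UndoDoubleEncoding {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Rebuild the original bytes the way MySQL's latin1 maps them: windows-1252
    // where defined, otherwise chars U+0000..U+00FF map to the same byte value.
    static String undoDoubleEncoding(String s) {
        CharsetEncoder enc = CP1252.newEncoder();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (enc.canEncode(c)) {
                byte[] b = String.valueOf(c).getBytes(CP1252);
                out.write(b, 0, b.length);
            } else if (c <= 0xFF) {
                out.write(c); // e.g. U+0081 becomes byte 0x81
            } else {
                throw new IllegalArgumentException(
                        "Not a double-encoded string: U+" + Integer.toHexString(c));
            }
        }
        // Decode the recovered bytes as UTF-8; invalid sequences become U+FFFD.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(undoDoubleEncoding("\u00C3\u0081x").equals("\u00C1x"));          // true
        System.out.println(undoDoubleEncoding("\u00C3\u2018z").equals("\u00D1z"));          // true
        System.out.println(undoDoubleEncoding("\u00E2\u20AC\u201D350").equals("\u2014350")); // true
    }
}
```

In Pentaho this would need a step that accepts multi-line Java (presumably a User Defined Java Class step) rather than a single expression; that wiring is not shown here. Correcting the connection character set, so the data is stored properly in the first place, would avoid the conversion altogether.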