1

I am trying to encode a string containing special characters such as 'É' with the code below, but the result is not reproduced properly:

String Cdata="MARIE-HÉLÈNE";
byte sByte[]=Cdata.getBytes(); 
Cdata= new String(sByte,"UTF-8");
System.out.println(Cdata);

Expected output: MARIE-HÉLÈNE, but the actual output is: MARIE-HE

Ankur Lathi
shilpa

2 Answers

2

First, make sure that your source file is actually stored as UTF-8; see @Ankur's answer for a good explanation.

Then, you also need to provide an encoding when calling getBytes() on String to retrieve the byte array:

byte sByte[] = Cdata.getBytes("UTF-8"); 

If you call String.getBytes() with no encoding, the platform's default encoding is used, which can be (almost) anything. See also the documentation of java.lang.String.getBytes():

Encodes this String into a sequence of bytes using the platform's default charset
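
To illustrate why the charset has to match on both sides, here is a minimal sketch. The class name CharsetMismatch and the helper recode are made up for this example, and ISO-8859-1 merely stands in for whatever the platform default happens to be. The string is written with \u escapes so that the example does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Hypothetical helper: encodes with one charset and decodes with another.
    static String recode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Same text as in the question, written with escapes.
        String name = "MARIE-H\u00C9L\u00C8NE";

        // What getBytes() without an argument would use on this machine.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // Matching charsets: the round trip preserves the text.
        String same = recode(name, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(name)); // prints true

        // Mismatch (assumed default ISO-8859-1, decoded as UTF-8): the accented
        // characters become invalid UTF-8 input and are replaced.
        String broken = recode(name, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(broken.equals(name)); // prints false
    }
}
```

Passing a Charset constant instead of the name "UTF-8" also avoids the checked UnsupportedEncodingException.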

With that, the following SSCCE properly prints the expected output for me (note: took identifiers from question, not adjusted to coding conventions):

import java.io.UnsupportedEncodingException;

public class Encoding {
   public static void main(String[] args) throws UnsupportedEncodingException {
      String Cdata = "MARIE-HÉLÈNE";
      byte sByte[] = Cdata.getBytes("UTF-8"); 
      Cdata = new String(sByte,"UTF-8");
      System.out.println(Cdata);
   }
}
Andreas Fester
  • Hi Andreas, I have tried this piece of code and it produces the right output. However, when I wrap this string and send it to another web service, they reproduce the bytes incorrectly as MARIE-H?L?N ... and report that I am not encoding it properly. Can you please tell me whether I am encoding it properly or they are decoding it incorrectly? – shilpa Jul 23 '13 at 08:27
  • You might also need to set an encoding when transferring the string through the web service. It looks as if the string is transferred as ISO-8859-1 or something similar. But we would need to know more about the web service framework you are using. I suggest that you ask a new question if you cannot make it work – Andreas Fester Jul 23 '13 at 08:29
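
As a rough sketch of what can go wrong in transit (the web service from the comments is unknown, so the class TransferMismatch and both helpers are purely hypothetical): forcing the text through a charset that cannot represent 'É' yields question marks, while decoding UTF-8 bytes as ISO-8859-1 yields two wrong characters per accented letter.

```java
import java.nio.charset.StandardCharsets;

public class TransferMismatch {
    // Hypothetical step that squeezes the text into US-ASCII;
    // unmappable characters are replaced by '?'.
    static String throughAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Hypothetical receiver that wrongly decodes UTF-8 bytes as ISO-8859-1.
    static String decodeAsLatin1(byte[] utf8Bytes) {
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "MARIE-H\u00C9L\u00C8NE";

        System.out.println(throughAscii(name)); // prints MARIE-H?L?NE

        // Each accented letter is two bytes in UTF-8, so the wrongly
        // decoded string is two characters longer than the original.
        String wrong = decodeAsLatin1(name.getBytes(StandardCharsets.UTF_8));
        System.out.println(name.length() + " -> " + wrong.length()); // prints 12 -> 14
    }
}
```

Comparing the raw bytes on both ends is usually the quickest way to tell which side applies the wrong charset.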
2

You need to tell Eclipse to use UTF-8 for its stdout console. You can set that under Window > Preferences > General > Workspace > Text File Encoding.
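
Independently of the IDE setting, the program itself can write UTF-8 bytes to stdout by wrapping System.out in a PrintStream with an explicit encoding. This is only a sketch: the class Utf8Console and the helper utf8Stream are names invented for this example, and the console still has to interpret the bytes as UTF-8 for the text to display correctly.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    // Hypothetical helper: wraps any stream in an auto-flushing
    // PrintStream that always encodes text as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Escapes keep the example independent of the source file encoding.
        String name = "MARIE-H\u00C9L\u00C8NE";
        PrintStream out = utf8Stream(System.out);
        out.println(name);
    }
}
```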

Ankur Lathi
  • `You need to tell eclipse to use UTF-8 for its stdout console` - the important thing is that you are setting the **source file format** to UTF-8 (which you do as you describe), so that the respective characters are properly stored in the source file – Andreas Fester Jul 23 '13 at 08:26
  • Hi Ankur I am getting this output: MARIE-H�L�NE – shilpa Jul 23 '13 at 08:37