
I have the string \u0130smail and I want to convert it to İsmail. More generally, I want to convert escapes such as:

  \u0130 --> İ   
  \u00E7 --> ç

I tried

String str = "\u0130smail";
System.out.println(str);

It worked, but whenever I get the string "\u0130smail" from the database or from the internet, it doesn't give the correct result.

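static String deneme(String string) {
    try {
        // Encode to UTF-8 bytes, then decode the same bytes as UTF-8 again
        byte[] utf8 = string.getBytes("UTF-8");
        return new String(utf8, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // Every Java platform is required to support UTF-8, so this should never happen
        throw new AssertionError(e);
    }
}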

That didn't work either.

Deduplicator
brtb
  • Doing this: `byte[] utf8 = string.getBytes("UTF-8"); string2 = new String(utf8, "UTF-8");` does not do anything. How did you build your `string` argument? – Guillaume Polet Mar 08 '12 at 15:18
  • What do you get as output? That'd be very helpful in determining the problem. – chooban Mar 11 '12 at 17:05
  • I found this relevant http://stackoverflow.com/questions/1934842/unicode-to-string-conversion-in-java – Siamore Nov 21 '14 at 02:57

2 Answers


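The strings "\u0130smail" and "İsmail" are exactly the same from the language's standpoint: in Java source code, the compiler translates a \uXXXX escape into the corresponding character as the very first step of reading the source. If you mean that you get the string "\\u0130smail" (note that I've escaped the backslash), i.e. text that literally contains a backslash followed by u0130, then you have to find each such escape sequence and convert its hex code into the corresponding Unicode character, or just print the number, whichever you need. Regular expressions can help here.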
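A minimal sketch of the regular-expression approach, assuming the incoming text uses only four-hex-digit `\uXXXX` escapes (the class `UnicodeUnescape` and method `unescape` are hypothetical names). Each match is replaced by the `char` whose code is given by the hex digits; `Matcher.quoteReplacement` keeps a decoded `\` or `$` from being read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescape {
    // Matches a backslash, the letter u, and exactly four hex digits (captured as group 1).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    public static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Interpret the four hex digits as a UTF-16 code unit.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // Quote the replacement so '\' or '$' are inserted literally.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Written as "\\u0130smail" in source: a real backslash followed by u0130smail,
        // which is the form the text has when it comes from a database or web page.
        String raw = "\\u0130smail";
        String decoded = unescape(raw);
        // The compiler turns the literal below into the actual capital dotted I.
        System.out.println(decoded.equals("\u0130smail")); // true
        System.out.println(decoded.length());              // 6
    }
}
```

Text without escapes passes through unchanged, so the method can be applied to every string read from the source.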

Malcolm

Converting the existing string to bytes and back again isn't going to help you. You need to look at the exact characters in the string you've got - and work out how you got them.
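To illustrate why the byte round trip is a no-op, here is a small sketch (the names `RoundTrip` and `roundTrip` are hypothetical). Encoding a `String` to UTF-8 and decoding it with the same charset returns an equal string for any input without unpaired surrogates, so a literal backslash escape stays a literal backslash escape.

```java
import java.nio.charset.StandardCharsets;

public class RoundTrip {
    // Encodes s as UTF-8, then decodes the bytes with the same charset.
    public static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A real backslash, the letter u, four hex digits, then "smail".
        String escaped = "\\u0130smail";
        System.out.println(roundTrip(escaped).equals(escaped)); // true
        System.out.println(roundTrip(escaped).length());        // 11
    }
}
```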

I suggest you print out the integer value of each character in the string (ideally in hex) to find out exactly what you've got, then trace it back as far as you can to work out what's going wrong.
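One way to do that, as a sketch (the class `DumpChars` and method `dump` are hypothetical names): print each UTF-16 code unit of the string in hex. If the output starts with `U+005C U+0075`, the string contains a literal backslash and `u` rather than the character İ (`U+0130`). `Locale.ROOT` is used so the formatting does not depend on the default locale.

```java
import java.util.Locale;

public class DumpChars {
    // Returns each UTF-16 code unit of s as "U+XXXX", separated by spaces.
    public static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format(Locale.ROOT, "U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The compiler turns this literal into the capital dotted I followed by "smail".
        System.out.println(dump("\u0130smail"));
        // prints U+0130 U+0073 U+006D U+0061 U+0069 U+006C
        System.out.println(dump("\\u0130smail"));
        // prints U+005C U+0075 U+0030 U+0031 U+0033 U+0030 U+0073 U+006D U+0061 U+0069 U+006C
    }
}
```

Characters outside the Basic Multilingual Plane show up as two surrogate code units with this approach.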

Jon Skeet