
I want to convert a hexadecimal byte to Unicode. I have the byte 0x80 in Windows-1250 and I want to convert it to '\u0402'. Is this possible with standard methods, without a switch?

Vineet Reynolds
Ballon
  • Do you want the character \u0402, or an actual string with the value "\u0402"? – Affe Jun 01 '11 at 07:06
  • I want to write a method which converts every hexadecimal number greater than 0x80, up to 0xFF, into a character. – Ballon Jun 01 '11 at 07:14
  • The question is extremely vague about the conversion technique used. 0x80 happens to be a control character in both extended ASCII and Unicode and cannot be mapped directly to \u0402 (a character from the Cyrillic character set) unless an explicit encoding rule is specified (which is not the case). – Vineet Reynolds Jun 01 '11 at 07:16
  • If your byte value `0x80` shows up as `Ђ`, then you are using **CP-1251**. In CP-1250 it's `€`. – Andreas Dolk Jun 01 '11 at 08:43

2 Answers


\u0402 is named CYRILLIC CAPITAL LETTER DJE. My guess is that your text is actually in a different character encoding, CP1251, where 0x80 maps to this Cyrillic letter.

Try to identify the encoding of your current text (your bytes) and use

String s = new String(myBytes, "Cp1251");

to read the bytes into a string. After that you can convert the string back to bytes, using the encoding your target expects.
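A minimal sketch of both steps, assuming the input is the single byte 0x80 and that the target encoding is UTF-8 (the thread does not name one). It uses the canonical charset names `windows-1251` and `windows-1250`, which Java also accepts under the aliases `Cp1251` and `Cp1250`; the class and helper names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeCp1251 {
    // Decodes one byte with the named charset and returns the resulting Java string.
    static String decode(byte b, String charsetName) {
        return new String(new byte[] { b }, Charset.forName(charsetName));
    }

    // Formats bytes as space-separated two-digit hex values, e.g. "D0 82".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte raw = (byte) 0x80;

        // Step 1: read the byte with the encoding the text was actually written in.
        String asCp1251 = decode(raw, "windows-1251"); // CYRILLIC CAPITAL LETTER DJE
        String asCp1250 = decode(raw, "windows-1250"); // EURO SIGN

        System.out.printf("windows-1251: U+%04X%n", (int) asCp1251.charAt(0));
        System.out.printf("windows-1250: U+%04X%n", (int) asCp1250.charAt(0));

        // Step 2: write the string out again in the target encoding (UTF-8 assumed here).
        byte[] utf8 = asCp1251.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes: " + hex(utf8));
    }
}
```

Running it prints U+0402 for windows-1251 and U+20AC for windows-1250, which matches the comment that the same byte is `€` in CP1250, followed by the UTF-8 bytes `D0 82`.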


Andreas Dolk
  • I use Cp1250 and I have the correct character encoding. I want to send DJE to an embedded system. – Ballon Jun 01 '11 at 07:33
  • I don't know why this was originally downvoted, but if the advice here and in [a related SO thread](http://stackoverflow.com/questions/4850557/convert-string-from-codepage-1252-to-1250) were to be taken into account, 0x80 in Cp1250 can be translated into 0x0402 in UTF-8/16 (or another Unicode encoding) by using the intermediate UTF-16 encoding as suggested by Andreas. – Vineet Reynolds Jun 01 '11 at 08:06
  • @Gogoo - `0x80` in cp1250 is the euro currency char (`€`, `\u20AC`), at least according to [wikipedia](http://en.wikipedia.org/wiki/Cp1250). From your question: you *have* cp1251 – Andreas Dolk Jun 01 '11 at 08:41

Let's clear things up: you have some bytes in CP1250, and you know the encoding. You want to send them to some system in a string, either with those characters escaped in the form \uXXXX or as bytes in UTF-16.

First of all, following the advice given by Andreas_D, `new String(bytes, "CP1250")` will convert your bytes into a Java string.

Now, to get an array of bytes in UTF-16, use `new String(bytes, "CP1250").getBytes("UTF-16")`.
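For illustration, the conversion can be wrapped as below; the byte 0x80 (the euro sign in CP1250) is an assumed sample input and `transcode` is a hypothetical helper. One detail worth knowing: Java's `UTF-16` charset writes a big-endian byte-order mark (`FE FF`) before the data when encoding, so if the receiving system does not expect a BOM, `UTF-16BE` (or `UTF-16LE`) may be the better choice.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1250ToUtf16 {
    // Decodes CP1250 bytes into a Java string, then re-encodes that string with the target charset.
    static byte[] transcode(byte[] cp1250Bytes, Charset target) {
        String text = new String(cp1250Bytes, Charset.forName("windows-1250"));
        return text.getBytes(target);
    }

    // Formats bytes as space-separated two-digit hex values.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] input = { (byte) 0x80 }; // the euro sign in CP1250

        // "UTF-16" prepends a big-endian byte-order mark (FE FF) when encoding.
        System.out.println("UTF-16:   " + hex(transcode(input, StandardCharsets.UTF_16)));
        // "UTF-16BE" writes the same big-endian code units without a byte-order mark.
        System.out.println("UTF-16BE: " + hex(transcode(input, StandardCharsets.UTF_16BE)));
    }
}
```

This prints `FE FF 20 AC` for `UTF-16` and `20 AC` for `UTF-16BE`.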

To get this string as an ASCII string with the non-ASCII characters escaped, use this example.
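The linked example is not reproduced in this copy of the thread; as a stand-in, here is a minimal sketch of such an escaper. The helper name `escapeNonAscii` is hypothetical, and it assumes that only printable ASCII (0x20 to 0x7E) should pass through unchanged while everything else becomes a \uXXXX escape with uppercase hex digits.

```java
import java.nio.charset.Charset;

public class UnicodeEscaper {
    // Keeps printable ASCII as-is and replaces every other char with a
    // backslash-u escape made of four uppercase hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                // Each UTF-16 code unit is escaped separately, so characters outside the
                // Basic Multilingual Plane come out as two escapes (a surrogate pair).
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample CP1250 input: 'a', the euro sign (0x80), 'b'.
        byte[] cp1250Bytes = { (byte) 'a', (byte) 0x80, (byte) 'b' };
        String text = new String(cp1250Bytes, Charset.forName("windows-1250"));
        System.out.println(escapeNonAscii(text));
    }
}
```

Running `main` prints `a\u20ACb`.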

Denis Tulskiy
  • I have a number from 1 to 99, and I must send this number to the system as 0x81 through 11A (1 = 0x81, 2 = 0x82, ..., 99 = 11A). But before I send it to the system, I have a method which converts string parameters into hexadecimal bytes. All the parameters are encoded in cp1250 except this number. First I get it as an int (1, 2, 3, 4, 5 ... 99) and I convert it into the appropriate char, which my method will know how to convert into hexadecimal from 0x81 upward. – Ballon Jun 01 '11 at 09:34
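One way to get such a char without a switch is to let the charset do the mapping: decode the byte 0x80 + n with the same single-byte charset the sending method encodes with, and keep the result only if it encodes back to that exact byte. The sketch below assumes that method encodes with a Java `Charset` and that every target value fits in one byte (0x80 to 0xFF); `buildTable` is a hypothetical helper. If the charset does not round-trip a position, the entry is marked unusable rather than guessed; Windows-1250 leaves a few positions undefined, 0x81 among them, so check the table before relying on it.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

public class ByteToCharTable {
    // Builds a lookup table without a switch: index i holds the char that the given
    // single-byte charset decodes from byte (0x80 + i). Entries that do not encode
    // back to the same byte are set to 0 to mark them unusable.
    static char[] buildTable(Charset charset) {
        char[] table = new char[0x80];
        for (int i = 0; i < table.length; i++) {
            byte[] original = { (byte) (0x80 + i) };
            String decoded = new String(original, charset);
            // Only keep the char if encoding it again yields exactly the original byte.
            boolean roundTrips = decoded.length() == 1
                    && Arrays.equals(decoded.getBytes(charset), original);
            table[i] = roundTrips ? decoded.charAt(0) : 0;
        }
        return table;
    }

    public static void main(String[] args) {
        for (String name : new String[] { "windows-1250", "windows-1251" }) {
            char[] table = buildTable(Charset.forName(name));
            // Hypothetical inputs n = 0..3, i.e. bytes 0x80..0x83.
            for (int n = 0; n <= 3; n++) {
                char c = table[n];
                System.out.printf("%s byte 0x%02X -> %s%n", name, 0x80 + n,
                        c == 0 ? "unusable" : String.format("U+%04X", (int) c));
            }
        }
    }
}
```

The table is built once per charset, so looking up the char for a given n is a single array access instead of a 127-way switch.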