
I need to convert unsigned hex values to the corresponding Unicode characters and write them to a file using C++.

So far I have tried this:

    unsigned short array[2]={0x20ac,0x20ab};

These values should be written to a file as the corresponding characters, using C++.

awesoon
Venkatesan
  • How is your file written? And what encoding are you using? UTF-8, or...? – Roddy Dec 05 '13 at 12:06
  • Those aren't "hex values", they're hexadecimal representations of the integers 8364 and 8363, respectively. They are the Unicode representations of "€" and "₫", respectively, so you wouldn't need much conversion if those characters are what you're looking for. Or do you want to convert the string "0x20ac" to Unicode? – molbdnilo Dec 05 '13 at 12:14

2 Answers


It depends on which encoding you have chosen.

If you are using UTF-8 encoding, you need to first convert each Unicode character to the corresponding UTF-8 byte sequence and then write that byte sequence to the file.

In pseudocode:

 EncodeCharToUTF8(charin, charout, &numbytes); // EncodeCharToUTF8(unsigned short, char*, int*); a 16-bit value needs up to 3 bytes
 WriteToFile(charout, numbytes); // write exactly the numbytes bytes produced above
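A minimal C++ sketch of the UTF-8 step, assuming each array element is a complete code point (true for U+20AC and U+20AB, which are in the Basic Multilingual Plane; surrogate pairs are not handled). The names `encode_utf8` and `write_utf8_file` are hypothetical helpers, not a library API.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: append the UTF-8 byte sequence for one code point to out.
void encode_utf8(char32_t cp, std::string& out)
{
    // Assumption: cp is a valid scalar value (no surrogates, not above U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                        // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                                // 4 bytes: 11110xxx + 3 continuation bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: encode every element, then write the bytes in binary mode.
bool write_utf8_file(const char* path, const unsigned short* chars, std::size_t count)
{
    std::string bytes;
    for (std::size_t i = 0; i < count; ++i)
        encode_utf8(chars[i], bytes);
    std::ofstream file(path, std::ios::binary);  // binary: no newline translation
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(file);
}
```

With the question's array, `write_utf8_file("out.txt", array, 2)` should produce the six bytes E2 82 AC E2 82 AB (€ followed by ₫).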

If you are using UTF-16 encoding, first write a BOM at the beginning of the file and then encode each character into a UTF-16 byte sequence. Byte order matters here: each character must be written little-endian or big-endian to match the BOM you wrote.

 WriteToFile("\xFF\xFE", 2); // Write the BOM (FF FE = little-endian)
 EncodeCharToUTF16(charin, charout, &numbytes); // EncodeCharToUTF16(unsigned short, char*, int*);
 // Write the character.
 WriteToFile(charout, numbytes);
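For UTF-16, a minimal sketch of the same steps, assuming the values in `array` are already UTF-16 code units and choosing little-endian output (BOM `FF FE`). Splitting each 16-bit unit into bytes by hand keeps the output independent of the machine's own byte order. `encode_utf16le` and `write_utf16le_file` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: build a UTF-16LE byte string, BOM first.
std::string encode_utf16le(const unsigned short* units, std::size_t count)
{
    assert(units != nullptr || count == 0);
    std::string bytes = "\xFF\xFE";               // BOM U+FEFF in little-endian order
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned short u = units[i];
        bytes += static_cast<char>(u & 0xFF);     // low byte first
        bytes += static_cast<char>(u >> 8);       // then high byte
    }
    return bytes;
}

// Hypothetical helper: write the encoded bytes to a file in binary mode.
bool write_utf16le_file(const char* path, const unsigned short* units, std::size_t count)
{
    const std::string bytes = encode_utf16le(units, count);
    std::ofstream file(path, std::ios::binary);
    file.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(file);
}
```

For big-endian output, the BOM would be `FE FF` and the two byte pushes would swap order.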

UTF-32 is not recommended, although the steps are similar to UTF-16.

I think this should help you get started.

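From your array, it seems that you are going to use UTF-16. Write the BOM bytes FF FE for little-endian or FE FF for big-endian (both are U+FEFF, stored in the respective byte order). After that, write each character in the same byte order. If you write the 16-bit values straight from memory, that is your machine's native byte order, so the BOM must match it.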
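A sketch of the "native byte order" approach, assuming `unsigned short` is 16 bits as in the question's array: the BOM is stored as the 16-bit value 0xFEFF right next to the data, so it automatically ends up in the same byte order as the characters. `encode_utf16_native` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// This sketch assumes 16-bit unsigned short, as in the question's array.
static_assert(sizeof(unsigned short) == 2, "expects 16-bit unsigned short");

// Hypothetical helper: copy the BOM and the code units exactly as they are laid
// out in memory, i.e. in the machine's native byte order. Because U+FEFF is
// stored the same way, the first two bytes (FF FE or FE FF) tell a reader
// which byte order the rest of the data uses.
std::string encode_utf16_native(const unsigned short* units, std::size_t count)
{
    assert(units != nullptr || count == 0);
    const unsigned short bom = 0xFEFF;
    std::string bytes(sizeof bom + count * sizeof(unsigned short), '\0');
    std::memcpy(&bytes[0], &bom, sizeof bom);
    if (count != 0)
        std::memcpy(&bytes[sizeof bom], units, count * sizeof(unsigned short));
    return bytes;
}
```

The returned string can be written with a `std::ofstream` opened in `std::ios::binary` mode. This is simpler than fixing the byte order explicitly, but the file's byte order then depends on the machine that wrote it, so whoever reads it must honor the BOM.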

What I have given here is pseudocode, so you will need to implement the functions yourself. Search for "encoding conversion" for more details.

doptimusprime

Actually you are facing two problems:

1. How to convert a buffer from UTF-8 to UTF-16?
I suggest you use the Boost.Locale library; sample code could look like this:

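    #include <boost/locale.hpp>
    #include <iostream>
    #include <string>

    int main()
    {
        std::string ansi = "This is what we want to convert";
        try
        {
            // ISO-8859-1 (Latin-1) bytes -> UTF-8
            std::string utf8 = boost::locale::conv::to_utf<char>(ansi, "ISO-8859-1");
            // ISO-8859-1 -> wide string (UTF-16 on Windows, UTF-32 on most Unix-like systems)
            std::wstring utf16 = boost::locale::conv::to_utf<wchar_t>(ansi, "ISO-8859-1");
            // UTF-8 -> wide string
            std::wstring utf16_2 = boost::locale::conv::utf_to_utf<wchar_t, char>(utf8);
        }
        catch (const boost::locale::conv::conversion_error&)
        {
            std::cout << "Failed to convert to Unicode!" << std::endl;
        }
    }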

2. How to save a buffer to a file as UTF-16?
This involves manually writing a BOM (byte order mark) at the beginning of the file; you can find a reference here.

Likewise, if you want to mark a UTF-8 encoded buffer as Unicode when saving it, first write the 3 bytes "EF BB BF" at the beginning of the output file (for UTF-8 the BOM is optional).

"FE FF" for Big-Endian UTF-16,
"FF FE" for Little-Endian UTF-16.
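The BOM values above can also be checked when reading a file back. A minimal sketch, where `detect_bom` is a hypothetical helper that recognizes only these three BOMs; "none" means no BOM was found, not that the encoding is known.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: inspect the first bytes of a buffer for one of the BOMs above.
// Note: FF FE 00 00 (the UTF-32LE BOM) would be reported as UTF-16LE here.
std::string detect_bom(const std::string& head)
{
    auto byte = [&head](std::size_t i) {
        assert(i < head.size());
        return static_cast<unsigned char>(head[i]);
    };
    if (head.size() >= 3 && byte(0) == 0xEF && byte(1) == 0xBB && byte(2) == 0xBF)
        return "UTF-8";
    if (head.size() >= 2 && byte(0) == 0xFE && byte(1) == 0xFF)
        return "UTF-16BE";
    if (head.size() >= 2 && byte(0) == 0xFF && byte(1) == 0xFE)
        return "UTF-16LE";
    return "none";
}
```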

If you still don't understand how the BOM works, open Notepad, type some text, save it with different "Encoding" options, and then open the saved files in a hex editor; you will see the BOM.
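If no hex editor is at hand, a small C++ sketch can show the first bytes of a saved file in the same way. `to_hex` and `read_head` are hypothetical helpers; `read_head` returns an empty string if the file cannot be opened.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: format bytes as space-separated uppercase hex, like a hex editor.
std::string to_hex(const std::string& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0)
            out += ' ';
        out += digits[b >> 4];      // high nibble
        out += digits[b & 0x0F];    // low nibble
    }
    return out;
}

// Hypothetical helper: read up to n bytes from the start of a file in binary mode.
std::string read_head(const char* path, std::size_t n)
{
    assert(n > 0);
    std::ifstream file(path, std::ios::binary);
    std::string buf(n, '\0');
    file.read(&buf[0], static_cast<std::streamsize>(n));
    buf.resize(static_cast<std::size_t>(file.gcount()));  // keep only bytes actually read
    return buf;
}
```

For a file saved as UTF-8 with a BOM, `to_hex(read_head("file.txt", 3))` would be expected to give "EF BB BF".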

Hope it helps you!

Daniel King