I have a string encoded in windows-1256 that is displayed as ÓíÞÑÕäí áßí ¿.
The string should be displayed in Arabic if the operating system is configured to use the encoding.
Here is the HEX representation of the string:
My intention is to convert the text to UTF-8 manually (using lookup tables to see which bytes need to be altered and which should be left as-is).
To do that, I need to iterate over every byte in the string and read its numeric value.
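For the lookup-table idea, a rough sketch could look like the following. The names `appendUtf8`, `convertToUtf8` and `ByteTable` are made up for illustration, the table contents are left to the caller, and replacing unmapped bytes with U+FFFD is just one possible choice.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Append the UTF-8 encoding of one code point (up to U+FFFF in this sketch).
void appendUtf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical lookup table: single-byte value -> Unicode code point.
using ByteTable = std::unordered_map<std::uint8_t, std::uint32_t>;

std::string convertToUtf8(const std::string& input, const ByteTable& table) {
    std::string out;
    for (char c : input) {
        const auto byte = static_cast<std::uint8_t>(c);
        if (byte < 0x80) {          // ASCII bytes are left as-is
            out += c;
            continue;
        }
        const auto it = table.find(byte);
        // Assumption: bytes missing from the table become U+FFFD.
        appendUtf8(out, it != table.end() ? it->second : 0xFFFD);
    }
    return out;
}
```

For example, with a table holding the single entry 0xD3 → U+0633 (the Windows-1256 mapping for that byte, as far as I know), the input byte D3 comes out as the two bytes D8 B3.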
The string is printed to the output stream as ÓíÞÑÕäí áßí ¿. It has 13 visible characters, but when I iterate through its bytes, the loop runs 24 iterations, almost double that. Maybe the string is wrongly assumed to be UTF-8 or UTF-16.
How can I access the numerical value of each byte in the string?
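One way this might be done, as a minimal sketch: convert each `char` to `unsigned char` first and only then widen it to an integer type. The helper name `byteValues` is hypothetical, and the sketch assumes the common case where plain `char` is signed.

```cpp
#include <string>
#include <vector>

// Numeric value (0..255) of every byte in the string.
std::vector<unsigned int> byteValues(const std::string& s) {
    std::vector<unsigned int> values;
    values.reserve(s.size());
    for (char c : s) {
        // Go through unsigned char first so bytes >= 0x80 are not sign-extended.
        values.push_back(static_cast<unsigned char>(c));
    }
    return values;
}
```

Streaming one of these `unsigned int` values to `std::cout` prints a number, whereas streaming a `uint8_t` or `unsigned char` prints it as a character.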
#include <bitset>
#include <cstdint>
#include <iostream>
#include <string>

using std::string;
using std::cout;
using std::endl;

int main() {
    string myString = "ÓíÞÑÕäí áßí ¿";
    // text is written in Windows-1256 encoding
    cout << "string is : " << myString << endl;
    // outputs: string is : ÓíÞÑÕäí áßí ¿
    cout << "length : " << myString.size() << endl;
    // outputs : length : 24
    for (std::size_t i = 0; i < myString.size(); ++i)
    {
        std::uint8_t b1 = (std::uint8_t)myString.c_str()[i];
        unsigned char b2 = (unsigned char) myString.c_str()[i];
        unsigned int b3 = (unsigned int) myString.c_str()[i];
        int b4 = (int) myString.c_str()[i];
        cout << i << " - "
             << std::bitset<8>(myString.c_str()[i])
             << " : " << b1 // prints �
             << " : " << b2 // prints �
             << " : " << b3 // prints very large numbers, except for spaces (32)
             << " : " << b4 // negative values, except for the space (32)
             << endl;
    }
    return 0;
}
Output:
string is : ÓíÞÑÕäí áßí ¿
length : 24
0 - 11000011 : � : � : 4294967235 : -61
1 - 10010011 : � : � : 4294967187 : -109
2 - 11000011 : � : � : 4294967235 : -61
3 - 10101101 : � : � : 4294967213 : -83
4 - 11000011 : � : � : 4294967235 : -61
5 - 10011110 : � : � : 4294967198 : -98
6 - 11000011 : � : � : 4294967235 : -61
7 - 10010001 : � : � : 4294967185 : -111
8 - 11000011 : � : � : 4294967235 : -61
9 - 10010101 : � : � : 4294967189 : -107
10 - 11000011 : � : � : 4294967235 : -61
11 - 10100100 : � : � : 4294967204 : -92
12 - 11000011 : � : � : 4294967235 : -61
13 - 10101101 : � : � : 4294967213 : -83
14 - 00100000 : : : 32 : 32
15 - 11000011 : � : � : 4294967235 : -61
16 - 10100001 : � : � : 4294967201 : -95
17 - 11000011 : � : � : 4294967235 : -61
18 - 10011111 : � : � : 4294967199 : -97
19 - 11000011 : � : � : 4294967235 : -61
20 - 10101101 : � : � : 4294967213 : -83
21 - 00100000 : : : 32 : 32
22 - 11000010 : � : � : 4294967234 : -62
23 - 10111111 : � : � : 4294967231 : -65
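The C3 xx / C2 xx pattern in this output looks like UTF-8, so one possible reading is that the source file was saved as UTF-8 and the literal holds the UTF-8 encoding of the displayed Latin characters, not the original single bytes. Under that assumption, a sketch like the one below (the function name `decodeShortUtf8` is made up, and only 1- and 2-byte sequences are handled) would turn the 24 bytes back into 13 code points, each below 256.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Decode UTF-8 sequences of 1 or 2 bytes back to code points.
std::vector<std::uint32_t> decodeShortUtf8(const std::string& s) {
    std::vector<std::uint32_t> cps;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        if (b < 0x80) {
            cps.push_back(b);                        // plain ASCII byte
        } else if ((b & 0xE0) == 0xC0 && i + 1 < s.size()) {
            const auto next = static_cast<unsigned char>(s[i + 1]);
            cps.push_back(((b & 0x1Fu) << 6) | (next & 0x3Fu));
            ++i;                                     // skip the continuation byte
        } else {
            cps.push_back(0xFFFD);                   // longer or malformed: out of scope here
        }
    }
    return cps;
}
```

With the bytes listed above, the first pair C3 93 decodes to 0xD3 and the last pair C2 BF decodes to 0xBF.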