I have the following example code:

#include <cstddef>   // std::size_t
#include <iostream>

int main() {

    char test[] = "êêê"; // test[10] won't compile. Use test[13] or just test[]

    std::cout << "The string " << test << " consists of " << sizeof test << " bytes: ";
    for (std::size_t n = 0; n < sizeof test; ++n)
        std::cout << std::hex << +(unsigned char)test[n] << ' ';

    std::cout << '\n';
}

On an online C++ shell such as cpp.sh, it gives me the following result:

The string êêê consists of 7 bytes: c3 aa c3 aa c3 aa 0

When I build and run it with VS2013, it gives me this result instead:

The string êêê consists of 7 bytes: a8 ba a8 ba a8 ba 0

Could someone explain why the bytes differ, and how to fix it? I've tried changing the character set, but the output stayed the same.

  • C/C++ are sort of arms-length about character encodings. What happens with `char test[] = "\u00e9\u00e9\u00e9";`? – Eljay Dec 24 '17 at 23:20
  • VS2013 may, or may not, use utf-8 for the source code. When you use a web browser, it might translate the characters before sending them. – Bo Persson Dec 25 '17 at 07:21
  • @Eljay when I try this, the result is: a8 a6 a8 a6 a8 a6 0. @Bo Persson I tried switching to UTF-8 via Project Properties -> C/C++ -> Command Line = /utf-8, and it's still the same thing :( –  Dec 25 '17 at 13:33
  • @RenanMoura • using the code in your question, and using my C++ compiler, I'm seeing `The string êêê consists of 7 bytes: c3 aa c3 aa c3 aa 0`. I've saved my test.cpp source file in UTF-8 format, without BOM. – Eljay Dec 25 '17 at 14:28
  • @RenanMoura • in VS2013, make sure the source file is being saved with "Unicode (UTF-8 without signature)" format. – Eljay Dec 25 '17 at 14:34
  • In my previous comment, I should have used `\u00EA`. I can't edit it now, past edit expiry. This is what happens when you try to do Unicode from memory. – Eljay Dec 25 '17 at 14:37
  • You, your source code editor, your source code file and your compiler have to agree on what the [/source-charset](https://msdn.microsoft.com/en-us/library/mt708819.aspx) _is_. Then you can tell the compiler what [/execution-charset](https://msdn.microsoft.com/en-us/library/mt708818.aspx) you _want_. – Tom Blodget Dec 27 '17 at 02:44
  • @Eljay thank you all for the answers, but now I'm getting the following result: The string ├¬├¬├¬ consists of 7 bytes: c3 aa c3 aa c3 aa 0. Does anyone know why I'm getting "├¬├¬├¬"? I just saved the source file as Unicode (UTF-8 without signature). –  Dec 28 '17 at 19:32
  • At a guess, your output is going to something that does not understand UTF-8 encoding. – Eljay Dec 29 '17 at 00:45
  • @Eljay I'm using the Windows console. Do you know how I can fix it? –  Dec 29 '17 at 13:19
  • @RenanMoura • I assume you are using the latest version of Windows 10. Here's an answer on using Unicode with the console: https://stackoverflow.com/questions/388490/how-to-use-unicode-characters-in-windows-command-line – Eljay Dec 29 '17 at 14:30
  • I'm using Windows 7, but I'll take a look at this. –  Dec 29 '17 at 14:58
  • I've solved my problem, thank you all for the answers. Now I have another question: how can I convert values like 'c3 aa' back into the character they encode, such as 'ê'? I tried using istringstream >> hex >> c; and still can't do it. –  Dec 29 '17 at 21:30
  • @RenanMoura you should use a Unicode library, like iconv or ICU. Or if you are using C++11 or later, look at `std::wstring_convert`. – Remy Lebeau Jan 04 '18 at 02:55
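
On the last question in the comments (turning 'c3 aa' back into 'ê'), here is a minimal sketch that uses only the standard library. It assumes the hex values are already UTF-8 bytes, so no transcoding is needed: the bytes only have to be put back into a string. The helper names `to_hex` and `from_hex` are made up for this illustration. A likely reason `istringstream >> hex >> c` fails is that extracting into a `char` reads a single character and ignores `std::hex`; extracting into an integer type parses the number instead.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: print each byte of `s` as lowercase hex, separated by
// spaces, the same way the loop in the question does.
std::string to_hex(const std::string& s) {
    std::ostringstream out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i != 0) out << ' ';
        // Unary plus promotes the byte to int so it is printed as a number.
        out << std::hex << +static_cast<unsigned char>(s[i]);
    }
    return out.str();
}

// Hypothetical helper: parse whitespace-separated hex values ("c3 aa ...")
// back into raw bytes. Parsing stops at the first token that is not hex.
std::string from_hex(const std::string& text) {
    std::istringstream in(text);
    std::string bytes;
    unsigned int value = 0;  // an integer target, so std::hex is honoured
    while (in >> std::hex >> value) {
        bytes.push_back(static_cast<char>(static_cast<unsigned char>(value)));
    }
    return bytes;
}
```

Calling `from_hex("c3 aa c3 aa c3 aa")` yields a six-byte `std::string` holding the UTF-8 encoding of êêê. Writing it to `std::cout` shows êêê only if the terminal interprets the output as UTF-8, which is the same console issue discussed above. Leave the trailing `0` from the question's output out of the input, or the result will contain an embedded NUL byte.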
