I just got completely bamboozled. I've been searching for hours for why I can't convert a string to a PUCHAR (unsigned char*). It's weird, but for some reason the Windows encryption methods only accept PUCHARs... (why?)

I found plenty of solutions, but at first they didn't seem to work. The first 128 characters of the ASCII table worked fine, but other characters like 'ù' and 'µ' were converted to different ones (mostly weird symbols, but always the same symbol for a given character).

I have now found out that the cast DOES work, but only for strings that are read from the console using cin?! Hardcoded strings do not work?! I honestly don't have a clue what causes this behaviour.

Here is an example:

With CIN

cout << "With cin: ";
string password;
cin >> password;
unsigned char q = (unsigned char)password[0];
PUCHAR pbPassword = new unsigned char[1];      
pbPassword[0] = q;
pbPassword[1] = NULL;                       //Null or garbage is printed
cout << pbPassword;

This outputs:

With cin:

µ
µ

Without CIN

cout << "Without cin: ";
string password = "µ";
unsigned char q = (unsigned char)password[0];
PUCHAR pbPassword = new unsigned char[1];
pbPassword[0] = q;
pbPassword[1] = NULL;
cout << pbPassword;

This outputs:

Without cin: ╡

I'm a beginning programmer, so sorry if the code is messy.

Although I use the same character and the exact same cast, the cast for the hardcoded string does not work.

What I also noticed is that I can put a character at index 1 while the array only has a length of 1, meaning that I am accessing memory I shouldn't. How is this possible? Usually this gives a memory access error of some sort, right?

EDIT: The main question is not how to cast, or why I can still put elements in the array even though it has length 1. It's why cout gives different results for a cast from a string read from cin and a cast from a hardcoded string.

DerDerrr
  • This probably has to do with text encoding / code pages. `╡` is 0xB5 in code page 437, and 0xB5 is `µ` in code page 1252 (~ Latin-1). Check https://stackoverflow.com/questions/10611455/what-is-character-encoding-and-why-should-i-bother-with-it for pointers – king_nak May 15 '18 at 13:32
  • I was also thinking it has something to do with text encoding. The odd thing is that strings initialized with cin do work, but hardcoded ones dont. I guess they must be stored differently or something? – DerDerrr May 15 '18 at 13:43
  • Yes, that's exactly what happens. Your source file is probably stored in ISO-8859-1, so the `µ` is stored as 0xB5. When you print that byte on a console using CP 437, it displays `╡`. However, when you input `µ` through the console, it arrives in the console's code page, where `µ` is 0xE6. Printing that byte back through the same code page gives you `µ` again. – king_nak May 15 '18 at 14:01
  • This code `PUCHAR pbPassword = new unsigned char[1]; pbPassword[0] = q; pbPassword[1] = NULL;` gives undefined behavior, since it dynamically allocates an array of one character, and writes values to two characters in that array. Both your code samples do this. – Peter May 15 '18 at 14:54
  • In the Windows world, the only robust approach is Unicode from top to bottom, with UTF-16LE in memory and either UTF-8 or UTF-16 on disk. Legacy encodings are a mess. Use wide-character strings and string literals such as `L"µ"`. If `stdout` is a tty (e.g. `isatty(_fileno(stdout))`), modify the file descriptor to use UTF-16 text mode (e.g. `_setmode(_fileno(stdout), _O_U16TEXT)`). Ditto for `stdin`. Then `wcin` and `wcout` (again, [w]ide-character) will use the console's Unicode API. – Eryk Sun May 16 '18 at 01:07

3 Answers

With new unsigned char[1] you allocate one unsigned char. Then you do pbPassword[1] = NULL which will index out of bounds and lead to undefined behavior.

The number in the allocation is not the top index, it's the number of elements, just like when declaring an array. So it should be new unsigned char[2].

And even if you need to pass a pointer to unsigned char somewhere, I recommend you still use std::string. Which means you should have

std::string pbPassword(1, password[0]);

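That creates a string with one character, initialized to password[0]. If you then need a PUCHAR from that, cast a pointer to the string's own buffer. Note that c_str() returns a const char*, and reinterpret_cast cannot remove const, so take the address of the first element instead:

reinterpret_cast<PUCHAR>(&pbPassword[0])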
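
As a rough sketch of the same idea for a whole string rather than one character, the bytes can be copied into a container that owns one extra slot for the terminator, so nothing is written out of bounds. The helper name `to_byte_buffer` is made up for this example, and `PUCHAR` is redefined here only so the snippet compiles outside Windows; with `<windows.h>` included, that alias line would be dropped.

```cpp
#include <string>
#include <vector>

// Stand-in for the Windows typedef (assumption for portability of this sketch).
using PUCHAR = unsigned char*;

// Hypothetical helper: copies the bytes of `text` into a buffer that is one
// element longer than the text, so the terminating zero has its own slot.
std::vector<unsigned char> to_byte_buffer(const std::string& text) {
    std::vector<unsigned char> buffer(text.begin(), text.end());
    buffer.push_back(0);  // terminator lives inside the allocation
    return buffer;
}
```

A caller would keep the vector alive and pass `buffer.data()` wherever a PUCHAR is expected. This only fixes the memory handling; the bytes are copied unchanged, so it does not alter which glyph the console shows for them.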
Some programmer dude
  • Also, `NULL` is at least morally a pointer, and might not be a number. – Deduplicator May 15 '18 at 13:23
  • Yes, I know that the array should have length 2; that was just for demonstration purposes. Also, the method takes a PUCHAR, not a PUCHAR*, not that it makes a big difference. Finally, when using that cast, cout still does not print the right character. That's actually what this question is about: why do these casts not convert the hardcoded string properly? When using [this code (link)](https://pastebin.com/sAGTVaWe), cout still prints ╡. – DerDerrr May 15 '18 at 13:25

Your string literals are probably encoded in whatever code page your source files are saved in. When you print these strings out they are displayed in the code page of your console.

The strings read from the console will be in the code page of the console, so they will print correctly when sent back to the console.

On Windows, if you want to read or write non-ASCII characters to the console, you should use std::wcin and std::wcout with std::wstring to avoid this issue. You can then convert the std::wstring to UTF-8 or one of the 1-byte code pages if you need to.
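
A minimal sketch of that setup, under some assumptions: it uses the `_setmode(_fileno(...), _O_U16TEXT)` call from the Microsoft C runtime mentioned in the comments on the question, guarded so the snippet still compiles on other platforms. The function names `enable_wide_console` and `micro_sign` are invented for this example. The literal is written with the universal character name `\u00B5`, so its value does not depend on the encoding the source file is saved in.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <fcntl.h>  // _O_U16TEXT
#include <io.h>     // _setmode
#endif

// Hypothetical helper: switches stdout and stdin to UTF-16 text mode on
// Windows. Returns false if a call fails or the platform is not Windows.
bool enable_wide_console() {
#ifdef _WIN32
    return _setmode(_fileno(stdout), _O_U16TEXT) != -1 &&
           _setmode(_fileno(stdin), _O_U16TEXT) != -1;
#else
    return false;
#endif
}

// The micro sign as a wide string, independent of the source file encoding.
std::wstring micro_sign() {
    return L"\u00B5";
}
```

After `enable_wide_console()` succeeds, the program would print with `std::wcout << micro_sign()` and read with `std::wcin`. Narrow `cout` and `cin` should then be avoided on those streams, since the runtime expects wide-character calls once the mode is changed.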

Alan Birtles

For the "Without cin" case it looks like an encoding mismatch. First you read one byte, 0xB5 (181) (what is password.size()?), and then print it to the console with the default code page 437, where 181 is the code for ╡.
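
One way to check this is to look at the raw bytes instead of the glyphs. The sketch below is a hypothetical helper (`hex_bytes` is not from the question) that renders each byte of a string as hex. Run on the hardcoded literal, a source file saved as Latin-1 or Windows-1252 should give `B5`, and one saved as UTF-8 should give `C2 B5`. Run on the string read through cin from a CP 437 or CP 850 console, `µ` should arrive as `E6`.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: returns the bytes of `text` as space-separated
// uppercase hex pairs, so the stored encoding can be inspected directly.
std::string hex_bytes(const std::string& text) {
    std::string out;
    char pair[4];
    for (unsigned char byte : text) {
        std::snprintf(pair, sizeof pair, "%02X", static_cast<unsigned>(byte));
        if (!out.empty()) out += ' ';
        out += pair;
    }
    return out;
}
```

Printing `hex_bytes(password)` together with `password.size()` in both versions of the program shows whether the two strings hold the same bytes at all, which separates the encoding question from the cast.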

Spock77