Hello everybody and good afternoon. I'm still new-ish to this scene but have quite the ambition for it, and I've been trying to learn as much as I can. I consider myself adept in C++, but I've always programmed DOS programs and have only recently broadened my horizons to the Windows API. With that being said, I've noticed that the Windows API is heavily intertwined with Unicode, while DOS used ANSI. I know that ANSI uses 8-bit character codes and Unicode uses 16-bit ones, so my questions are:

1) Why is this important? Is it more specific, or able to hold more information, since it's 16 bits versus 8? I know there are some characters that Unicode supports and ANSI does not, but is that it?

2) What's the difference between TCHAR and WCHAR? Is WCHAR just the 16-bit version of char? If WCHAR is a wide char, then what's TCHAR?

3) I understand that LPWSTR is a long pointer to a wide string, but when would you use it and why? Is it just a Windows thing? And isn't a long pointer automatically 16 bits? Does that mean a regular pointer is 8 bits? If so, why would you need the extra bits?

4) Next, why would you need wstring, and would you need to use WCHAR and TCHAR with it for certain functions? I.e.

wstring myStr;
TCHAR myChar;
if (myStr.find(myChar) != string::npos) { /* ... */ }

Or does it matter?

char myChar;
if (myStr.find(myChar) != string::npos) { /* ... */ }

5) Last but not least, I had trouble displaying WCHAR and wstring, or even int, without a conversion. For instance (I sort of figured it out), I did:

WCHAR myChar = '1';
int i = 2;
wstring myString;

ofstream File1("myFile.txt");

if (File1.is_open())
{
    File1 << (char)myChar; //if i didn't typecast it to char it displayed 49 instead of 1;
    File1 << (WCHAR)i; //if i didn't typecast it to WCHAR(like to char instead)it displays symbols

    WCHAR temp;
    copy(myString.begin(), myString.end(), temp);

    File1 << (char)temp;
}

OK, so I had a little problem with the wstring and copy. What I did in my real program (this was just a quick rewrite) was take 9 WCHAR variables, use a wstringstream to load them all into its variable (wss), and then into myString (my wstring variable). To make sure they all loaded correctly, I copied it into a WCHAR temp and sent that to File1 so I could physically see what was loaded. For some reason it contained the variables I wanted AND extra ones I didn't want. I've gone over the code multiple times and found nothing wrong, so I got rid of the copy function and displayed each variable individually with a for loop like:

for (int i = 0; i < 81; i++)
{
    File1 << "Box " << (WCHAR)i << ": " << (char)BoxNum[i] << "\n";
}

and I concluded everything held the correct values. Just FYI, I was entering the values into text boxes, retrieving the text, and storing it in individual variables. The text boxes are laid out 9 by 9, so there are 9 in a row and 9 in a column. I then took the variables from the boxes in the first row and put them in myString so I could use the string.find() function to check for numbers in that row instead of going box by box. My problem was displaying this wstring. Anyways, lol, sorry, just trying to provide as much info as possible; maybe someone can solve that problem for me as well.

Chris Schmich
Pr Erkle
  • [Maybe](http://stackoverflow.com/q/6300804/596781) [of interest](http://stackoverflow.com/q/6796157/596781). – Kerrek SB Oct 25 '14 at 22:19
  • This isn't one question but in fact many questions. Such multi-point questions are not a good fit for stackexchange because people who cannot answer every single one of them are discouraged from answering because their answer would be downvoted for being incomplete. It is better to ask every single sub-question separately. – Philipp Oct 25 '14 at 22:22
  • OK, my apologies. I did not consider this. – Pr Erkle Oct 25 '14 at 22:23
  • When you are trying to create a wide string constant, you should use another syntax: L"a string" (a letter L before quotes). This doesn't answer all your questions but might be helpful. – antonpp Oct 25 '14 at 22:27
  • This all stopped being relevant 10 years ago when the last floppy disk drive died on the last maintained Windows 98 machine. The world is Unicode, so is your operating system, only WCHAR[] (aka wchar_t[]) and LPWSTR (aka wchar_t*) still matter. – Hans Passant Oct 25 '14 at 22:44

1 Answer

  1. An 8-bit character encoding only allows 256 different characters, minus a lot of control characters. That's enough for English, but not for other European languages that use characters like ö, ß, é or ø. Sure, you could use different code pages which place different characters on the upper 128 code points of an 8-bit encoding, but what if you need to mix multiple languages in the same string? And what about languages like Chinese, which have far more than 256 characters? With 16 bits per character you get 65,536 code points, which is enough to cover the whole Basic Multilingual Plane in a single encoding. Characters outside that plane exist too; UTF-16, which Windows uses, encodes each of them as a pair of 16-bit units.

  2. On Windows, a WCHAR is always 16 bits. A TCHAR can be 8 or 16 bits, depending on whether you compile your program as a Unicode program or not.

  3. The difference between long pointers and short pointers is mostly historical and of little concern on modern platforms (if you really want to know, check this question). The Windows API has a really long legacy dating back to the first Windows versions, so you find a lot of obsolete cruft in there. The size of a pointer depends on the kind of program: a 32-bit program has 32-bit pointers and a 64-bit program has 64-bit pointers. When you compile your program for 64-bit, an LPWSTR will be a 64-bit pointer (to a null-terminated array of 16-bit characters).

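  4. The first snippet only does what it appears to when TCHAR is 16 bits, because in that case WCHAR and TCHAR are the same type. When TCHAR is 8 bits, as with the plain char in the second snippet, the argument is implicitly converted to wchar_t before the search, so the code still compiles but is only reliable for ASCII characters. It is cleaner to pass find the same character type the string is made from.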

  5. When you write a 16-bit string to a file, it gets written to the file as a 16-bit string. When you then open it with a text editor and only see garbage, that's likely because your text editor interprets it with an 8-bit character encoding. Switch the encoding of the text editor to the encoding with which you wrote the file (UTF-16 might work). Or convert the wstring to a string before you write it, as described in this question. But keep in mind that this cannot work well when there are characters in your strings which cannot be expressed in 8 bits.
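
Points 4 and 5 can be sketched in portable C++. This is only an illustration, not the asker's actual program: it uses plain `wchar_t` in place of `WCHAR` (on Windows the two are the same type), and `row_contains` and `narrow_ascii` are hypothetical helper names. The narrowing helper assumes that replacing anything outside 7-bit ASCII with `?` is acceptable, which is fine for digits but lossy in general.

```cpp
#include <string>

// Hypothetical helper: true if the wide string `row` contains `digit`.
// std::wstring::find searches for a wchar_t, so pass a wide literal such as L'5'.
bool row_contains(const std::wstring& row, wchar_t digit) {
    return row.find(digit) != std::wstring::npos;
}

// Hypothetical helper: narrow a wide string so it can go to an 8-bit stream
// such as std::ofstream. Assumption: characters outside 7-bit ASCII are
// replaced by '?', so this is lossy for anything but plain ASCII text.
std::string narrow_ascii(const std::wstring& wide) {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        const bool is_ascii = static_cast<unsigned long>(wc) < 0x80;
        out.push_back(is_ascii ? static_cast<char>(wc) : '?');
    }
    return out;
}
```

With these, something like `File1 << narrow_ascii(myString);` writes the whole row in one go, and `row_contains(myString, L'5')` checks a row for a digit without any casts.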

Philipp
  • Good answers. Regarding answer two, I'm pretty sure `TCHAR` is a Windows thing. I'm not aware of other platforms using it. And the thing that selects `CHAR` vs `WCHAR` when using `TCHAR` is if `_UNICODE` or `UNICODE` are defined. – jww Oct 25 '14 at 22:45
  • Thank you! So just one thing I'm not clear on. When I sent the *(int)i* to File1 with `File1 << (char)i;` it displayed symbols, but by changing the typecast to *WCHAR* it displays correctly while being treated as 16 bits. With the actual *WCHAR* variable, though, I had to typecast it to *char*, which would have it treated as 8 bits; without the typecast it appeared as 49 instead of 1. Is this because of the positioning of characters in Unicode and ANSI, or the text editor? – Pr Erkle Oct 25 '14 at 23:05
  • @PrErkle Indeed. The character `1` has the ASCII value of 49, so when you write the integer 49 to a file (truncated from 32bit to 8bit by casting it to `char`) and then open it with a text editor, you get the character `1`. – Philipp Oct 25 '14 at 23:09