
If I run my C++ application with the following main() method everything is OK:

#include <iostream>
using namespace std;

int main(int argc, char *argv[])
{
   cout << "There are " << argc << " arguments:" << endl;

   // Loop through each argument and print its number and value
   for (int i=0; i<argc; i++)
      cout << i << " " << argv[i] << endl;

   return 0;
}

I get what I expect and my arguments are printed out.

However, if I use _tmain:

int _tmain(int argc, char *argv[]) 
{
   cout << "There are " << argc << " arguments:" << endl;

   // Loop through each argument and print its number and value
   for (int i=0; i<argc; i++)
      cout << i << " " << argv[i] << endl;

   return 0;
}

It just displays the first character of each argument.

What is the difference causing this?

joshcomley

5 Answers


_tmain is not part of standard C++. main is.

_tmain is a Microsoft extension.

main is, according to the C++ standard, the program's entry point. It has one of these two signatures:

int main();
int main(int argc, char* argv[]);

Microsoft has added a wmain which replaces the second signature with this:

int wmain(int argc, wchar_t* argv[]);

And then, to make it easier to switch between Unicode (UTF-16) and their multibyte character set, they've defined _tmain which, if Unicode is enabled, is compiled as wmain, and otherwise as main.

As for the second part of your question, the first part of the puzzle is that your _tmain function is wrong. With Unicode enabled it becomes wmain, and wmain should take a wchar_t* argv[] argument, not char*. Since the compiler doesn't check the parameter types of the entry point, you get a program where an array of wchar_t strings is passed to your function, which interprets them as char strings.

Now, in UTF-16, the encoding Windows uses when Unicode is enabled, every ASCII character is stored as two bytes: a \0 byte followed by the ASCII value.

And since the x86 CPU is little-endian, the order of these bytes is swapped, so the ASCII value comes first, followed by a null byte.

And in a char string, how is the string usually terminated? Yep, by a null byte. So your program sees a bunch of strings, each one byte long.

In general, you have three options when doing Windows programming:

  • Explicitly use Unicode: define wmain, and for every Windows API function that takes char-related arguments, call the -W version of the function (instead of CreateWindow, call CreateWindowW). Use wchar_t instead of char, and so on.
  • Explicitly disable Unicode: define main, call CreateWindowA, and use char for strings.
  • Allow both: define _tmain and call CreateWindow, which resolve to main/wmain and CreateWindowA/CreateWindowW, and use TCHAR instead of char/wchar_t.

The same applies to the string types defined by windows.h: LPCTSTR resolves to either LPCSTR or LPCWSTR, and for every other type that includes char or wchar_t, a -T- version always exists which can be used instead.

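Note that all of this is Microsoft-specific. TCHAR is not a standard C++ type; it is defined by the Windows headers (pulled in by windows.h). wmain and _tmain are Microsoft extensions as well, although some other Windows compilers, such as Embarcadero's C++Builder, support them too.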

Samuel Katz
jalf
  • i wonder whether they provide a tcout too? so that one could just do tcout << argv[n]; and it resolves to cout in Ansi and wcout in Unicode mode? I suspect that could be useful for him in this situation. and +1 of course, nice answer :) – Johannes Schaub - litb May 22 '09 at 01:57
  • What disadvantage would disabling UNICODE provide? – joshcomley May 22 '09 at 10:03
  • @Johannes Schaub - litb : AFAIK, they don't provide a tcout (even if they provide cout and wcout). On Visual C++2003, I had to define one, as well as define and/or typedef all other STL-related symbols I wanted to use a TCHAR. I don't know on Visual C++2008 or 2010, though. – paercebal May 14 '10 at 16:45
  • -1 None of the three options listed are practical. The practical way to program Windows is to define `UNICODE`. And some other adjustments for C++ etc., before including ``. Then use the Unicode functions like `CreateWindow` (in general with no `W` needed at the end). – Cheers and hth. - Alf Apr 01 '12 at 18:56
  • Why exactly do you consider that to be more practical? – jalf Apr 01 '12 at 22:58
  • *"..._tmain are also defined by Microsoft only"* __Your last paragraph is absolutely inaccurate__, _tmain is implemented exactly the same in RAD Studio's C++Builder. In fact, under C++Builder's default [_TCHAR mapping](http://docwiki.embarcadero.com/RADStudio/XE3/en/TCHAR_Mapping), simply using main will fail. – arkon Mar 26 '13 at 05:52
  • Thanks! This was a very frustrating problem for me when I started using Visual Studio. I get why Microsoft made the various changes they did, but it's horrible trying to make the conversion... Seems like everything I do that worked fine in gcc is broken in VS. :-) – Brian Knoblauch Apr 20 '13 at 13:16
  • logically, they'd just overload the cout<< operator to handle tchar/wchar correctly, no need to make a dedicated wcout – hanshenrik Feb 26 '15 at 07:28
  • @b1nary.atr0phy: jalf meant that `_tmain` is Windows specific (and the OS API on Windows is defined by Microsoft). On Linux or MacOSX or FreeBSD systems there is no such `_tmain` entry point! – Basile Starynkevitch Jul 02 '17 at 09:59
  • Shouldn't it be that `_tmain resolves to main/wmain` not `main/_tmain` – user1720897 Jun 13 '18 at 10:20
  • @b1nary.atr0phy ur whole comment is absolutely inaccurate. _tmain will only compile and execute on ms-os, no matter which ide u use, even if u manually use ml.exe and link.exe – clockw0rk Apr 04 '19 at 08:27

_tmain is a macro that expands differently depending on whether you compile for Unicode or for ASCII (ANSI/MBCS). It is a Microsoft extension and isn't guaranteed to work on any other compilers.

The correct declaration is

int _tmain(int argc, _TCHAR *argv[])

If the macro _UNICODE is defined, that expands to

int wmain(int argc, wchar_t *argv[])

Otherwise it expands to

int main(int argc, char *argv[])

Your definition mixes the two, and (if you have _UNICODE defined) will expand to

int wmain(int argc, char *argv[])

which is just plain wrong.

std::cout works with narrow (char) strings. You need std::wcout if you are using wide (wchar_t) strings.

Try something like this:

#include <iostream>
#include <tchar.h>

#if defined(_UNICODE)   // the same symbol <tchar.h> uses to select _TCHAR and _tmain
    #define _tcout std::wcout
#else
    #define _tcout std::cout
#endif

int _tmain(int argc, _TCHAR *argv[]) 
{
   _tcout << _T("There are ") << argc << _T(" arguments:") << std::endl;

   // Loop through each argument and print its number and value
   for (int i=0; i<argc; i++)
      _tcout << i << _T(" ") << argv[i] << std::endl;

   return 0;
}

Or you could just decide in advance whether to use wide or narrow characters. :-)

Updated 12 Nov 2013:

Changed the traditional "TCHAR" to "_TCHAR" which seems to be the latest fashion. Both work fine.

End Update

Michael J
  • *"It is a Microsoft extension and won't work on any other compilers."* [Not as far as RAD Studio is concerned.](http://docwiki.embarcadero.com/RADStudio/XE3/en/TCHAR_Mapping) – arkon Mar 26 '13 at 05:54
  • @b1naryatr0phy - To split hairs, the tool you link to uses "_TCHAR", rather than "TCHAR" so it isn't compatible (though it does falsify my statement). However I should have said "It is a Microsoft extension and isn't guaranteed to work on any other compilers.". I'll amend the original. – Michael J Mar 26 '13 at 11:45
  • @MichaelJ I was mainly referring to "Code Changes..." section, which explains why RAD Studio now uses _tmain in place of main, and actually it is now the standard default for Embarcadero's C++Builder. – arkon Mar 26 '13 at 14:57
  • That is the second time recently that this four-year-old answer has been downvoted. It would be nice if downvoters made a comment explaining what problems they perceive and (if possible) how to improve the answer. b1naryatr0phy found a badly written sentence, but I fixed that up in March. Any guidance would be appreciated. – Michael J Jun 10 '13 at 08:44
  • @MichaelJ, not to be offensive, but it's `_TCHAR` not `TCHAR`. You're missing the `_` here..:) Please edit it..as I won't be able to edit this minor issue.. – Afzaal Ahmad Zeeshan Oct 22 '13 at 04:44
  • @Afzaal Ahmad Zeeshan -- MSVC++ has used "TCHAR" since at least MSC version 4 (long before MSVC++ version 1). The docs still specify "TCHAR". See http://msdn.microsoft.com/en-us/library/office/cc842072.aspx. I just noticed that VC++ 2012 (express) does generate default code with "_TCHAR" so I guess that is OK too, but TCHAR is still fine to use. – Michael J Nov 11 '13 at 15:06
  • Just realised what has happened. MS have been pushing their adherence to the C++ standard docs, so they are prefixing anything that isn't in the standard with a "_". I'd say that using either "TCHAR" or "_TCHAR" is OK. – Michael J Nov 11 '13 at 15:15
  • @MichaelJ: That's not correct. `TCHAR` and `_TCHAR` have **always** existed. `TCHAR` controls the character set used by the Windows header files. `_TCHAR` controls the character set used by the CRT. The difference is subtle, but well justified. Since the entry point is part of the CRT, using `_TCHAR` is correct, though. For reference, see [TEXT vs. _TEXT vs. _T, and UNICODE vs. _UNICODE](https://blogs.msdn.microsoft.com/oldnewthing/20040212-00/?p=40643). – IInspectable Apr 24 '17 at 17:16
  • @IInspectable - You are mistaken. Neither TCHAR nor _TCHAR have always existed. TCHAR appeared in the mid 1990s. I don't remember exactly when _TCHAR appeared but I think it was around 2000. The Raymond Chen article that you linked doesn't mention TCHAR. As I said back in 2009 when this started, older versions of Visual C++ would generate a _tmain that used TCHAR. More recent versions seem to use _TCHAR. I will be surprised if you can find a practical case where it makes a difference. – Michael J Jul 04 '17 at 13:09
  • continued ... Almost nobody manually defines UNICODE/_UNICODE these days. They click a checkbox in Visual Studio and it all happens auto-magically. – Michael J Jul 04 '17 at 13:11
  • The article I linked to explains why there are symbols with and without a leading underscore, and when to use which. A practical case where this makes a difference is when it comes to salary. The Unicode symbols never have been set through a checkbox in Visual Studio. This has always been a dropdown list. And with CMake support built into Visual Studio, starting with 2017, defining the preprocessor symbols has become fairly common as well. Regardless, this answer is so full of wrong that I'm puzzled why you even complain about the down-votes. They are well justified. – IInspectable Jul 04 '17 at 16:17
  • @IInspectable - You caught me: I said checkbox when I should have said list. If my post has an error, tell me what it is and I'll fix it. I didn't complain about downvotes, only downvotes where no reason is given. Raymond Chen is writing about two specific pairs of symbols, not all symbols in general. – Michael J Jul 04 '17 at 23:31
  • Raymond Chen specifically talks about symbols used with the generic-text mappings. The rationale behind a leading underscore applies to **any** symbol, including the `TCHAR`/`_TCHAR` pair of symbols. Errors in your proposed answer: While it is possible to compile for ASCII, this is not directly accessible through the Visual Studio IDE. Likewise, `std::cout` doesn't just support ASCII. – IInspectable Jul 05 '17 at 05:51
  • Life is too short for this. – Michael J Jul 05 '17 at 07:54
  • @b1nary.atr0phy again, nobody cares for a single IDE, who even uses RAD stuff – clockw0rk Apr 04 '19 at 08:35

The _T convention indicates that the program should use the character set defined for the application (Unicode, ASCII, MBCS, etc.). You can surround your strings with _T( ) to have them stored in the correct format.

cout << _T( "There are " ) << argc << _T( " arguments:" ) << endl;
Paul Alexander
  • In fact, MS recommends this approach, afaik. Making your application unicode-aware, they call it... using the _t version of all the string manipulation functions, too. – Deep-B May 14 '10 at 15:45
  • @Deep-B : And on Windows, this **is** how you make your application unicode-ready (I prefer the term of unicode-ready to -aware), if it was based on `char` s before. If your application directly uses `wchar_t` then your application **is** unicode. – paercebal May 14 '10 at 16:49
  • By the way, if you try to compile on UNICODE, then your code won't compile as you're outputting wchar_t inside a char-based cout, where it should have been wcout. See Michael J's answer for an example of defining a "tcout"... – paercebal May 14 '10 at 16:50
  • None of this is recommended by Microsoft, largely, because it's plain wrong. When compiling for Unicode, the code writes pointer values to the standard output stream. -1. – IInspectable Jul 04 '17 at 17:18

OK, the question seems to have been answered fairly well: the Unicode version (wmain) should take a wide-character array as its second parameter. So if the command-line parameter is "Hello", it would probably end up as "H\0e\0l\0l\0o\0\0\0", and your program would print only the 'H' before it sees what it thinks is a null terminator.

So now you may wonder why it even compiles and links.

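Well, it compiles because, as far as the compiler is concerned, wmain is just an ordinary function you are allowed to define: nothing checks your parameter types against the signature the runtime will call it with.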

Linking is a slightly more complex issue. In C, there is no decorated symbol information, so the linker just finds a function called main. The argc and argv values are probably always there as call-stack parameters, just in case, even if your function is defined without them or happens to ignore them.

Even though C++ does have decorated symbols, it almost certainly uses C-linkage for main, rather than a clever linker that looks for each one in turn. So it found your wmain and put the parameters onto the call-stack in case it is the int wmain(int, wchar_t*[]) version.

CashCow
  • Ok, so I have problem porting my code to windows widechar for years now and THAT is the first time I understood why this happens. Here, take all my reputation! haha – Leonel Nov 24 '14 at 10:43

With a little templatizing effort, this would work with any list of objects.

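#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Returns the first character of str that occurs exactly once, or -1 if there is none.
char non_repeating_char(std::string str){
    while(str.size() >= 2){
        // Collect the positions of every later copy of str[0].
        std::vector<std::size_t> rmlist;
        for(std::size_t i = 1; i < str.size(); i++){
            if(str[0] == str[i]) {
                rmlist.push_back(i);
            }
        }

        if(!rmlist.empty()){
            // Erase str[0] and all its copies; s counts the characters erased
            // so far, which shifts the remaining positions to the left.
            std::size_t s = 0;
            str.erase(str.begin());
            ++s;
            for (std::size_t j : rmlist){
                str.erase(str.begin() + (j - s));
                ++s;
            }
            continue;
        }
        return str[0];  // str[0] has no copies
    }
    if(str.size() == 1) return str[0];
    else return -1;
}

int main(int argc, char ** args)
{
    std::string test = "FabaccdbefafFG";
    if (argc > 1) test = args[1];  // use the command-line argument, if one was given
    char non_repeating = non_repeating_char(test);
    std::cout << non_repeating << '\n';
}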
Misgevolution