tchar safe functions -- count parameter for UTF-8 constants

Question

I'm porting a library from char to TCHAR. According to MSDN, the count parameter in this fragment is the number of multibyte characters, not the number of bytes. So, did I get this right? My project properties in VC9 say 'Use Unicode Character Set', and I think that's correct, but I'm not sure how that impacts my count parameter.

_tcsncmp(access, TEXT("ftp"), 3); //or do i want _tcsnccmp?

"Supported on Windows platforms only, _mbsncmp and _mbsnbcmp are multibyte versions of strncmp. _mbsncmp will compare at most count multibyte characters and _mbsnbcmp will compare at most count bytes. They both use the current multibyte code page.

_tcsnccmp and _tcsncmp are the corresponding Generic functions for _mbsncmp and _mbsnbcmp, respectively. _tccmp is equivalent to _tcsnccmp."

The same question applies to _tcslen vs. _tcsclen.

score 4 · Accepted Answer · answered Jun 07 '10 at 21:58

Yes, you got it right.

The question, however, is why you are porting it to TCHAR, a type whose meaning depends on whether _UNICODE is defined.

Why not use UTF-8 and char*?
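
To make the count semantics concrete, here is a minimal sketch in portable C++ (the helper name `starts_with_ftp` is made up for illustration). It assumes what `tchar.h` does when `_UNICODE` is defined: `_tcsncmp` resolves to `wcsncmp`, so the count is a number of `wchar_t` elements, not bytes. With plain `char` over UTF-8, the same count of 3 is also correct, because each ASCII character in "ftp" occupies exactly one byte.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical helper: true if `s` begins with the ASCII prefix "ftp".
// Valid for ASCII and UTF-8 input alike, since ASCII characters are
// always a single byte in UTF-8.
bool starts_with_ftp(const char* s) {
    assert(s != nullptr);
    return std::strncmp(s, "ftp", 3) == 0;   // count = 3 chars = 3 bytes
}

// Wide-character counterpart, i.e. what _tcsncmp(access, TEXT("ftp"), 3)
// amounts to in a _UNICODE build.
bool starts_with_ftp(const wchar_t* s) {
    assert(s != nullptr);
    return std::wcsncmp(s, L"ftp", 3) == 0;  // count = 3 wchar_t elements, not a byte count
}
```

Either overload can be called as `starts_with_ftp(access)`. The count stays 3 in both builds because it counts elements of the string's character type.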

answered Jun 07 '10 at 21:58

Pavel Radzivilovsky

Doesn't that defeat the point? I'm porting to TCHAR only because that's what the surrounding code is using. Do I have a choice here? – Dustin Getz Jun 07 '10 at 22:04

There's a serious belief that TCHAR is a misguided effort that should be abandoned. See http://stackoverflow.com/questions/1049947/should-utf-16-be-considered-harmful – Pavel Radzivilovsky Jun 07 '10 at 22:43
I'm porting a parsing method, written in 1995, to operate on basic_string. I'm starting to think this could be a problem, as the parsing logic will be sensitive to multi-byte characters. I don't think passing UTF-8 byte arrays to this function will be very pretty. – Dustin Getz Jun 08 '10 at 14:15

If you have a method operating on char*, it's much easier to make it UTF-8 compliant than TCHAR-compliant. And the result is better. – Pavel Radzivilovsky Jun 08 '10 at 14:22

score 2 · Answer 2 · edited May 23 '17 at 12:26

TCHAR is a type that's either 8 or 16 bits depending on whether _UNICODE is defined. But UTF-8 always uses 8-bit code units, so using TCHAR is silly. Just use char.
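
As a rough illustration of why the code unit and the character are different things in UTF-8, the sketch below counts code points in a `char` string by skipping continuation bytes. The function name `utf8_code_points` is hypothetical, and the code assumes its input is already valid UTF-8; it does no validation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of Unicode code points in a
// NUL-terminated string that is assumed to be valid UTF-8.
std::size_t utf8_code_points(const char* s) {
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        // Continuation bytes have the bit pattern 10xxxxxx.
        // Every other byte starts a new code point.
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For pure ASCII text such as "ftp", this returns the same value as `strlen`. The two differ only once a character needs more than one byte.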

TCHAR is tied to the existence of two versions of the Windows API: "A" functions that use legacy 8-bit code pages, and "W" functions that use UTF-16. UTF-8 is not supported. You can use UTF-8 on Windows by explicitly converting your UTF-8 strings to UTF-16 for API calls, but you won't get any help from _UNICODE or TCHAR.
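
The explicit conversion step could look roughly like the sketch below. On Windows this job is normally done with `MultiByteToWideChar` and the `CP_UTF8` code page. A hand-written decoder is substituted here only so the example compiles on any platform. The name `utf8_to_utf16` is hypothetical, and the decoder is deliberately simplified: it rejects malformed lead and continuation bytes but does not reject overlong encodings or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes and re-encode them as UTF-16.
// Simplified sketch; not a full validator.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the first payload bits.
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        assert(len >= 1 && len <= 4);
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        // Each continuation byte (10xxxxxx) contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += len;
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));   // one UTF-16 code unit
        } else {
            cp -= 0x10000;                              // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

With this approach the program keeps UTF-8 in `char` strings internally and converts only at the boundary, just before calling a "W" function.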
