Convert a UTF-16 buffer into CString if only byte size is known?

Question

When using sqlite3_column_text16 and sqlite3_column_bytes16 (see https://www.sqlite.org/c3ref/column_blob.html), I get only a pointer to the UTF-16 text buffer and the number of bytes in that buffer.

Now I need to convert such a buffer into a CString object. How can I do that? It seems that CString only has the following constructors:

CString( LPCTSTR lpch, int nLength );  // requires LPCTSTR and number of chars, not bytes
CString( LPCWSTR lpsz );               // requires null-terminated Unicode buffer

Neither seems appropriate for my case.

score 0 · Answer 1 · answered Oct 30 '18 at 04:44

CString( LPCTSTR lpch, int nLength ) will do the job. The buffer pointer only needs a cast to LPCTSTR, or to LPCWSTR in this case. nLength is a count of characters, not bytes, so pass the byte size divided by 2 (the size of wchar_t on Windows).

Use CStringW if your program is not built as Unicode, because CString is then a narrow (ANSI) string.

Example:

// Create a byte buffer (buf) and copy a wide string into it
const wchar_t *source = L"ABC";
int len = (int)wcslen(source);        // wcslen returns size_t
int bytesize = len * 2;               // 2 == sizeof(wchar_t) on Windows

BYTE *buf = new BYTE[bytesize + 2](); // optional +2 zeroed bytes for a null terminator
memcpy(buf, source, bytesize);

// Copy buf into the destination CString; the length is in characters, not bytes
CString destination((LPCWSTR)buf, bytesize / 2);
delete[] buf;

MessageBoxW(0, destination, 0, 0);

In your case buf and bytesize come from the database (sqlite3_column_text16 and sqlite3_column_bytes16), so you only need:

CString destination((LPCWSTR)buf, bytesize / 2);    
//or
CStringW destination((LPCWSTR)buf, bytesize / 2);
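The bytes-to-characters arithmetic can also be tried outside MFC. The following is only a sketch under stated assumptions: std::u16string stands in for CStringW, and utf16_from_bytes is a hypothetical helper name, not part of SQLite or MFC. It assumes the byte count excludes any terminator, which is how the SQLite documentation describes sqlite3_column_bytes16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not part of SQLite or MFC): copies `byteCount` bytes of
// UTF-16 data into a string. std::u16string is a stand-in for CStringW.
std::u16string utf16_from_bytes(const void* data, int byteCount) {
    assert(byteCount >= 0);
    if (data == nullptr || byteCount == 0) {
        return std::u16string();
    }
    // Length in UTF-16 code units; an odd trailing byte is ignored.
    const std::size_t units =
        static_cast<std::size_t>(byteCount) / sizeof(char16_t);
    std::u16string result(units, u'\0');
    // memcpy avoids assuming the source pointer is aligned for char16_t.
    std::memcpy(&result[0], data, units * sizeof(char16_t));
    return result;
}
```

With MFC the same division feeds the constructor directly, as in the CStringW line above.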

answered Oct 30 '18 at 04:44

Barmak Shemirani

How do I convert CStringW to CString? Must I use a macro like W2T? – alancc Oct 30 '18 at 07:34
Use `CW2A` to convert to ANSI, however some characters will be lost – Barmak Shemirani Oct 30 '18 at 07:45
@ala: You cannot **convert** from `CStringW` to `CStringA` without losing information. Unless your `CStringA` stores UTF-8 encoded code units. By default it uses ANSI encoding. In essence that question is underspecified unless you tell us about your destination encoding. – IInspectable Oct 30 '18 at 08:10
@BarmakShemirani, My code is to be compatible under both non-Unicode and Unicode, so the target is CString, instead of CStringA. Thus I need to convert CStringW generated from buf to CString. So I think of using W2T or CW2T. Is that correct? – alancc Oct 30 '18 at 09:30
@ala: You cannot make your code compatible with both UTF-16 and ANSI. ANSI cannot represent all code points available in Unicode. You **will** lose information. If you insist that you must use `CStringA`, you can either use UTF-8 encoding, or implement an application that will sometimes fail. Please read [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/). – IInspectable Oct 30 '18 at 09:35
I don't know why you want to make a Unicode program that's compatible with outdated non-Unicode. Just make one Unicode program. If you want to send data to the web etc. then convert from UTF16 to UTF8, `CW2A(string, CP_UTF8)`. UTF8 is based on `char`. A string such as `L"ελληνικά 汉语"` will lose a lot when converted to ANSI. – Barmak Shemirani Oct 30 '18 at 10:02
For UTF-8 encoding, how do I convert the number of bytes to the number of characters? – alancc Oct 30 '18 at 23:10
@ala: Unicode is *way* more complex than that. While you can (easily) retrieve the number of code points any given sequence of code units represents, there is much more than just characters in Unicode. See [What's the difference between a character, a code point, a glyph and a grapheme?](https://stackoverflow.com/q/27331819/1889329) to get a basic understanding. – IInspectable Nov 02 '18 at 10:05
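To illustrate the code point count mentioned in the last comment, here is a minimal sketch that assumes the input is valid UTF-8. The name utf8_code_points is hypothetical. It counts code points only, which is not the same as the number of user-perceived characters (graphemes).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in valid UTF-8 text.
// Every code point starts with exactly one byte that is NOT a continuation
// byte (continuation bytes have the bit pattern 10xxxxxx).
std::size_t utf8_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

There is no fixed ratio between bytes and code points in UTF-8, since one code point occupies between one and four bytes, so the text has to be scanned.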
