
I have a strange problem for which I believe there is a solution but I cannot find it. Your help would be appreciated.

On the one hand, I have a native C++ class named Native which has a static wchar_t array containing accented characters. This array is const and defined at build time.

/// Header file
class Native
{
public:
    // A static member function has no object, so it cannot be const-qualified.
    static const wchar_t* Array() { return mArray; }

private:
    static const wchar_t* mArray;
};

//--------------------------------------------------------------

/// .cpp file
const wchar_t* Native::mArray = {L"This is a description éàçï"};

On the other hand, I have a C++/CLI class that uses the array like this:

/// C++/CLI use
System::String^ S1 = gcnew System::String( Native::Array() );
System::String^ S2 = gcnew System::String( L"This is a description éàçï" );

The problem is that while S2 gives "This is a description éàçï" as expected, S1 gives "This is a description Ã©Ã Ã§Ã¯". I do not understand why passing a pointer to a static array does not give the same result as passing the same literal directly.

I guess this is an encoding problem, but I would have expected the same results for both S1 and S2. Do you know how to solve the problem? I must use the array the way S1 does, i.e. by accessing the static array defined at build time through a static method that returns a const wchar_t*.
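A pointer to a string literal carries exactly the code units the compiler stored for that literal, so passing `Native::Array()` cannot by itself alter the text. A quick way to see what is actually stored is to print the code units in hex. `HexCodeUnits` below is a hypothetical diagnostic helper, not part of the original code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical diagnostic helper: formats each wchar_t code unit of a
// null-terminated wide string as 4-digit hex, separated by spaces.
std::string HexCodeUnits(const wchar_t* text) {
    std::string out;
    char buf[16];
    for (const wchar_t* p = text; *p != L'\0'; ++p) {
        std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(*p));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling `HexCodeUnits(Native::Array())` and comparing the tail of the output is then informative. For correctly encoded text it should end in `00E9 00E0 00E7 00EF`. If it contains pairs such as `00C3 00A9`, the array was built from UTF-8 bytes that were decoded one byte per character.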

Thanks for your help!


EDIT 1

What is the best way to define literals at build time in C++ using Intel C++ 13.0 so that they are directly usable in the C++/CLI System::String constructor? This may be the real question behind my problem.
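One common way to make the literal independent of how the source file is saved or decoded is to spell the accented characters as universal character names (`\uXXXX`). The C++ standard requires compilers to convert these to the wide execution character set, which means UTF-16 code units for the 16-bit `wchar_t` on Windows. This is a minimal sketch. It assumes both Intel C++ 13.0 and the C++/CLI compiler honour these escapes as the standard specifies. The code points are é = U+00E9, à = U+00E0, ç = U+00E7 and ï = U+00EF.

```cpp
#include <cassert>

class Native {
public:
    static const wchar_t* Array() { return mArray; }

private:
    static const wchar_t* mArray;
};

// The accented characters are named by code point, so the stored code units
// do not depend on the encoding the compiler assumes for this source file.
const wchar_t* Native::mArray = L"This is a description \u00E9\u00E0\u00E7\u00EF";
```

Saving the source as UTF-8 with a BOM also lets Microsoft's compiler detect the encoding. The escapes avoid relying on that detection altogether.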

Deduplicator
dom_beau
  • The result you get is **exactly** UTF-8 interpreted as Windows ANSI Western, but that does not make sense for `wchar_t`-based text. Is this the real code? Anyway, note that you don't need a separate `mArray` when you only access it via `Array()`: just return the literal directly in that function. – Cheers and hth. - Alf Aug 22 '14 at 13:59
  • Your "native array" contains text encoded in UTF-8. That is of course not appropriate for a const wchar_t*, which ought to be UTF-16. The bug is located in whatever native code reads the text, probably because it does not guess the encoding of a text file correctly or ignores a BOM. Standard C++ bug. – Hans Passant Aug 22 '14 at 14:03
  • @Alf No, this is not the real code, and no, I cannot simply return a literal directly in Array(). My code is much more complicated, using CRTP, etc. Thanks for the hint about UTF-8 vs. Windows ANSI Western... – dom_beau Aug 22 '14 at 14:16
  • @HansPassant Well, what is the way to give a UTF-16 build-time string to an array? I (erroneously) thought that prefixing it with "L" would do the job, but it seems that is not the case. So `const wchar_t* mArray = { ?????? };`? – dom_beau Aug 22 '14 at 14:19
  • @dom_beau: without real code it will just be guessing. So, try to create a small (minimal) but complete example. Include your build commands and result. – Cheers and hth. - Alf Aug 22 '14 at 14:34
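Alf's observation can be reproduced directly. One plausible cause, stated as an assumption rather than a diagnosis, is that the .cpp file is saved as UTF-8 without a BOM and the compiler decodes it with the Windows ANSI code page. In that case each UTF-8 byte of "éàçï" becomes its own `wchar_t`. `WidenBytesAsLatin1` below is a hypothetical helper that mimics this by widening each byte unchanged. For bytes 0xA0 to 0xFF, this gives the same characters as Windows-1252.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns each byte into one wchar_t with the same value,
// i.e. decodes the bytes as ISO-8859-1. For bytes 0xA0 to 0xFF this yields the
// same characters as Windows-1252 ("ANSI Western").
std::wstring WidenBytesAsLatin1(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<wchar_t>(b));
    }
    return out;
}

// "éàçï" encoded as UTF-8, written as explicit bytes.
const std::string kUtf8Accents = "\xC3\xA9\xC3\xA0\xC3\xA7\xC3\xAF";
```

`WidenBytesAsLatin1(kUtf8Accents)` yields the eight characters "Ã©Ã Ã§Ã¯". The character after the second Ã is a no-break space (U+00A0), which matches the text S1 displays.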

1 Answer


I don't have enough reputation to add a comment to ask this question, so I apologize for posting this as an answer if that seems inappropriate.

Could the problem be that your compiler defines wchar_t to be 8 bits? I'm basing the idea that this is possible on this answer:

Should I use wchar_t when using UTF-8?
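A width check can be built into the program. This sketch assumes the compiler supports C++11 `static_assert`; the comments below say Intel C++ 13.0 supports C++11, though possibly only in part. `WcharBits` is a hypothetical helper name.

```cpp
#include <cassert>
#include <climits>

// Build-time check: compilation fails if wchar_t is narrower than 16 bits,
// i.e. too small to hold a UTF-16 code unit.
static_assert(sizeof(wchar_t) * CHAR_BIT >= 16, "wchar_t is narrower than 16 bits");

// Reports the width of wchar_t in bits (16 with Microsoft's compilers on
// Windows, typically 32 on Linux).
unsigned WcharBits() {
    return static_cast<unsigned>(sizeof(wchar_t) * CHAR_BIT);
}
```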

To answer your question (in the comments) about building a UTF-16 array at build time, I believe you can force it to be UTF-16 by using u"..." for your literal instead of L"..." (see http://en.cppreference.com/w/cpp/language/string_literal)
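There is a caveat to switching prefixes, shown here with standard C++11 only. A `u"..."` literal has type `const char16_t[N]`, which is a distinct type from `wchar_t` even where `wchar_t` is 16 bits wide. It therefore cannot be assigned to a `const wchar_t*` without a conversion. `ToWstring` is a hypothetical helper. Its unit-by-unit copy is only a faithful conversion when `wchar_t` holds UTF-16 (as on Windows) or when the text has no surrogate pairs.

```cpp
#include <cassert>
#include <string>

// UTF-16 literal: each element is a char16_t code unit.
const char16_t* const kUtf16 = u"description \u00E9\u00E0\u00E7\u00EF";

// Copies UTF-16 code units one by one into a std::wstring.
std::wstring ToWstring(const char16_t* s) {
    std::wstring out;
    for (; *s != u'\0'; ++s) {
        out.push_back(static_cast<wchar_t>(*s));
    }
    return out;
}
```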

Edit 1: For what it's worth, I tried your code (after fixing a couple of compile errors) using Microsoft Visual Studio 2010 and didn't have the same problem (both strings printed as expected).

I don't know if it will help you, but another possible way to initialize this wchar_t array is to wrap your literal in a std::wstring and then set your array to the C-string pointer returned by wstring::c_str(), as follows:

// ws must stay alive as long as mArray is used; at namespace scope it lives until program exit.
std::wstring ws(L"This is a description éàçï");
const wchar_t* Native::mArray = ws.c_str();
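Two points about the `std::wstring` approach are worth knowing.

- The `wstring` is built from the same `L"..."` literal, so it holds exactly the code units the compiler produced. On its own, it does not change how the source bytes were decoded.
- A namespace-scope `std::wstring` is initialized at start-up in an unspecified order relative to statics in other translation units. Another static initializer that calls `Native::Array()` early could therefore see a null pointer.

A function-local static avoids that ordering problem, provided the real code allows `Array()` to own the string. This sketch also uses `\u` escapes, which is an assumption beyond the original answer.

```cpp
#include <cassert>
#include <string>

class Native {
public:
    static const wchar_t* Array() {
        // Constructed on the first call to Array(), so the pointer is valid
        // even when Array() is called from another static initializer.
        static const std::wstring text(L"This is a description \u00E9\u00E0\u00E7\u00EF");
        return text.c_str();
    }
};
```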

This edit was inspired by Dynamic wchar_t array (C++ beginner)

Tim
  • Hi Tim, no, I'm on Win7 64-bit, and with Visual Studio my projects use a 16-bit wchar_t. I tried u"..." instead of L"..." and got a syntax error. This is strange: I use Intel C++ 13.0 and it supports C++11. – dom_beau Aug 22 '14 at 15:34
  • Unfortunately user-defined literals are not supported by the Intel C++ 13.0 compiler. [link](https://software.intel.com/en-us/articles/c0x-features-supported-by-intel-c-compiler) – Tim Aug 22 '14 at 15:59
  • So how should I define a UTF-16 literal stored in a wchar_t* at build time? – dom_beau Aug 22 '14 at 16:55
  • See my "Edit 1" above for another possible way to populate your wchar_t array. – Tim Aug 22 '14 at 18:57
  • Hi Tim, I don't know how you managed to compile it and get the expected results, but thank you for your hint. I'll see if I can use it. – dom_beau Aug 25 '14 at 12:34
  • Keep in mind that I used the Microsoft C++ compiler that comes with Microsoft Visual Studio 2010. I don't have access to an Intel C++ 13.0 compiler. If using wstring doesn't work for you I recommend editing your question to provide real code along with your project settings. – Tim Aug 25 '14 at 14:39