
An old method contains code like the following (anonymised):

        std::wstring wstr = ...;
        std::string str(wstr.begin(), wstr.end());

Previously this all compiled without warnings but as we update to C++17 and VS2019 (v142) and tidy project settings, it now gives these big scary warnings:

C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.28.29333\include\xstring(2468,23): warning C4244: 'argument': conversion from 'wchar_t' to 'const _Elem', possible loss of data
        with
        [
            _Elem=char
        ]
C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.28.29333\include\xstring(2479): message : see reference to function template instantiation 'void std::basic_string<char,std::char_traits<char>,std::allocator<char>>::_Construct<wchar_t*>(_Iter,const _Iter,std::input_iterator_tag)' being compiled
        with
        [
            _Iter=wchar_t *
        ]
C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.28.29333\include\xstring(2459): message : see reference to function template instantiation 'void std::basic_string<char,std::char_traits<char>,std::allocator<char>>::_Construct<wchar_t*>(const _Iter,const _Iter,std::forward_iterator_tag)' being compiled
        with
        [
            _Iter=wchar_t *
        ]

message : see reference to function template instantiation 'std::basic_string<char,std::char_traits<char>,std::allocator<char>>::basic_string<std::_String_iterator<std::_String_val<std::_Simple_types<_Elem>>>,0>(_Iter,_Iter,const _Alloc &)' being compiled
        with
        [
            _Elem=wchar_t,
            _Iter=std::_String_iterator<std::_String_val<std::_Simple_types<wchar_t>>>,
            _Alloc=std::allocator<char>
        ]

I am pretty sure this code pre-dates use of UNICODE in our codebase - it seems to work but I don't really understand the warnings or what I should do about it.

I found this question: "UTF8 to/from wide char conversion in STL", but the nice neat solution there has comments saying it's deprecated in C++17! It's something of a mystery why this code mixes string and wstring in the first place. Is there an easy solution? Or is this a case of "just leave it if it works"?

Mr. Boy
  • @Someprogrammerdude sorry I was writing a different question and forgot I hadn't changed the title. Replaced it now, thanks for pointing it out! – Mr. Boy Jul 13 '21 at 14:58
  • A `wstring` holds a `wchar_t` array. `wchar_t` is wider than `char`, and "supports" characters that `char` doesn't. The warnings are the compiler's way of telling you that. – NathanOliver Jul 13 '21 at 14:59
  • Even without warnings, this code can work for ASCII characters only. Any non-ASCII character will lead to strange errors and text distortions. – Marek R Jul 13 '21 at 15:04
  • Does anyone have any idea when/which settings lead to this warning - I checked our build logs and I don't see it before I started updating things. I assume either the C++ or platform library version? – Mr. Boy Jul 13 '21 at 15:08

2 Answers


The issue is that you are converting a string of 16-bit elements (wchar_t on Windows) to a string of 8-bit elements (char). Each element is simply truncated, so any value that does not fit in 8 bits is lost. If you are converting between UTF-16 and UTF-8, you need to do it properly with a conversion library.

C++ does provide a conversion library in the form of <codecvt> (deprecated in C++17, but still available for now).

If you are sure the string only contains ASCII, you can suppress the warning.

See https://en.cppreference.com/w/cpp/locale/codecvt_utf8_utf16 for details
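As a rough illustration of the `<codecvt>` route mentioned above, here is a minimal sketch. The helper name `wide_to_utf8` is made up for this example, and it assumes the wide string really holds UTF-16 code units. Both `std::wstring_convert` and `std::codecvt_utf8_utf16` are deprecated since C++17, so expect deprecation diagnostics; on MSVC these can be silenced by defining `_SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING`.

```cpp
#include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>   // std::wstring_convert (deprecated since C++17)
#include <string>

// Hypothetical helper, not part of the original code base.
// Assumes `wide` contains UTF-16 code units and re-encodes them as UTF-8.
// to_bytes() throws std::range_error if the conversion fails.
std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

With this, a call such as `wide_to_utf8(wstr)` would replace the iterator-pair constructor. Non-ASCII characters come out as multi-byte UTF-8 sequences instead of being truncated, so the result can be longer than the input.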

doron
  • It should be noted that `<codecvt>` was deprecated in C++17 and may one day be removed from the language. – NathanOliver Jul 13 '21 at 15:06
  • Ah, it's only deprecated, not removed? In that case the solution in my linked question may be valid, at least for now – Mr. Boy Jul 13 '21 at 15:07
  • Note that `wchar_t` (and thus, the elements of a `std::wstring`) can be wider than 16 bits, and often are (Linux has 32 bits). Windows seems the only common platform where it is still 16 bits. – Adrian Mole Jul 13 '21 at 15:09
  • @Mr.Boy As of C++20, and the C++23 draft, it's still in the language. There is just no guarantee that it will still exist going forward, although I think it will for some time, since there is bound to be lots of code using it. – NathanOliver Jul 13 '21 at 15:10
  • The 200lb Gorilla for unicode is http://site.icu-project.org/ – doron Jul 13 '21 at 15:14

The warning is quite clear on its own.

warning C4244: 'argument': conversion from 'wchar_t' to 'const _Elem', possible loss of data

This means that the line std::string str(wstr.begin(), wstr.end()) implicitly converts each wchar_t to the narrower type const _Elem, i.e. char. Any narrowing conversion may lose data, hence the warning.

Consider the following example:

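#include <iostream>
#include <string>

int main() {
    std::wstring ws{};
    // Bytes 0x41 0x42 0x43 0x44 are 'A' 'B' 'C' 'D' in ASCII.
    // A 16-bit wchar_t (Windows) keeps only 0x4344 here.
    auto c = static_cast<wchar_t>(0x41'42'43'44);

    for (int i = 0; i < 3; ++i)
        ws.push_back(c);

    // Each wchar_t is narrowed to char, keeping only the low byte 0x44 ('D').
    std::string str{ws.begin(), ws.end()};
    std::cout << str << std::endl;
}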

The code above runs and prints DDD.

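Where wchar_t is 4 bytes wide (for example on Linux), the constructor of str reads one 4-byte wchar_t at a time. Since a string can only hold char elements, the constructor must narrow each wchar_t to char, which discards the three upper bytes (A, B and C) of every element. Where wchar_t is 2 bytes wide (as on Windows), the cast on c has already discarded A and B, and the constructor then discards C, so the output is the same.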
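If the data is expected to be plain ASCII, the narrowing can be made explicit and checked rather than left to the iterator-pair constructor. The following is only a sketch: `narrow_if_ascii` is a hypothetical helper name, and it assumes C++17 for `std::optional`. The explicit `static_cast` states the intent, so the compiler has no implicit conversion to warn about.

```cpp
#include <optional>
#include <string>

// Hypothetical helper, not part of the original code base.
// Returns the narrowed string only if every element is 7-bit ASCII;
// otherwise returns std::nullopt instead of silently dropping data.
std::optional<std::string> narrow_if_ascii(const std::wstring& wide) {
    std::string narrow;
    narrow.reserve(wide.size());
    for (wchar_t wc : wide) {
        // Compare as unsigned so the check also works where wchar_t is signed.
        if (static_cast<unsigned long>(wc) > 0x7F) {
            return std::nullopt;
        }
        narrow.push_back(static_cast<char>(wc));  // explicit, lossless for ASCII
    }
    return narrow;
}
```

A caller can then decide what to do when the result is empty (log, fall back to a proper UTF-8 conversion, and so on) instead of ending up with corrupted text.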

mibu