Can you manually convert between std::string and std::wstring

Question

I'm using the new C++ <filesystem> library, and all the strings representing file and folder names are returned as const wchar_t* pointers. In my program, I use char and std::string, which is fine as far as I know because all file paths can be written with ASCII characters (only one byte). There is a way to convert from std::wstring to std::string, but the <codecvt> facilities used for this are deprecated as of C++17.

I was wondering, what would be the harm in simply reading the value of each 2-byte wchar_t into a single-byte char? Whenever you have to convert from std::wstring (I'm assuming UTF-16) to single-byte std::string, any characters with a value greater than 127 (or 255) in the std::wstring can't be contained in a char, so they can't be converted in any way, right?
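To make the "read each wchar_t into a char" idea concrete, here is a minimal sketch of what it looks like. Both function names are hypothetical helpers, not library functions. The first refuses any input that is not pure ASCII; the second is the blind cast described above and is shown only to illustrate how it loses data.

```cpp
#include <string>

// Hypothetical helper: copies code units one by one, but only succeeds
// if every unit is plain ASCII (0..127). Returns false otherwise.
bool narrow_if_ascii(const std::wstring& wide, std::string& out) {
    out.clear();
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        // Cast to an unsigned type first: wchar_t is signed on some platforms.
        if (static_cast<unsigned long>(wc) > 0x7F) {
            return false;  // not representable as a single ASCII char
        }
        out.push_back(static_cast<char>(wc));
    }
    return true;
}

// Hypothetical helper: the blind per-unit cast. On common compilers this
// keeps only the low 8 bits of each unit, so distinct characters can
// collapse into unrelated ones without any error being reported.
std::string narrow_truncating(const std::wstring& wide) {
    std::string out;
    out.reserve(wide.size());
    for (wchar_t wc : wide) {
        out.push_back(static_cast<char>(wc));
    }
    return out;
}
```

With the truncating version, a file name containing U+0141 (Ł) would typically come out as the letter A (0x41), so the result can name a different file than the one intended.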

what about https://en.cppreference.com/w/cpp/filesystem/path/string — asmmo, Jun 19 '20 at 21:59
`char` strings can use multi-char sequences to represent values that do not fit into a single `char`. Also you should read [this](http://utf8everywhere.org) — , Jun 19 '20 at 22:15
_"all filepaths can be written with ASCII characters (only one byte)"_ Who says? — Asteroids With Wings, Jun 19 '20 at 22:15
_"any characters greater than 127 or 255 in the std::wstring can't be contained in the char, so they can't be converted in any way, right?"_ Sounds like you need to do some more reading on string encodings!! — Asteroids With Wings, Jun 19 '20 at 22:16
@AsteroidsWithWings You can store a character with a value of more than 255 in one byte? Also which characters in a filepath aren't covered by ASCII? — Zebrafish, Jun 19 '20 at 22:36
@Zebrafish "*You can store a character with a value of more than 255 in one byte?*" - there are single-byte character encodings, like Windows-125x, ISO-8859-x, etc., that have single-byte *representations* of Unicode characters higher than 127. For example, byte `0x80` in Windows-1252 represents Unicode character U+20AC. There are also many multi-byte character encodings available (UTFs, Shift-JIS, etc.). These are commonly referred to as *charsets*, which are implemented as *code pages* on Windows. — Remy Lebeau, Jun 19 '20 at 22:52
@Zebrafish "*Also which characters in a filepath aren't covered by ASCII?*" - standard ASCII covers only Unicode characters U+0000..U+007F in bytes `0x00..0x7F` (0..127). There are "Extended ASCII" encodings that cover higher Unicode characters in bytes `0x80..0xFF`, but these are non-standardized. And then there are the single-byte encodings that are commonly referred to as "ANSI" encodings, though they are not really part of ANSI itself (which is its own standard). — Remy Lebeau, Jun 19 '20 at 22:57
@Zebrafish Have a look at [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/). — Remy Lebeau, Jun 19 '20 at 22:59
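The point in the comments about multi-char sequences can be sketched by hand. The code below is an illustrative conversion from std::wstring to UTF-8 in a std::string, assuming the wide string holds UTF-16 (as on Windows) or UTF-32 (as on most Unix-like systems). The names `append_utf8` and `wide_to_utf8` are hypothetical; in real code, `std::filesystem::path::string()` or `u8string()` (see the cppreference link above) is likely the simpler route.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends the UTF-8 encoding (1 to 4 bytes) of one code point to `out`.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Converts a wide string to UTF-8. Surrogate pairs are combined into one
// code point; unpaired surrogates and out-of-range values become U+FFFD.
std::string wide_to_utf8(const std::wstring& wide) {
    const std::uint32_t replacement = 0xFFFD;
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: needs a low surrogate right after it.
            std::uint32_t low = 0;
            if (i + 1 < wide.size()) {
                low = static_cast<std::uint32_t>(wide[i + 1]);
            }
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;  // consume the low surrogate as well
            } else {
                cp = replacement;
            }
        } else if ((cp >= 0xDC00 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = replacement;  // stray low surrogate or invalid value
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Under these assumptions, U+20AC (the euro sign mentioned above) becomes the three bytes `0xE2 0x82 0xAC`, while pure ASCII input passes through unchanged, one byte per character.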


0 Answers