You are confusing different concepts here.
Storage
This is how we save/store/hold our data. A std::string is a collection of chars, which are bytes. A std::wstring is a collection of wchar_ts, which are sometimes 2 bytes wide (but this is not guaranteed! They are 2 bytes on Windows, but typically 4 bytes on Linux and macOS).
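To make the storage point concrete, here is a small sketch. The helper names are made up for illustration. It assumes nothing about the size of wchar_t, which is implementation-defined, and only shows that std::string counts bytes, not characters:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// sizeof(char) is 1 by definition; sizeof(wchar_t) is implementation-defined.
constexpr std::size_t char_size = sizeof(char);
constexpr std::size_t wchar_size = sizeof(wchar_t);

// The UTF-8 encoding of U+00E9 ("é") is the two bytes C3 A9.
// std::string neither knows nor cares: size() counts chars (bytes), not characters.
inline std::size_t stored_units_for_e_acute() {
    const std::string bytes = "\xC3\xA9";
    return bytes.size();
}

// A std::wstring counts wchar_t units, whose width depends on the platform.
inline std::size_t wide_storage_bytes(const std::wstring& w) {
    return w.size() * sizeof(wchar_t);
}
```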
Encoding
This is what the data means, and how it should be interpreted. A std::string, a collection of bytes, could hold UTF-8, or UTF-16, or UTF-32, or ASCII, or Shift JIS, or Morse code, or a JPEG, or a movie, or my DNA (lucky string!).
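As a sketch of that "anything goes" point (the function names here are hypothetical), the same std::string type can hold the first bytes of a JPEG file, or UTF-16LE bytes with embedded zeros, as long as you pass the length explicitly instead of relying on a terminating '\0':

```cpp
#include <cassert>
#include <string>

// The first three bytes of a JPEG file: the SOI marker FF D8, then FF.
inline std::string jpeg_magic() {
    return std::string("\xFF\xD8\xFF", 3);
}

// "A" (U+0041) in UTF-16LE is the bytes 41 00. The embedded zero byte is
// kept because the length is given explicitly.
inline std::string utf16le_bytes_for_A() {
    return std::string("A\0", 2);
}
```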
There are some strong conventions in play in the world. For example, on Windows, a std::wstring is generally accepted to hold UTF-16 (because the two-byte storage is convenient for this, and also because that's how the Windows API does it).
Newer versions of C++ give us things like std::u16string and std::u32string as well, which still do not directly have any notion of encoding, but are intended to be used for UTF-16 and UTF-32 respectively, because their names make that intention more obvious to readers of code. C++20 introduces std::u8string, which is intended to signify a UTF-8 encoded string (and is otherwise more or less like a std::string, except that its elements are char8_t rather than char).
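A sketch of what those types actually store, using only C++11 features (std::u8string is left out because it needs C++20). The u"..." and U"..." literals are encoded as UTF-16 and UTF-32, but the string types just hold code units, so nothing stops you from storing something that is not valid UTF-16 at all, such as a lone surrogate. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of char16_t / char32_t code units, not "characters".
inline std::size_t utf16_units(const std::u16string& s) { return s.size(); }
inline std::size_t utf32_units(const std::u32string& s) { return s.size(); }

// A lone high surrogate is not valid UTF-16, yet std::u16string accepts it
// without complaint: the type performs no validation.
inline std::u16string lone_surrogate() {
    return std::u16string(1, char16_t{0xD800});
}
```

For example, U+1F600 (an emoji outside the Basic Multilingual Plane) takes two UTF-16 code units (a surrogate pair) but only one UTF-32 code unit.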
But these are just conventions. Nothing about the type std::string says "UTF-8" or anything else. It doesn't know about, care about, or enforce any encoding. It just stores bytes.
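Because std::string never checks, if you do need to know whether some bytes are well-formed UTF-8, you have to check for yourself. Here is a minimal sketch of such a check. is_valid_utf8 is a hypothetical helper, not a standard function. It follows the RFC 3629 rules (no overlong forms, no surrogates, nothing above U+10FFFF):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `bytes` is well-formed UTF-8 per RFC 3629.
inline bool is_valid_utf8(const std::string& bytes) {
    std::size_t i = 0;
    const std::size_t n = bytes.size();
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (b < 0x80) { ++i; continue; }                 // ASCII byte
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;                               // stray continuation or invalid lead byte
        if (i + len > n) return false;                   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;        // expected a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, UTF-16 surrogates and out-of-range values.
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        if (cp > 0x10FFFF) return false;
        i += len;
    }
    return true;
}
```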
So your question about "converting UTF-8 to std::string" does not really make any sense; it's like asking how to convert a road into a car.
"What should I do, then?"
Well, Base64 is also not an encoding. Well, actually, it totally is, but it's an encoding on top of the string encoding. It's a way of transmitting/escaping/sanitising the raw bytes, not a way of describing how to interpret them later. By asking cpprest to convert from Base64, you're just transforming the way the raw bytes are provided. That's why it gives you a std::vector<unsigned char> rather than a std::string: although (as discussed above) std::string doesn't care about encoding, we sometimes use a vector of bytes to really, properly, completely say "this collection does not have any particular encoding, so please don't try to guess from convention or whatever what the encoding is in this use case; all it knows is that it is a bunch of bytes". This is down to opinion. Some people would still use a std::string for that; the authors of cpprest decided not to.
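To show that Base64 decoding only undoes the transport layer, here is a minimal decoder sketch. decode_base64 is a hypothetical stand-in, NOT cpprest's API; it assumes the standard RFC 4648 alphabet with '=' padding, and signals any invalid character (including whitespace) by returning an empty vector:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical Base64 decoder: returns raw bytes and says nothing about
// what text encoding, if any, those bytes use.
inline std::vector<unsigned char> decode_base64(const std::string& in) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::vector<unsigned char> out;
    unsigned int buffer = 0;  // pending bits
    int bits = 0;             // how many pending bits are valid
    for (char ch : in) {
        if (ch == '=') break;                       // padding: no more data
        const auto pos = alphabet.find(ch);
        if (pos == std::string::npos) return {};    // invalid character
        buffer = (buffer << 6) | static_cast<unsigned int>(pos);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<unsigned char>((buffer >> bits) & 0xFF));
            buffer &= (1u << bits) - 1;             // keep only the unused bits
        }
    }
    return out;
}
```

Decoding "w6k=" yields the two bytes C3 A9. Whether those bytes mean "é" in UTF-8, or something else entirely, is not something the decoder can know. If you then want them in a std::string, std::string(bytes.begin(), bytes.end()) copies them as-is without any conversion.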
The point is that the use of the function from_base64 cannot tell us anything about the encoding of the text that you've retrieved. For that, we have to go back to the documentation for the text. We have no access to that, and you did not tell us anything about it. If it were just a JSON string, the encoding would be down to the cpprest JSON library and so you'd already be done. However, it's not: it's something packed into a Base64 representation by whoever created the JSON object. Again, that information is not something that you shared with us.
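If there really is no documentation, about the only thing the bytes themselves can volunteer is a byte-order mark. A minimal sketch, using a hypothetical helper, is below. Treat it as a hint at best: most UTF-8 data has no BOM, so "unknown" is the normal answer. UTF-32 BOMs are not checked here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical heuristic: report a BOM if the decoded bytes start with one.
// Absence of a BOM proves nothing about the encoding.
inline std::string sniff_bom(const std::vector<unsigned char>& b) {
    if (b.size() >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";
    if (b.size() >= 2 && b[0] == 0xFF && b[1] == 0xFE) return "UTF-16LE";
    if (b.size() >= 2 && b[0] == 0xFE && b[1] == 0xFF) return "UTF-16BE";
    return "unknown";
}
```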
But, based on the variable names you've chosen, the data you're looking at appears to be UTF-8 already. You've then attempted to convert it to UTF-16, which is rather the opposite of what you said you wanted to do.
(Similarly, in your second example, you've taken a std::wstring that [probably] already stores UTF-16 thanks to the L"wide string literal", then told the computer that it's UTF-8 and to convert it "again" to UTF-16, then extracted the raw bytes into a std::string. None of that makes sense.)
Instead, why not literally just processXML(utf8_payload)?
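In code, "just pass it through" might look like the sketch below. The processXML here is a hypothetical stub standing in for yours, assumed to accept UTF-8 in a std::string; handle_payload is likewise made up. The only step is copying the decoded bytes into a std::string unchanged:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for your processXML, assumed to take UTF-8 text.
inline std::size_t processXML(const std::string& utf8_xml) {
    return utf8_xml.size();  // placeholder: a real XML parser would go here
}

// If the decoded bytes are already UTF-8, no transcoding step is needed:
// copy them into a std::string byte for byte and hand them over.
inline std::size_t handle_payload(const std::vector<unsigned char>& decoded) {
    const std::string utf8_payload(decoded.begin(), decoded.end());
    return processXML(utf8_payload);
}
```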
General advice
Encoding can be quite complex, although it's significantly easier to deal with once you've wrapped your mind around the basic concepts of all these layers of abstraction. For the future, and for this question if you wish to clarify it, you need to be absolutely clear about which encoding your data should be in at each stage of its "pipeline": as it gets transmitted from place A to place B, as it gets converted from type C to type D, and whatever else happens to it. If you want to change the encoding at one of those steps, then do so (though this should be rare!). But before you write any code, make sure you know for sure what it is that you need; otherwise you'll get yourself in a massive tangle.
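One way to keep yourself honest about this, sketched below under the assumption that you control the types in your own code, is to give each pipeline stage its own type so the expected encoding is written down rather than remembered. All of the names are hypothetical:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stage types: each one documents what its contents mean.
struct Base64Text { std::string value; };                 // transport form, ASCII only
struct RawBytes   { std::vector<unsigned char> value; };  // no encoding implied
struct Utf8Text   { std::string value; };                 // bytes known to be UTF-8

// Moving between stages is an explicit, named step. This one does not
// transcode anything; it only records the (documented or deduced) fact
// that the raw bytes are UTF-8.
inline Utf8Text assume_utf8(const RawBytes& raw) {
    return Utf8Text{std::string(raw.value.begin(), raw.value.end())};
}
```

The design choice is simply that a function taking Utf8Text cannot be handed RawBytes by accident; the compiler forces you to decide, at that step, what the encoding is.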
Eventually you'll start to detect patterns that can help, though. For example, if you were expecting some delicious non-ASCII output and instead see strange text with lots of "Å" (or "Ã") characters in it, that's probably UTF-8 being interpreted by mistake as a single-byte encoding such as ISO/IEC 8859-1 (Latin-1) or Windows-1252. That's because in UTF-8 every code point above U+007F is encoded as a multi-byte sequence whose first byte is 0xC2 or higher, and common lead bytes such as 0xC3 and 0xC5 are exactly "Ã" and "Å" in those single-byte encodings. (Strictly speaking, ASCII itself only defines bytes up to 0x7F.)
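You can reproduce this kind of mojibake deliberately. The sketch below (a hypothetical helper) reads each byte as a Latin-1 character, which in Latin-1 simply means "byte value = code point":

```cpp
#include <cassert>
#include <string>

// Simulates decoding UTF-8 bytes as ISO/IEC 8859-1 (Latin-1): every byte
// becomes the code point with the same numeric value.
inline std::u32string misread_as_latin1(const std::string& utf8_bytes) {
    std::u32string out;
    for (char c : utf8_bytes) {
        out.push_back(static_cast<char32_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

For example, "ż" (U+017C) is the UTF-8 bytes C5 BC, which come out as "Å¼"; "é" (U+00E9) is C3 A9, which comes out as "Ã©". Windows-1252 only differs from Latin-1 in the 0x80 to 0x9F range, so it produces the same "Å" and "Ã".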
Similarly, if you get Japanese (or other CJK-looking) text and didn't expect it, in my experience that's usually because you've given the computer some bytes and told it that they are a string in UTF-16 encoding, when actually they were UTF-8. Each pair of bytes gets glued into one 16-bit code unit, and pairs of ordinary ASCII letters often land in the range of CJK ideographs. You just get more experienced at recognising these patterns as you work more, and it can help you to fix your bugs faster.
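A sketch of that misreading, again with a hypothetical helper: gluing each pair of bytes into a little-endian 16-bit code unit. Any pair whose second byte is a lowercase ASCII letter (0x61 to 0x7A) lands in 0x6100 to 0x7AFF, inside the CJK Unified Ideographs block (0x4E00 to 0x9FFF):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simulates treating UTF-8 bytes as UTF-16LE: each pair of bytes becomes
// one code unit, low byte first. A trailing odd byte is dropped.
inline std::u16string misread_as_utf16le(const std::string& bytes) {
    std::u16string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const auto lo = static_cast<unsigned char>(bytes[i]);
        const auto hi = static_cast<unsigned char>(bytes[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}
```

For instance, the four bytes of "Hell" become the two code units U+6548 and U+6C6C, both CJK ideographs.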
Just last week, that last example saved me quite a bit of time: I knew immediately that my source data must have been UTF-8, and was therefore able to quickly decide to remove the byte-copy into a std::wstring that I'd been attempting. Examining the bytes in an encoding-agnostic way revealed the "Å" pattern as well, and then that was that. This was important because I had no documentation for the data source and thus no way to just look up what the encoding was supposed to be. I had to guess/deduce it. Hopefully that won't be the case for you here.
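"Examining the bytes in an encoding-agnostic way" can be as simple as a hex dump, so that no decoding step gets in the way. A minimal sketch with a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Prints every byte as two uppercase hex digits, separated by spaces, so you
// can spot patterns (e.g. frequent C3/C5 lead bytes suggesting UTF-8).
inline std::string hex_dump(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(bytes[i])));
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```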