Situation
I need a function that takes a string and replaces every non-ASCII character with the hexadecimal representation of its UTF-8 encoding.
For example, in a word like "djvӷdio", ӷ should be replaced with "d3b7" while the rest remains untouched.
Explanation:
ӷ (U+04F7) is encoded in UTF-8 as the two bytes 0xD3 0xB7; read together as one integer they equal 54199, which is d3b7 in hexadecimal.
djvӷdio --> djvd3b7dio
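To make the arithmetic above concrete: a std::string holding ӷ contains exactly the two bytes 0xD3 and 0xB7, and packing them big-endian (first byte most significant) gives 0xD3B7 = 54199. A minimal sketch, assuming the source file is saved as UTF-8 (g++'s default on Linux); the helper name bytes_as_int is hypothetical:

```cpp
#include <string>

// Hypothetical helper: packs the raw bytes of s into one integer,
// first byte most significant. Only meaningful for short inputs,
// since an unsigned long long holds at most 8 bytes.
unsigned long long bytes_as_int(const std::string& s) {
    unsigned long long value = 0;
    for (unsigned char byte : s) {  // unsigned char avoids sign extension of bytes >= 0x80
        value = (value << 8) | byte;
    }
    return value;
}
```

With a UTF-8 source file, bytes_as_int("ӷ") yields 54199 and std::string("ӷ").size() is 2.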
I already have a function that returns the hex value of an int.
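The existing helper is not shown here, so the following is only a hypothetical stand-in named to_hex, assuming lowercase output without a "0x" prefix (matching "d3b7" in the example):

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the existing int-to-hex helper:
// lowercase hexadecimal, no "0x" prefix, e.g. 54199 -> "d3b7".
std::string to_hex(unsigned long long value) {
    std::ostringstream out;
    out << std::hex << value;  // std::hex prints lowercase digits by default
    return out.str();
}
```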
My Machine
- Kubuntu 19.10
- Compiler: g++ (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008
My Ideas
1. Idea
std::string encode_utf8(const std::string &str);
Using the function above, I iterate through the whole string, which contains Unicode characters, and replace every non-ASCII char with its hex value.
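One possible sketch of this idea, assuming the input std::string already holds UTF-8, works byte by byte instead of character by character. In UTF-8 every ASCII character is a single byte below 0x80 and every byte of a multi-byte sequence is 0x80 or above, so hex-encoding each non-ASCII byte in order turns ӷ into "d3b7". The signature is the one from Idea 1; the body is an assumption, not existing code:

```cpp
#include <string>

// Sketch: copies ASCII bytes (< 0x80) unchanged and replaces every
// non-ASCII byte with its two lowercase hex digits.
// Assumes str is UTF-8 encoded; invalid sequences are not detected.
std::string encode_utf8(const std::string& str) {
    static const char digits[] = "0123456789abcdef";
    std::string result;
    result.reserve(str.size() * 2);
    for (unsigned char byte : str) {
        if (byte < 0x80) {
            result += static_cast<char>(byte);  // ASCII: keep as-is
        } else {
            result += digits[byte >> 4];        // high nibble
            result += digits[byte & 0x0F];      // low nibble
        }
    }
    return result;
}
```

Because every non-ASCII byte lies in 0x80 to 0xFF, it always yields exactly two hex digits, so no padding logic is needed. A three-byte character such as € (E2 82 AC) becomes "e282ac".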
Problem:
Iterating char by char over a string containing Unicode does not work as I expected: in UTF-8 a single character is encoded as 1 to 4 bytes, while a char holds exactly one byte. So one Unicode character is seen as several chars, which produces garbage output. In short, I cannot index the string by character.
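Indexing a std::string does work; it simply indexes bytes, not characters. UTF-8 makes character boundaries recoverable, because every continuation byte has the bit pattern 10xxxxxx. A minimal sketch with a hypothetical helper split_code_points, assuming valid UTF-8 input, that groups the bytes into one substring per character:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a UTF-8 string into one substring per code point.
// A byte with (byte & 0xC0) == 0x80 is a continuation byte and is appended to
// the code point started by the preceding lead byte.
// Assumes valid UTF-8; malformed input is not detected.
std::vector<std::string> split_code_points(const std::string& str) {
    std::vector<std::string> parts;
    for (unsigned char byte : str) {
        if ((byte & 0xC0) != 0x80 || parts.empty()) {
            parts.emplace_back();  // ASCII or lead byte: start a new code point
        }
        parts.back() += static_cast<char>(byte);
    }
    return parts;
}
```

For "djvӷdio" this yields 7 elements, and the fourth one is the two-byte string "\xD3\xB7".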
2. Idea
std::string encode_utf8(const std::wstring &wstr);
Again, I iterate through the whole string and replace every non-ASCII character with its hex value.
Problem:
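Indexing now works, but each element is a wchar_t holding the character's UTF-32 value, i.e. its code point (0x4F7 for ӷ, since wchar_t is 32 bits with g++ on Linux), whereas I need its UTF-8 encoding (0xD3B7).
How can I extract a single character from a string so that I can obtain the number formed by its UTF-8 bytes?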
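If the string is kept as a std::wstring, the code point in each wchar_t can be converted to UTF-8 by hand using the standard bit layout (1 to 4 bytes depending on the code point's range). A minimal sketch with a hypothetical helper code_point_to_utf8, assuming each wchar_t holds a valid Unicode scalar value (at most 0x10FFFF and not a surrogate):

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as its UTF-8 bytes.
// Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::string code_point_to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                 // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For ӷ, code_point_to_utf8(static_cast<char32_t>(L'ӷ')) returns the bytes 0xD3 0xB7; hex-encoding those bytes then gives "d3b7". The standard library's std::wstring_convert with std::codecvt_utf8 performs a similar conversion, but both were deprecated in C++17.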