0

I have a web server written in C++. It receives data from the client and stores it in a database. Some of this data is in Persian and is sent as Unicode text encoded in UTF-8.
For example:

The data string on the client side is "سلام".
When I receive it on the web server, it is:
"D8%B3%D9%84%D8%A7%D9%85"

I want to convert this UTF-8 code to a C++ string. How can I do this conversion?

Ben
MJMRL
  • Does this answer your question? [How do I properly use std::string on UTF-8 in C++?](https://stackoverflow.com/questions/50403342/how-do-i-properly-use-stdstring-on-utf-8-in-c) – Tom Jun 17 '20 at 07:53
  • As already described in the answer, it's not UTF-8 encoded but [URL encoded](https://www.w3schools.com/tags/ref_urlencode.ASP). You've forgotten the first `%`. It's `"%D8%B3%D9%84%D8%A7%D9%85"`. That's how you would do it in JavaScript: https://wandbox.org/permlink/4WRinFLfKHNoEfoe. You need a way to encode and decode in C++. [libcurl](https://curl.haxx.se/libcurl/) is a library that can do this in C (and C++) with [curl_easy_escape](https://curl.haxx.se/libcurl/c/curl_easy_escape.html) and [curl_easy_unescape](https://curl.haxx.se/libcurl/c/curl_easy_unescape.html) – Thomas Sablik Jun 17 '20 at 08:58

1 Answer

3

Your string is not plain UTF-8. It is percent-encoded (URL-encoded), the same scheme used for HTTP URL query parameters, and the bytes it encodes are the UTF-8 representation of your text.

`%` indicates that the next two characters encode a single byte in hexadecimal. Scan the input for `%`; when you encounter one, interpret the next two characters as a hex-encoded byte and append that byte to the output. Copy every other character over unchanged.
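The parsing described above could be sketched roughly as follows. This is a minimal illustration, not a complete URL decoder: the names `hex_value` and `percent_decode` are made up for this example, a malformed escape such as `%zz` is simply copied through unchanged (one possible policy among several), and `+` is not treated as a space, which form-encoded data would additionally require.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if c is not a hex digit.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: turns every valid %XX escape into the byte it encodes.
// All other characters, including malformed escapes, are copied as-is.
std::string percent_decode(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hex_value(in[i + 1]);
            const int lo = hex_value(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out.push_back(static_cast<char>((hi << 4) | lo));
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out.push_back(in[i]);
    }
    return out;
}
```

Called with `"%D8%B3%D9%84%D8%A7%D9%85"` (note the leading `%`), this returns a `std::string` holding the eight bytes `D8 B3 D9 84 D8 A7 D9 85`, which are the UTF-8 encoding of "سلام". A `std::string` is just a byte container, so it can hold that result directly and pass it on to the database.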

Sebastian Hoffmann