I'm looking for a way to convert a string (JSON) sent by a server, containing something like `"Test \u00e9\u00e9\u00e9"`, into something like `"Test ééé"`. I found a workaround: `boost::replace_all(listFolder, "\\u00e9", "é");` and I repeat this Boost call for each of the other letters (à, ù, è, ê, etc.), which is painful!

I wonder if there is a function that does this kind of conversion automatically.

Separately, the server correctly handles strings containing accented letters that I send to it if I pass them through this function first:

std::string fromLocale(std::string localeStr)
{
    boost::locale::generator g;
    g.locale_cache_enabled(true);
    std::locale loc = g(boost::locale::util::get_system_locale());
    return boost::locale::conv::to_utf<char>(localeStr,loc);
}

Unfortunately, the inverse of that code does not work on the strings sent by the server:

std::string toLocale(std::string utf8Str)
{
    boost::locale::generator g;
    g.locale_cache_enabled(true);
    std::locale loc = g(boost::locale::util::get_system_locale());
    return boost::locale::conv::from_utf<char>(utf8Str,loc);
}
Aminos
  • It's not very clear to me what you're asking. Have a look at these snippets for decoding of JSON (Unicode) escapes: [`append_utf8` in this sample](http://stackoverflow.com/questions/27799086/getting-values-from-a-json-file-using-boost-property-tree-with-multiple-element/27799928#27799928) – sehe Jan 11 '15 at 21:24
  • Readers note, as @sehe explained, that the string described as `"Test \u00e9\u00e9\u00e9"` is the string `"Test \\u00e9\\u00e9\\u00e9"`, i.e. \, u, 0, 0, e and 9 are individual characters. – Cheers and hth. - Alf Jan 11 '15 at 21:24
  • yes Cheers and hth. - Alf – Aminos Jan 11 '15 at 21:26
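
The point made in the comments probably explains why `toLocale` seems to have no effect: the `\u00e9` in the server's response is six plain ASCII characters, not an encoded `é`, so a UTF-8-to-locale conversion has nothing to convert (assuming an ASCII-compatible locale encoding). A minimal sketch of the difference, where `isPureAscii`, `escaped` and `utf8` are hypothetical names used only for illustration and are not part of Boost:

```cpp
#include <string>

// Hypothetical helper (not part of Boost): true when every byte is 7-bit ASCII.
bool isPureAscii(const std::string& s)
{
    for (unsigned char c : s)
        if (c > 0x7F) return false;
    return true;
}

// What the server sends: backslash, 'u', '0', '0', 'e', '9' as separate characters.
const std::string escaped = "Test \\u00e9";

// What a real UTF-8 "é" looks like: the two bytes 0xC3 0xA9.
const std::string utf8 = "Test \xC3\xA9";
```

Here `isPureAscii(escaped)` is true and `isPureAscii(utf8)` is false, so only the second string contains anything for an encoding conversion to change. The escapes have to be decoded first.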

2 Answers

The JSON specification allows "\uXXXX" escape sequences for Unicode characters (amongst other \X escape sequences). If you are not using an existing JSON parser that decodes such sequences for you, you will have to decode them manually, e.g.:

// JSON text is Unicode, and is commonly encoded as UTF-8. However, Unicode
// characters that are escaped in "\uXXXX" format are expressed as UTF-16
// code unit values, using surrogate pairs for code point values U+10000 and
// higher. This example uses C++11's std::u16string to handle UTF-16 parsing.
// If you are not using C++11 or later, you can replace it with std::wstring
// on platforms where wchar_t is 16-bit, for instance. If you want to handle
// the JSON using std::string/UTF-8 instead, you will have to tweak this
// parsing accordingly...

std::u16string str = ...; // JSON quoted-string value, eg: "Test \u00e9\u00e9\u00e9"...
std::u16string::size_type idx = 0;
do
{
    idx = str.find(u'\\', idx);
    if (idx == std::u16string::npos) break;

    std::u16string replaceStr;
    std::u16string::size_type len = 2;

    char16_t ch = str.at(idx+1);
    switch (ch)
    {
        case u'\"':
        case u'\\':
        case u'/':
            replaceStr = ch;
            break;

        case u'b':
            replaceStr = u'\b';
            break;

        case u'f':
            replaceStr = u'\f';
            break;

        case u'n':
            replaceStr = u'\n';
            break;

        case u'r':
            replaceStr = u'\r';
            break;

        case u't':
            replaceStr = u'\t';
            break;

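        case u'u':
        {
            // Parse the four hex digits by hand: the standard library does not
            // provide the locale facets needed to extract numbers from a
            // char16_t stream, so basic_istringstream<char16_t> fails here.
            std::u16string hexStr = str.substr(idx+2, 4);
            len += hexStr.size();

            unsigned value = 0;
            bool valid = (hexStr.size() == 4);
            for (char16_t h : hexStr)
            {
                if (h >= u'0' && h <= u'9') value = value * 16 + (h - u'0');
                else if (h >= u'a' && h <= u'f') value = value * 16 + (h - u'a' + 10);
                else if (h >= u'A' && h <= u'F') value = value * 16 + (h - u'A' + 10);
                else valid = false;
            }
            if (!valid)
            {
                // illegal value, do something
            }

            replaceStr = static_cast<char16_t>(value);
            break;
        }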

        default:
            // illegal sequence, do something
            break;
    }

    str.replace(idx, len, replaceStr);
    idx += replaceStr.size();
}
while (true);
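
If the JSON is held in a `std::string` as UTF-8, as in the question, the same idea can be adapted roughly as follows. This is only a sketch under assumptions: the helper names `appendUtf8`, `readHex4` and `decodeJsonEscapes` are made up for illustration, the input is a JSON string value without its surrounding quotes, and errors are reported by throwing `std::runtime_error`, which you may want to handle differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: append one Unicode code point to a UTF-8 string.
void appendUtf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: read four hex digits starting at pos; throws on bad input.
unsigned long readHex4(const std::string& s, std::size_t pos)
{
    if (pos + 4 > s.size()) throw std::runtime_error("truncated \\u escape");
    unsigned long value = 0;
    for (std::size_t i = pos; i < pos + 4; ++i) {
        const char c = s[i];
        value <<= 4;
        if (c >= '0' && c <= '9') value |= static_cast<unsigned long>(c - '0');
        else if (c >= 'a' && c <= 'f') value |= static_cast<unsigned long>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') value |= static_cast<unsigned long>(c - 'A' + 10);
        else throw std::runtime_error("bad hex digit in \\u escape");
    }
    return value;
}

// Decode the escapes inside a JSON string value (without the surrounding quotes).
std::string decodeJsonEscapes(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != '\\') { out += in[i]; continue; }
        if (++i == in.size()) throw std::runtime_error("dangling backslash");
        switch (in[i]) {
            case '"': case '\\': case '/': out += in[i]; break;
            case 'b': out += '\b'; break;
            case 'f': out += '\f'; break;
            case 'n': out += '\n'; break;
            case 'r': out += '\r'; break;
            case 't': out += '\t'; break;
            case 'u': {
                unsigned long cp = readHex4(in, i + 1);
                i += 4; // i now sits on the last hex digit
                if (cp >= 0xD800 && cp <= 0xDBFF) {
                    // A high surrogate must be followed by a "\uDC00".."\uDFFF" escape.
                    if (i + 2 >= in.size() || in[i + 1] != '\\' || in[i + 2] != 'u')
                        throw std::runtime_error("unpaired surrogate");
                    const unsigned long lo = readHex4(in, i + 3);
                    if (lo < 0xDC00 || lo > 0xDFFF)
                        throw std::runtime_error("unpaired surrogate");
                    cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                    i += 6;
                } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                    throw std::runtime_error("unpaired surrogate");
                }
                appendUtf8(out, cp);
                break;
            }
            default: throw std::runtime_error("illegal escape sequence");
        }
    }
    return out;
}
```

For example, `decodeJsonEscapes("Test \\u00e9\\u00e9\\u00e9")` yields the UTF-8 bytes for `Test ééé`. Building a new output string instead of replacing in place avoids repeatedly shifting the tail of the input.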
Remy Lebeau

The solution I found is to use RapidJSON, whose parser decodes the \uXXXX escape sequences for me.

Aminos