0

I'm trying to get the character encoding of a JSON string from jsoncpp: UTF-8, ANSI, or Unicode? How can I get the character encoding of a Json::Value? Thanks in advance!

XHLin

2 Answers

0

Any string is just a sequence of bytes that may conform to some basic rules (null terminators, characters prohibited in JSON, etc.). There is no magic way to determine which encoding was used to produce a string, because an encoding is just a way of representing text as binary data. So the encoding of a JSON string should either be specified by the JSON issuer (in its documentation, perhaps), or be carried as part of the JSON itself (if for some reason different strings have different encodings).
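Although the encoding cannot be read off the bytes, you can at least check whether the bytes are consistent with the encoding you expect. The following is a minimal sketch, not part of jsoncpp: `isValidUtf8` is a hypothetical helper that tests whether a `std::string` is well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a jsoncpp function).
// Returns true if `bytes` is a well-formed UTF-8 sequence. Rejects overlong
// forms, surrogates (U+D800..U+DFFF) and values above U+10FFFF.
bool isValidUtf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;    // continuation bytes expected after the lead
        unsigned long cp = 0;     // code point being decoded
        unsigned long minCp = 0;  // smallest code point allowed for this length
        if (lead < 0x80)                { extra = 0; cp = lead;        minCp = 0; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; minCp = 0x10000; }
        else return false;        // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < minCp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += extra + 1;
    }
    return true;
}
```

With jsoncpp you could pass it the result of `value.asString()`. A `true` result only means the bytes are consistent with UTF-8; plain ASCII, for example, is also valid in most single-byte code pages, so this narrows the possibilities rather than identifying the encoding.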

Ari0nhh
0

Determining the character encoding of a string is quite complicated. See this SO answer for help choosing the right tool.

Apache Tika, the content analysis toolkit, is perhaps one of the most advanced, judging by the following quote:

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more. You can find the latest release on the download page.

A JSON string could be analyzed with any of these libraries, yielding a (probable) charset that can be used for further processing.
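For a rough idea of what the simplest form of such detection looks like, here is a minimal C++ sketch that only inspects a leading byte order mark (BOM). It is far cruder than the statistical analysis a dedicated library performs; `guessEncodingFromBom` and the labels it returns are hypothetical names chosen for illustration.

```cpp
#include <string>

// Hypothetical helper: guess an encoding label from a leading BOM.
// Returns "unknown" when no BOM is present, which is the common case for JSON.
std::string guessEncodingFromBom(const std::string& data) {
    // True if `data` begins with the given byte sequence.
    auto startsWith = [&data](const std::string& prefix) {
        return data.compare(0, prefix.size(), prefix) == 0;
    };
    // The UTF-32 checks must come first: the UTF-32LE BOM begins with the
    // same two bytes as the UTF-16LE BOM.
    if (startsWith(std::string("\xFF\xFE\x00\x00", 4))) return "UTF-32LE";
    if (startsWith(std::string("\x00\x00\xFE\xFF", 4))) return "UTF-32BE";
    if (startsWith(std::string("\xEF\xBB\xBF", 3)))     return "UTF-8";
    if (startsWith(std::string("\xFF\xFE", 2)))         return "UTF-16LE";
    if (startsWith(std::string("\xFE\xFF", 2)))         return "UTF-16BE";
    return "unknown";
}
```

The result is only a guess: many producers never write a BOM, and a UTF-16LE text whose first character after the BOM is U+0000 would be misreported as UTF-32LE. When the function returns "unknown", you still need the issuer's documentation or a heuristic detector.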

zx485