
I'm dealing with systems which manipulate "relaxed" JSON data which includes shell-style # line comments:

[
  {
    # Batman
    "first-name": "Bruce",
    "last-name": "Wayne"
  },
  {
    # Superman
    "first-name": "Clark",
    "last-name": "Kent"
  }
]

The part of the system I'm working on uses json-lib - which I'm surprised to discover is tolerant of the shell-style comments - to parse the JSON input.

I need to extract some additional annotation from those comments, but json-lib seems to just discard them without providing an API for reading them:

import net.sf.json.JSONObject;
import net.sf.json.JSONSerializer;

JSONObject map = (JSONObject) JSONSerializer.toJSON("{\n" +
        "    # Batman\n" + // note the shell-style # comment
        "    \"first-name\": \"Bruce\",\n" +
        "    \"last-name\": \"Wayne\"\n" +
        "}");
System.out.println(map.toString());
/* <<'OUTPUT'
 * {"first-name":"Bruce","last-name":"Wayne"}
 * OUTPUT
 * note the absence of the shell-style comment
 */

This makes sense since comments aren't part of the JSON spec and I'm lucky json-lib doesn't just choke when parsing them in the first place.

Of note:

  • other systems consume this same JSON and the annotations need to be transparent to them, so the JSON structure can't be modified by adding properties for the comments instead.
  • not all the components and objects in my system have access to the raw JSON source: one component reads the file, parses it using json-lib, and passes the deserialized maps etc. around.

How can I read and parse these comments while processing the JSON input? Is there a library which will allow me to read them and relate them to their position in the JSON - can I easily connect the Batman comment to the "Bruce Wayne" entry?

I'm currently using json-lib, but I'm open to investigating other JSON libraries and equally open to using other languages which extend JSON, such as YAML - but I'm not sure those tools will allow me to read and process the comments in my input.
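
To make the goal concrete, here is a minimal sketch of the kind of pre-scan that could run next to the normal parse. It is not a json-lib feature: `CommentScanner` and `scan` are hypothetical names. It assumes that `#` comments run to the end of the line, that the input is otherwise well-formed JSON, and that each comment should be keyed by the JSON Pointer (RFC 6901) of the object or array that encloses it. Escape sequences inside member names are only handled roughly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of json-lib: collects '#' comments from relaxed
// JSON text, keyed by the JSON Pointer of the enclosing object or array.
class CommentScanner {

    // One open container; tracks which child is currently being read.
    private static final class Frame {
        final boolean isArray;
        int index = 0;      // current element index (arrays only)
        String key = "";    // most recent member name (objects only)
        boolean expectKey;  // true when the next string is a member name

        Frame(boolean isArray) {
            this.isArray = isArray;
            this.expectKey = !isArray;
        }

        // Reference token for the current child, escaped per RFC 6901.
        String token() {
            return isArray ? Integer.toString(index)
                           : key.replace("~", "~0").replace("/", "~1");
        }
    }

    static Map<String, List<String>> scan(String text) {
        Map<String, List<String>> comments = new LinkedHashMap<>();
        List<Frame> stack = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Frame top = stack.isEmpty() ? null : stack.get(stack.size() - 1);
            if (c == '"') {
                // Consume the string so '#' or brackets inside it are ignored.
                StringBuilder sb = new StringBuilder();
                for (i++; i < text.length() && text.charAt(i) != '"'; i++) {
                    if (text.charAt(i) == '\\' && i + 1 < text.length()) {
                        i++; // simplification: keep the escaped character as-is
                    }
                    sb.append(text.charAt(i));
                }
                if (top != null && top.expectKey) {
                    top.key = sb.toString();
                    top.expectKey = false;
                }
            } else if (c == '#') {
                int end = text.indexOf('\n', i);
                if (end < 0) {
                    end = text.length();
                }
                comments.computeIfAbsent(pointerOf(stack), k -> new ArrayList<>())
                        .add(text.substring(i + 1, end).trim());
                i = end; // skip the rest of the comment line
            } else if (c == '{' || c == '[') {
                stack.add(new Frame(c == '['));
            } else if ((c == '}' || c == ']') && top != null) {
                stack.remove(stack.size() - 1);
            } else if (c == ',' && top != null) {
                if (top.isArray) {
                    top.index++;
                } else {
                    top.expectKey = true;
                }
            }
        }
        return comments;
    }

    // Pointer of the innermost open container: the tokens of all its ancestors.
    private static String pointerOf(List<Frame> stack) {
        StringBuilder sb = new StringBuilder();
        for (int d = 0; d + 1 < stack.size(); d++) {
            sb.append('/').append(stack.get(d).token());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String json = "[\n  {\n    # Batman\n    \"first-name\": \"Bruce\"\n  },\n"
                    + "  {\n    # Superman\n    \"first-name\": \"Clark\"\n  }\n]";
        System.out.println(scan(json)); // prints {/0=[Batman], /1=[Superman]}
    }
}
```

The component that reads the file could build this map once and hand it around together with the deserialized objects, so the other components would never need the raw source.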

Richard JP Le Guen
  • http://www.lifl.fr/~riquetd/parse-a-json-file-with-comments.html This link uses regex: `'(^)?[^\S\n]*/(?:\*(.*?)\*/[^\S\n]*|/[^\n]*)($)?'` to remove comments. Of course you can use the same regex for other purposes. –  May 06 '13 at 16:17
  • @remyabel - Not all the components and objects in my system have access to the raw JSON source: one component reads the file, parses it using json-lib, and passes the deserialized maps etc. around. – Richard JP Le Guen May 06 '13 at 16:23
  • you could always not put meaningful data in a "throw away" location? just like i don't store my important things in the trash can outside my house... – jtahlborn May 06 '13 at 16:45
  • @jtahlborn - I'm hoping someone will have an answer for my question - magic is optional. The mainstream parsers don't seem to do this, as far as I can tell - but if they do and someone points out an obscure feature I've never noticed I'll be happy. – Richard JP Le Guen May 06 '13 at 19:10
  • There's a reason none of the mainstream parsers do this - as you said, it's not part of the standard. – Jason May 07 '13 at 00:35
  • How about you just run a regex replace and change `"# Batman\n"` to `"\"comment-127262896\":\"Batman\"\n"`? The parser does the work for you, it's associated with the object, you can toss out the data when you're done with it, etc. Of course it needs access to the raw JSON source that still has the comments in it. You'd recognize the comments by name, of course. Use a nice long name to prevent clashes and a random or incrementing ID, allowing for several comments in a row, etc. Be careful with where the comment is: In arrays or objects, and beware of the comma. – DDS May 13 '13 at 03:49
  • @DDS - The examples I included in the question only have one comment per entry, but the comments could be more complex. I know I can hack a solution with RegEx etc but I'd prefer to find out if there are libraries which support this. – Richard JP Le Guen May 13 '13 at 04:20
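
For reference, the regex-replace idea from the comments above could look roughly like the sketch below. `CommentToProperty` and `rewrite` are hypothetical names, and the `comment-N` property naming is an assumption. The sketch only handles comments that sit on their own line inside an object and are followed by at least one property (it always appends a comma), so comments inside arrays or before a closing brace would produce invalid JSON.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns whole-line '#' comments into string properties
// so that an ordinary JSON parser keeps them attached to the enclosing object.
class CommentToProperty {

    // A line consisting only of optional indentation, '#', and the comment text.
    private static final Pattern COMMENT_LINE =
            Pattern.compile("(?m)^([ \\t]*)#[ \\t]*(.*?)[ \\t]*$");

    static String rewrite(String relaxedJson) {
        Matcher m = COMMENT_LINE.matcher(relaxedJson);
        StringBuffer out = new StringBuffer();
        int id = 0; // incrementing ID so several comments in a row do not clash
        while (m.find()) {
            // Escape backslashes and quotes so the comment is a valid JSON string.
            String text = m.group(2).replace("\\", "\\\\").replace("\"", "\\\"");
            String property = m.group(1) + "\"comment-" + (id++) + "\": \"" + text + "\",";
            m.appendReplacement(out, Matcher.quoteReplacement(property));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "{\n    # Batman\n    \"first-name\": \"Bruce\"\n}";
        System.out.println(rewrite(in));
    }
}
```

As noted in the thread, this needs access to the raw source, and the injected properties have to be stripped again before the data reaches any system that must not see them.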

2 Answers


EDIT (May 27, 2021):

What I chose to do is write a custom JSON parser for this nonstandard version of JSON. It supports the shell comments given in your question, but only before the first key of a JSON object:

(The preceding is in C#; a Java version is expected to be available as well. It relies on my Concise Binary Object Representation library, called PeterO.Cbor in NuGet or com.upokecenter/cbor in the Central Repository.)

Indeed, one of your requirements is that "the JSON structure can't be modified by adding properties for the comments instead." That means the comments must be associated with the JSON objects in some other way. Fortunately, a specification called JSON Pointer was published as RFC 6901 in April 2013. A JSON Pointer is a string that identifies a specific value within a JSON document. This is why the parser includes a way to get each comment and its associated JSON pointer via a method called JSONWithComments.FromJSONStringWithPointers. JSONPointer.cs is my own implementation of the JSON Pointer specification.
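
For readers on the Java side, here is a minimal sketch of how a JSON Pointer is evaluated against already-parsed data. It is an illustration only, not the JSONPointer.cs implementation mentioned above: `JsonPointerSketch` and `resolve` are hypothetical names, and it assumes the parsed JSON is represented as nested `java.util.Map` and `java.util.List` values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating RFC 6901 evaluation over Maps and Lists.
class JsonPointerSketch {

    static Object resolve(Object root, String pointer) {
        if (pointer.isEmpty()) {
            return root; // the empty pointer refers to the whole document
        }
        if (pointer.charAt(0) != '/') {
            throw new IllegalArgumentException("Pointer must be empty or start with '/'");
        }
        Object current = root;
        // Limit -1 keeps trailing empty tokens, so "/a/" has the tokens "a" and "".
        for (String raw : pointer.substring(1).split("/", -1)) {
            // Unescape in the order RFC 6901 requires: "~1" first, then "~0".
            String token = raw.replace("~1", "/").replace("~0", "~");
            if (current instanceof Map) {
                Map<?, ?> map = (Map<?, ?>) current;
                if (!map.containsKey(token)) {
                    throw new IllegalArgumentException("No member named " + token);
                }
                current = map.get(token);
            } else if (current instanceof List) {
                current = ((List<?>) current).get(Integer.parseInt(token));
            } else {
                throw new IllegalArgumentException("Cannot descend into a scalar at " + token);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> bruce = new LinkedHashMap<>();
        bruce.put("first-name", "Bruce");
        Map<String, Object> clark = new LinkedHashMap<>();
        clark.put("first-name", "Clark");
        List<Object> root = Arrays.asList(bruce, clark);

        System.out.println(resolve(root, "/1/first-name")); // prints Clark
        System.out.println(resolve(root, "/0"));            // prints {first-name=Bruce}
    }
}
```

With the data from the question, a comment stored under the pointer `/0` would therefore lead back to the "Bruce Wayne" entry without any extra property in the JSON itself.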

Example of use:

      using System;
      using System.Collections.Generic;

      // Filled in by FromJSONStringWithPointers below
      var dict = new Dictionary<string, string>();
      string str = "{\"f\":[\n {\n # B\t \tA C\n # Dm\n\"a\":1,\n\"b\":2\n},{\n #" +
          "\u0020Sm\n\"a\":3,\n\"b\":4\n}\n]}";
      // Parse the commented JSON
      var obj = JSONWithComments.FromJSONString(str);
      Console.WriteLine(obj);
      // Parse again, this time also collecting comments and JSON pointers
      obj = JSONWithComments.FromJSONStringWithPointers(str, dict);
      // Get the comment and its associated JSON pointer
      foreach (string key in dict.Keys) {
        Console.WriteLine(key);
        Console.WriteLine(dict[key]);
        // Get the pointed-to object
        Console.WriteLine(JSONPointer.GetObject(obj, dict[key]));
      }
      // Output the object
      Console.WriteLine(obj);
Peter O.
  • Just to check if I'm understanding: this only allows one comment per object? – Richard JP Le Guen May 08 '13 at 15:34
  • The current implementation coalesces comments that occur right next to each other, but not comments that occur in different places within the same sub-object. I will make this clear with further examples. – Peter O. May 08 '13 at 19:34
  • I'm still considering all my options, and considering changing the interchange format outright. If I go ahead with this solution, I'll give you the green tick. – Richard JP Le Guen May 13 '13 at 19:41

other systems consume this same JSON and the annotations need to be transparent to them, so the JSON structure can't be modified by adding properties for the comments instead

Using comments in messages to pass data between systems doesn't seem like good practice. E.g. XML wouldn't support that.

Why not simply incorporate the important "comments" as data? That's what it is if the other system is using it. :^)

Glen Best
  • "Using comments in messages to pass data between systems doesn't seem a good practice." Agreed. As for "if the other system is using it": what I mean by "the annotations need to be transparent to them" is that the comment data can't just be added as a new key in the JSON object, because the other systems would then process annotations which they are not supposed to process. – Richard JP Le Guen May 08 '13 at 15:33