
I have a number of objects (messages) that I need to validate against a JSON schema (draft-04). Each object is guaranteed to have a "type" field, which describes its type, but every type has a completely different set of other fields, so each type of object needs a unique schema.

I see several possibilities, none of which are particularly appealing, but I hope I'm missing something.

Possibility 1: Use oneOf for each message type. I guess this would work, but the problem is very long validation errors when something goes wrong: validators tend to report every schema that failed, which includes ALL the elements of the "oneOf" array.

{
  "oneOf":
  [
    {
      "type": "object",
      "properties":
      {
        "t":
        {
          "type": "string",
          "enum":
          [
            "message_type_1"
          ]
        }
      }
    },
    {
      "type": "object",
      "properties":
      {
        "t":
        {
          "type": "string",
          "enum":
          [
            "message_type_2"
          ]
        },
        "some_other_property":
        {
          "type": "integer"
        }
      },
      "required":
      [
        "some_other_property"
      ]
    }
  ]
}
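To make the complaint concrete, here is a minimal Python sketch of how oneOf reporting typically behaves. `check` and `one_of` are hypothetical helpers written only for this illustration; `check` understands just the handful of keywords in the schema above and is not a real JSON Schema validator.

```python
# Minimal sketch, not a real validator: `check` understands only the keywords
# used in the question's schema (type, enum, required, properties).

def check(schema, value, path="$"):
    kind = schema.get("type")
    if kind == "object" and not isinstance(value, dict):
        return [f"{path}: expected object"]
    errors = []
    if kind == "string" and not isinstance(value, str):
        errors.append(f"{path}: expected string")
    if kind == "integer" and (not isinstance(value, int) or isinstance(value, bool)):
        errors.append(f"{path}: expected integer")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not one of {schema['enum']}")
    if isinstance(value, dict):
        errors += [f"{path}: missing {name!r}"
                   for name in schema.get("required", []) if name not in value]
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors += check(sub, value[name], f"{path}.{name}")
    return errors


def one_of(branches, value):
    """Mimic typical oneOf reporting: when no branch matches, the errors of
    every branch are reported, so output grows with the number of types."""
    results = [check(branch, value) for branch in branches]
    passed = sum(1 for errs in results if not errs)
    if passed == 1:
        return []
    if passed > 1:
        return [f"$: matched {passed} branches, expected exactly one"]
    return [f"branch {i}: {e}" for i, errs in enumerate(results) for e in errs]


BRANCHES = [
    {"type": "object",
     "properties": {"t": {"type": "string", "enum": ["message_type_1"]}}},
    {"type": "object",
     "properties": {"t": {"type": "string", "enum": ["message_type_2"]},
                    "some_other_property": {"type": "integer"}},
     "required": ["some_other_property"]},
]

for line in one_of(BRANCHES, {"t": "message_type_2", "some_other_property": "x"}):
    print(line)
```

For that object, the sketch reports an error for branch 0 (wrong "t") alongside the real problem in branch 1; with several dozen message types there would be several dozen such lines.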

Possibility 2: Nested "if", "then", "else" triads. I haven't tried it, but I guess the errors might be better in this case. However, it's very cumbersome to write, as the nested ifs pile up.

Possibility 3: A separate schema for every possible value of "t". This is the simplest solution, but I dislike it because it precludes me from using common elements in schemas (via references).

So, are these my only options, or can I do better?

Asked by MaxEd (edited by Relequestual)
  • Option 3 doesn't preclude you from using references. Referencing part of another schema file is totally valid and possible. Not saying it's the best option though. – Relequestual Apr 16 '18 at 09:01
  • I think option 1 is your best option here. A validator is right to report all errors from the `oneOf` if it doesn't fulfill any of the schemas in the array. Are you expecting to be able to send back any error messages to the user for validation feedback? – Relequestual Apr 16 '18 at 09:05
  • I want the user - actually, rather, the developer or the test - to be able to quickly pinpoint the problem. There will be at least several dozen message types, and getting one huge error that lists all of them is not exactly conducive to that goal. I'm thinking about Option 3 more and more. You're right in that I still can use refs, but I'll have to load the file containing that ref for each message schema. Not ideal, but it might have to do. – MaxEd Apr 16 '18 at 10:21
  • Agreed. Yes, you'll have to load them in for the library if it doesn't support the file URI protocol (some do, but it's not defined behaviour). It's less ideal to perform several HTTP requests when you need to do validation! – Relequestual Apr 16 '18 at 10:25
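The preloading idea from the comments can be sketched as follows, assuming all schema files sit in one local directory. `load_schema_store` and `resolve_ref` are hypothetical helpers, not part of any validator library; how a particular validator accepts such a pre-populated store differs from library to library.

```python
import json
from pathlib import Path


def load_schema_store(directory):
    """Read every *.json file in `directory` once, keyed by file name, so a
    relative "$ref" such as "message_type_1.json" can be served from memory
    instead of triggering a file or HTTP fetch per message schema."""
    store = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            store[path.name] = json.load(fh)
    return store


def resolve_ref(store, ref):
    """Resolve "file.json" or "file.json#/json/pointer" against the store.
    Only refs naming a file in the store are handled; percent-encoding in
    the fragment is ignored in this sketch."""
    file_name, _, pointer = ref.partition("#")
    node = store[file_name]
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")  # JSON Pointer escapes
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node
```

Because the pointer part is walked separately, this also covers references to part of another schema file (for example a shared "definitions" section), which is the reuse Option 3 needs.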

1 Answer


Since "type" is a JSON Schema keyword, I'll follow your lead and use "t" as the type-discrimination field, for clarity.

There's no particular keyword to accomplish or indicate this (however, see https://github.com/json-schema-org/json-schema-spec/issues/31 for discussion). This is because, for the purposes of validation, everything you need to do is already possible. Errors are secondary to validation in JSON Schema. All we're trying to do is limit how many errors we see, since it's obvious there's a point where errors are no longer productive.

Normally, when you're validating a message, you know its type first, then you read the rest of the message. For example, in HTTP, if you're reading a line that starts with "Date:" and the next character isn't a number or letter, you can emit an error right away (e.g. "Unexpected tilde, expected a month name").

However, in JSON this isn't true: properties are unordered, and you might not encounter the "t" until the very end, if at all. "if/then" can help with this.

But first, begin by factoring out the most important constraints and moving them to the top.

First, use "type": "object" and "required":["t"] in your top level schema, since that's true in all cases.

Second, use "properties" and "enum" to enumerate all valid values of "t". This way, if "t" really is wrong, the error comes from your top-level schema instead of from a subschema.

If all of these constraints pass, but the document is still invalid, then it's easier to conclude the problem must be with the other contents of the message, and not the "t" property itself.
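A minimal Python sketch of this two-stage idea, written as plain code rather than as a schema. `KNOWN_TYPES` and the rules in `type_specific_errors` are hypothetical stand-ins for the real "enum" and per-type schemas.

```python
# Hypothetical list of type names; in practice it would mirror the "enum" on "t".
KNOWN_TYPES = ["message_type_1", "message_type_2"]


def top_level_errors(msg):
    """Stage 1: the constraints every message shares."""
    if not isinstance(msg, dict):
        return ["message is not an object"]
    if "t" not in msg:
        return ["missing required property 't'"]
    if msg["t"] not in KNOWN_TYPES:
        return [f"t={msg['t']!r} is not one of {KNOWN_TYPES}"]
    return []


def type_specific_errors(msg):
    """Stage 2: hypothetical per-type rules (only message_type_2 has any here)."""
    if msg["t"] == "message_type_2":
        if "some_other_property" not in msg:
            return ["missing required property 'some_other_property'"]
        value = msg["some_other_property"]
        if not isinstance(value, int) or isinstance(value, bool):
            return ["some_other_property is not an integer"]
    return []


def validate(msg):
    # Stage 2 runs only once "t" is known to be valid, so anything it reports
    # is about the rest of the message, never about "t" itself.
    return top_level_errors(msg) or type_specific_errors(msg)
```

The ordering is the point: a misspelled "t" yields exactly one error naming the valid choices, and never a pile of per-type complaints.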

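Now, in each subschema, use "const" to match the subschema to its type name. Note that "const" was added in draft-06; under draft-04, a single-element "enum" (as in your original schema) does the same job.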

We get a schema like this:

{
  "type": "object",
  "required": ["t"],
  "properties": { "t": { "enum": ["message_type_1", "message_type_2"] } },
  "oneOf": [
     {
        "type": "object",
        "properties": {
          "t": { "const": "message_type_1" }
        }
     },
     {
        "type": "object",
        "properties": {
          "t": { "const": "message_type_2" },
          "some_other_property": {
             "type": "integer"
          }
        },
        "required": [ "some_other_property" ]
     }
  ]
}

Now, split out each type into a different schema file. Make it human-accessible by naming the file after the value of "t". This way, an application can read a stream of objects and pick the schema to validate each object against.

{
  "type": "object",
  "required": ["t"],
  "properties": { "t": { "enum": ["message_type_1", "message_type_2"] } },
  "oneOf": [
     {"$ref": "message_type_1.json"},
     {"$ref": "message_type_2.json"}
  ]
}
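With several dozen message types, the "enum" and the list of "$ref"s have to be kept in sync by hand. One option, sketched here in Python as an assumption rather than anything JSON Schema requires, is to generate the top-level schema from a single list of names. `build_top_level_schema` is a hypothetical helper, and the "<t>.json" file naming follows the convention suggested above.

```python
import json


def build_top_level_schema(type_names):
    """Build the top-level schema from one list of type names, so the "enum"
    on "t" and the "$ref" list cannot drift apart as types are added.
    Assumes each type's schema lives in "<type name>.json"."""
    names = list(type_names)
    return {
        "type": "object",
        "required": ["t"],
        "properties": {"t": {"enum": names}},
        "oneOf": [{"$ref": f"{name}.json"} for name in names],
    }


schema = build_top_level_schema(["message_type_1", "message_type_2"])
print(json.dumps(schema, indent=2))
```

The generated file can be written out at build time, so validators still see an ordinary static schema.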

Theoretically, a validator now has enough information to produce much cleaner errors (though I'm not aware of any validators that can do this).

So, if this doesn't produce clean enough error reporting for you, you have two options:

First, you can implement part of the validation process yourself. As described above, use a streaming JSON parser like Oboe.js to read each object in a stream, parse the object and read the "t" property, then apply the appropriate schema.
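Oboe.js is a JavaScript library; to keep the examples in one language, this Python sketch uses the standard library's `json.JSONDecoder.raw_decode` to pull consecutive documents out of a string. Unlike a true streaming parser it needs the whole text in memory, so treat it as an illustration of the dispatch step only. `route` and the `validators` mapping are hypothetical; in practice each entry would validate against the matching "<t>.json" schema.

```python
import json


def iter_json_values(text):
    """Yield each top-level JSON value from a string holding several
    whitespace-separated documents (e.g. one message per line)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        while pos < len(text) and text[pos].isspace():  # skip separators
            pos += 1
        if pos == len(text):
            return
        value, end = decoder.raw_decode(text[pos:])  # end is relative to the slice
        pos += end
        yield value


def route(messages, validators):
    """Read "t" first, then hand the message to the validator for that type.
    `validators` maps a type name to a callable returning a list of errors."""
    for index, msg in enumerate(messages):
        kind = msg.get("t") if isinstance(msg, dict) else None
        validate = validators.get(kind) if isinstance(kind, str) else None
        if validate is None:
            yield index, [f"unknown or missing t: {kind!r}"]
        else:
            yield index, validate(msg)


# Hypothetical per-type validators standing in for message_type_N.json.
validators = {
    "message_type_1": lambda msg: [],
    "message_type_2": lambda msg: (
        [] if isinstance(msg.get("some_other_property"), int)
        else ["some_other_property is not an integer"]),
}

stream = ('{"t": "message_type_1"}\n'
          '{"t": "message_type_2", "some_other_property": "x"}\n'
          '{"t": "nope"}\n')
results = list(route(iter_json_values(stream), validators))
for index, errors in results:
    print(index, errors)
```

Each message produces errors from at most one type's rules, which is the clean reporting the question is after.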

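Or second, if you really want to do this purely in JSON Schema, use "if/then" statements inside "allOf" (note that "if", "then" and "else" were added in draft-07, so this requires a validator that supports at least that draft):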

{
  "type": "object",
  "required": ["t"],
  "properties": { "t": { "enum": ["message_type_1", "message_type_2"] } },
  "allOf": [
     {"if":{"properties":{"t":{"const":"message_type_1"}}}, "then":{"$ref": "message_type_1.json"}},
     {"if":{"properties":{"t":{"const":"message_type_2"}}}, "then":{"$ref": "message_type_2.json"}}
  ]
}

This should produce errors to the effect of:

t not one of "message_type_1" or "message_type_2"

or

(because t="message_type_2") some_other_property not an integer

and not both.
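The "not both" behavior follows from how "if" works: a failing "if" is not itself an error, it only decides whether "then" applies. The Python sketch below checks that claim with a hypothetical `check` function that understands just the keywords used here (type, required, properties, enum, const, and allOf with if/then); the two type schemas are inlined instead of being loaded via "$ref".

```python
# Minimal sketch, not a real validator.

def check(schema, value, path="$"):
    if schema.get("type") == "object" and not isinstance(value, dict):
        return [f"{path}: expected object"]
    errors = []
    if schema.get("type") == "integer" and (
            not isinstance(value, int) or isinstance(value, bool)):
        errors.append(f"{path}: expected integer")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not one of {schema['enum']}")
    if "const" in schema and value != schema["const"]:
        errors.append(f"{path}: expected {schema['const']!r}")
    if isinstance(value, dict):
        errors += [f"{path}: missing {name!r}"
                   for name in schema.get("required", []) if name not in value]
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors += check(sub, value[name], f"{path}.{name}")
    for branch in schema.get("allOf", []):
        if "if" in branch:
            # A failing "if" adds no errors; it only decides whether "then" runs.
            if not check(branch["if"], value, path) and "then" in branch:
                errors += check(branch["then"], value, path)
        else:
            errors += check(branch, value, path)
    return errors


# Inlined stand-ins for message_type_1.json and message_type_2.json.
TYPE_1 = {"type": "object"}
TYPE_2 = {"type": "object",
          "properties": {"some_other_property": {"type": "integer"}},
          "required": ["some_other_property"]}

SCHEMA = {
    "type": "object",
    "required": ["t"],
    "properties": {"t": {"enum": ["message_type_1", "message_type_2"]}},
    "allOf": [
        {"if": {"properties": {"t": {"const": "message_type_1"}}}, "then": TYPE_1},
        {"if": {"properties": {"t": {"const": "message_type_2"}}}, "then": TYPE_2},
    ],
}

print(check(SCHEMA, {"t": "message_type_3"}))
print(check(SCHEMA, {"t": "message_type_2", "some_other_property": "x"}))
print(check(SCHEMA, {}))
```

The last case shows why the top-level "required": ["t"] matters: when "t" is absent, "properties" passes vacuously, so every "if" matches and every "then" is applied, adding noise next to the real "missing t" error. Adding "required": ["t"] inside each "if" is one way to avoid that.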

Answered by awwright
  • Thanks for a detailed answer. I'll accept it, though I have no time to really test it right now, because I already implemented separate schemas for different types with some hackery, but I might want to come back to this some time in the future. – MaxEd May 07 '18 at 08:26