26

I have a set of jsonschema compliant documents. Some documents contain references to other documents (via the $ref attribute). I do not wish to host these documents such that they are accessible at an HTTP URI. As such, all references are relative. All documents live in a local folder structure.

How can I make python-jsonschema understand to properly use my local file system to load referenced documents?


For instance, say I have a document named defs.json containing some definitions, and I try to load a different document which references it, like:

{
  "allOf": [
    {"$ref":"defs.json#/definitions/basic_event"},
    {
      "type": "object",
      "properties": {
        "action": {
          "type": "string",
          "enum": ["page_load"]
        }
      },
      "required": ["action"]
    }
  ]
}

I get an error RefResolutionError: <urlopen error [Errno 2] No such file or directory: '/defs.json'>

It may be important that I'm on a Linux box.


(I'm writing this as a Q&A because I had a hard time figuring this out and observed other folks having trouble too.)

Chris W.

6 Answers

22

I had the hardest time figuring out how to resolve against a set of schemas that $ref each other without going to the network. It turns out the key is to create the RefResolver with a store: a dict that maps each schema's URL to the schema itself.

import json
from jsonschema import RefResolver, Draft7Validator

address="""
{
  "$id": "https://example.com/schemas/address",

  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["street_address", "city", "state"],
  "additionalProperties": false
}
"""

customer="""
{
  "$id": "https://example.com/schemas/customer",
  "type": "object",
  "properties": {
    "first_name": { "type": "string" },
    "last_name": { "type": "string" },
    "shipping_address": { "$ref": "/schemas/address" },
    "billing_address": { "$ref": "/schemas/address" }
  },
  "required": ["first_name", "last_name", "shipping_address", "billing_address"],
  "additionalProperties": false
}
"""

data = """
{
  "first_name": "John",
  "last_name": "Doe",
  "shipping_address": {
    "street_address": "1600 Pennsylvania Avenue NW",
    "city": "Washington",
    "state": "DC"
  },
  "billing_address": {
    "street_address": "1st Street SE",
    "city": "Washington",
    "state": "DC"
  }
}
"""

address_schema = json.loads(address)
customer_schema = json.loads(customer)
schema_store = {
    address_schema['$id'] : address_schema,
    customer_schema['$id'] : customer_schema,
}

resolver = RefResolver.from_schema(customer_schema, store=schema_store)
validator = Draft7Validator(customer_schema, resolver=resolver)

jsonData = json.loads(data)
validator.validate(jsonData)

The above was built with jsonschema==4.9.1.
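
To see why the store keys line up with the references, here is a minimal standard-library sketch. It assumes, as a simplification of what the resolver does, that a `$ref` is joined against the referring schema's `$id` like an ordinary URL and the result is then looked up in the store. The two schemas are trimmed stand-ins for the ones above, and `found_in_store` is a name introduced only for this illustration.

```python
from urllib.parse import urljoin

# Trimmed stand-ins for the schemas above; only "$id" matters for this sketch.
address_schema = {"$id": "https://example.com/schemas/address", "type": "object"}
customer_schema = {"$id": "https://example.com/schemas/customer", "type": "object"}

schema_store = {
    address_schema["$id"]: address_schema,
    customer_schema["$id"]: customer_schema,
}

# The "$ref" used inside the customer schema, joined against the customer's "$id".
ref = "/schemas/address"
resolved_uri = urljoin(customer_schema["$id"], ref)

# If the joined URI is a key in the store, the schema can be served locally.
found_in_store = resolved_uri in schema_store
print(resolved_uri, found_in_store)
```

If a reference joins to a URI that is not a key in the store, the resolver has to fetch it some other way, which is where the network or file-system errors come from.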

Daniel Schneider
  • I initialize the RefResolver like this: `jsonschema.RefResolver(None, referrer=None, store=schema_store)`. And then the store has entries with an "$id" field like: `"https://example.com/path/subpath/filename.json"`. (This doesn't require any network calls--unless you specify a schema not in the store--since the store contains a cache of any reference we need). – Tim Ludwinski Nov 20 '20 at 13:31
  • why you used the **#** sign at the end of the `$ref`? `{"$ref": "base.schema.json#"},` instead of putting it as prefix? – A l w a y s S u n n y Sep 10 '21 at 11:43
  • the # sign (which delineates an [URI fragment](https://datatracker.ietf.org/doc/html/rfc3986#section-3.5)) is superfluous in the above example. In $ref URIs, the fragment refers to a path within a schema, so, in the above sample `{"$ref": "base.schema.json#/properties/prop/type"}` would resolve to `"string"`. – Daniel Schneider Sep 29 '21 at 06:05
  • After learning a bunch about JSONSchemas over the past 2 years, I improved my sample quite a bit... – Daniel Schneider Oct 25 '22 at 13:25
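
The fragment behaviour described in the comments above can be sketched with a few lines of standard-library Python. This is an illustrative helper, not the library's own implementation: `resolve_pointer` is a hypothetical name, and `base_schema` is an assumed schema shaped like the `base.schema.json` the comment refers to.

```python
import json


def resolve_pointer(document, pointer):
    """Follow a JSON Pointer (the text after '#') into a parsed JSON document."""
    if pointer == "":
        return document  # An empty pointer refers to the whole document.
    node = document
    for token in pointer.split("/")[1:]:
        # Undo the two JSON Pointer escapes: "~1" means "/", "~0" means "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]
        else:
            node = node[token]
    return node


# Assumed contents for base.schema.json, for illustration only.
base_schema = json.loads("""
{
  "$id": "base.schema.json",
  "type": "object",
  "properties": {"prop": {"type": "string"}},
  "required": ["prop"]
}
""")

resolved_type = resolve_pointer(base_schema, "/properties/prop/type")
whole_document = resolve_pointer(base_schema, "")
print(resolved_type)
```

An empty fragment, as in `base.schema.json#`, points at the whole document, which is why the trailing `#` makes no difference there.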
10

You must build a custom jsonschema.RefResolver for each schema which uses a relative reference and ensure that your resolver knows where on the filesystem the given schema lives.

Such as...

import os
import json
from jsonschema import Draft4Validator, RefResolver # We prefer Draft7, but jsonschema 3.0 is still in alpha as of this writing 


abs_path_to_schema = '/path/to/schema-doc-foobar.json'
with open(abs_path_to_schema, 'r') as fp:
  schema = json.load(fp)

resolver = RefResolver(
  # The key part is here where we build a custom RefResolver 
  # and tell it where *this* schema lives in the filesystem
  # Note that `file:` is for unix systems
  # (RefResolver takes `base_uri` and `referrer` as its first two arguments)
  base_uri='file:{}'.format(abs_path_to_schema),
  referrer=schema
)
Draft4Validator.check_schema(schema) # Unnecessary but a good idea
validator = Draft4Validator(schema, resolver=resolver, format_checker=None)

# Then you can...
data_to_validate = {...}  # placeholder for the instance document to validate
validator.validate(data_to_validate)
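
As a rough illustration of why the base URI matters, the sketch below uses only the standard library to show how the relative reference from the question would be joined against a `file:` base URI. It assumes the resolver joins references like ordinary URLs. The path is the hypothetical one from the example above, and no file is actually opened.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical location of the referring schema, written as a file: URI.
base_uri = "file:///path/to/schema-doc-foobar.json"

# The relative reference from the question.
ref = "defs.json#/definitions/basic_event"

# Join against the base, then split off the fragment (the JSON Pointer part).
document_uri, fragment = urldefrag(urljoin(base_uri, ref))

print(document_uri)  # the sibling document, next to the referring schema
print(fragment)      # the path inside that document
```

Without a base URI there is nothing to join `defs.json` against, so the lookup cannot land in the right folder.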
Chris W.
  • Is this because the JSON schema spec says it is a URI. And URI cannot be relative paths? So if we end up with a relative path, we are not writing a proper spec-compliant json schema? – CMCDragonkai Mar 19 '19 at 04:41
  • My tests show that `definitions` is not necessary. One can just compose complete JSON schema documents without needing the `#...` part. I wonder if the `definitions` is just optional or convention. – CMCDragonkai Mar 19 '19 at 04:45
  • jsonschema 3.0.1 with draft 7 as default is out now (per your comment in the example saying you prefer draft 7) – Joao Coelho Mar 20 '19 at 22:25
4

EDIT-1

Fixed a wrong reference ($ref) to base schema. Updated the example to use the one from the docs: https://json-schema.org/understanding-json-schema/structuring.html

EDIT-2

As pointed out in the comments, in the following I'm using the following imports:

from jsonschema import validate, RefResolver 
from jsonschema.validators import validator_for

This is just another version of @Daniel's answer, which was the one that worked for me. Basically, I decided to define the $schema in a base schema, which frees the other schemas from declaring it and makes for a clear call when instantiating the resolver.

  • RefResolver.from_schema() takes (1) some schema and (2) a schema store, and it was not clear to me whether the order matters or which schema should be passed as the "some" schema. Hence the structure you see below.

I have the following:

base.schema.json:

{
  "$schema": "http://json-schema.org/draft-07/schema#"
}

definitions.schema.json:

{
  "type": "object",
  "properties": {
    "street_address": { "type": "string" },
    "city":           { "type": "string" },
    "state":          { "type": "string" }
  },
  "required": ["street_address", "city", "state"]
}

address.schema.json:

{
  "type": "object",

  "properties": {
    "billing_address": { "$ref": "definitions.schema.json#" },
    "shipping_address": { "$ref": "definitions.schema.json#" }
  }
}

I like this setup for the following reasons:

  1. It gives a cleaner call to RefResolver.from_schema():

    base = json.loads(open('base.schema.json').read())
    definitions = json.loads(open('definitions.schema.json').read())
    schema = json.loads(open('address.schema.json').read())
    
    schema_store = {
      base.get('$id','base.schema.json') : base,
      definitions.get('$id','definitions.schema.json') : definitions,
      schema.get('$id','address.schema.json') : schema,
    }
    
    resolver = RefResolver.from_schema(base, store=schema_store)
    
  2. I can then use the library's handy validator_for helper, which picks the best validator for your schema (according to its $schema key):

    Validator = validator_for(base)
    
  3. And then just put them together to instantiate the validator:

    validator = Validator(schema, resolver=resolver)
    

Finally, you validate your data:

data = {
  "shipping_address": {
    "street_address": "1600 Pennsylvania Avenue NW",
    "city": "Washington",
    "state": "DC"   
  },
  "billing_address": {
    "street_address": "1st Street SE",
    "city": "Washington",
    "state": 32
  }
}
  • This one will fail validation since "state": 32:
>>> validator.validate(data)

ValidationError: 32 is not of type 'string'

Failed validating 'type' in schema['properties']['billing_address']['properties']['state']:
    {'type': 'string'}

On instance['billing_address']['state']:
    32

Change that to "DC" and it will validate.

Brandt
  • This answer worked perfectly for me. Just want to point out the import dependencies for others also trying this out: `from jsonschema import validate, RefResolver` and `from jsonschema.validators import validator_for` – khuang834 Jan 08 '21 at 18:19
  • Thank you @khuang834. I adjusted/added a note about that. – Brandt Jan 11 '21 at 11:59
  • How we can validate `Nested properties` with this approach. Say I have a conditional property `"Zipcode"` in `address.schema.json` . And want to validate based on value of `"city"` in `definitions.schema.json`. – curiousguy May 11 '21 at 17:45
  • @curiousguy maybe I didn't understand your question or it is ill posed. Nevertheless, here are my thoughts that hopefully will help you go through your problem: `zipcode` and `city` go hand-in-hand, they are both part of the/an address. AFAIU, you want to _verify_ that a given `zipcode` is part of a `city`. IMHO this is external to a _schema validation_: roughly, schema validation is about data types/formats. That being said, you _could_ include a "conditional property" for a set of cities and associate zipcodes (`enum`) for each city, probably in another `cities-definitions.schema.json`. – Brandt May 13 '21 at 13:20
  • it works for me but why you added **#** at the end of the ref value`{ "$ref": "definitions.schema.json#" }`? instead of `{ "$ref": "#/definitions.schema.json" }` – A l w a y s S u n n y Sep 10 '21 at 12:02
3

Following up on the answer @chris-w provided, I wanted to do this same thing with jsonschema 3.2.0, but his answer didn't quite cover it. I hope this answer helps those who are still coming to this question for help but are using a more recent version of the package.

To extend a JSON schema using the library, do the following:

  1. Create the base schema:
base.schema.json
{
  "$id": "base.schema.json",
  "type": "object",
  "properties": {
    "prop": {
      "type": "string"
    }
  },
  "required": ["prop"]
}
  2. Create the extension schema:
extend.schema.json
{
  "allOf": [
    {"$ref": "base.schema.json"},
    {
      "properties": {
        "extra": {
          "type": "boolean"
        }
      },
      "required": ["extra"]
    }
  ]
}
  3. Create the JSON file you want to test against the schema:
data.json
{
  "prop": "This is the property",
  "extra": true
}
  4. Create your RefResolver and Validator for the base schema and use them to check the data:
#Set up schema, resolver, and validator on the base schema
baseSchema = json.loads(baseSchemaJSON) # Create a schema dictionary from the base JSON file
relativeSchema = json.loads(relativeJSON) # Create a schema dictionary from the relative JSON file
resolver = RefResolver.from_schema(baseSchema) # Creates your resolver, uses the "$id" element
validator = Draft7Validator(relativeSchema, resolver=resolver) # Create a validator against the extended schema (but resolving to the base schema!)

# Check validation!
data = json.loads(dataJSON) # Create a dictionary from the data JSON file
validator.validate(data)

You may need to make a few adjustments to the above entries, such as not using the Draft7Validator. This should work for single-level references (children extending a base), but you will need to be careful with your schemas and how you set up the RefResolver and Validator objects.

P.S. Here is a snippet that exercises the above. Try modifying the data string to remove one of the required attributes:

import json

from jsonschema import RefResolver, Draft7Validator

base = """
{
  "$id": "base.schema.json",
  "type": "object",
  "properties": {
    "prop": {
      "type": "string"
    }
  },
  "required": ["prop"]
}
"""

extend = """
{
  "allOf": [
    {"$ref": "base.schema.json"},
    {
      "properties": {
        "extra": {
          "type": "boolean"
        }
      },
      "required": ["extra"]
    }
  ]
}
"""

data = """
{
"prop": "This is the property string",
"extra": true
}
"""

schema = json.loads(base)
extendedSchema = json.loads(extend)
resolver = RefResolver.from_schema(schema)
validator = Draft7Validator(extendedSchema, resolver=resolver)

jsonData = json.loads(data)
validator.validate(jsonData)
Ether
Devin P.
1

My approach is to preload all schema fragments to RefResolver cache. I created a gist that illustrates this: https://gist.github.com/mrtj/d59812a981da17fbaa67b7de98ac3d4b
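
A minimal sketch of the preloading idea, using only the standard library, might look like the following. It is not the code from the gist: `build_schema_store` is a hypothetical helper, and it assumes every `*.json` file under the folder is a schema object, keyed by its `$id` or, when `$id` is absent, by its `file:` URI.

```python
import json
import tempfile
from pathlib import Path


def build_schema_store(schema_dir):
    """Map each schema's "$id" (or its file URI if "$id" is absent) to the parsed schema."""
    store = {}
    for path in sorted(Path(schema_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as fp:
            schema = json.load(fp)
        store[schema.get("$id", path.resolve().as_uri())] = schema
    return store


# Demonstration with two throwaway schema files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    base = {"$id": "base.schema.json", "type": "object"}
    extend = {"allOf": [{"$ref": "base.schema.json"}]}  # deliberately has no "$id"
    (Path(tmp) / "base.schema.json").write_text(json.dumps(base), encoding="utf-8")
    (Path(tmp) / "extend.schema.json").write_text(json.dumps(extend), encoding="utf-8")
    schema_store = build_schema_store(tmp)

print(sorted(schema_store))
```

The resulting dict could then be passed as `store=` to `RefResolver.from_schema`, as in the other answers here.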

MrTJ
1

This is what I used to dynamically generate a schema_store from all schemas in a given directory:

base.schema.json

{
  "$id": "base.schema.json",
  "type": "object",
  "properties": {
    "prop": {
      "type": "string"
    }
  },
  "required": ["prop"]
}

extend.schema.json

{
  "$id": "extend.schema.json",
  "allOf": [
    {"$ref": "base.schema.json"},
    {
      "properties": {
        "extra": {
          "type": "boolean"
        }
      },
      "required": ["extra"]
    }
  ]
}

instance.json

{
  "prop": "This is the property string",
  "extra": true
}

validator.py

import json

from pathlib import Path

from jsonschema import Draft7Validator, RefResolver
from jsonschema.exceptions import RefResolutionError

schemas = (json.load(open(source)) for source in Path("schema/dir").iterdir())
schema_store = {schema["$id"]: schema for schema in schemas}

schema = json.load(open("schema/dir/extend.schema.json"))
instance = json.load(open("instance/dir/instance.json"))
resolver = RefResolver.from_schema(schema, store=schema_store)
validator = Draft7Validator(schema, resolver=resolver)

try:
    errors = sorted(validator.iter_errors(instance), key=lambda e: e.path)
except RefResolutionError as e:
    print(e)
reubano