
I have a dictionary with config info:

my_conf = {
    'version': 1,

    'info': {
        'conf_one': 2.5,
        'conf_two': 'foo',
        'conf_three': False,
        'optional_conf': 'bar'
    }
}

I want to check if the dictionary follows the structure I need.

I'm looking for something like this:

conf_structure = {
    'version': int,

    'info': {
        'conf_one': float,
        'conf_two': str,
        'conf_three': bool
    }
}

is_ok = check_structure(conf_structure, my_conf)

Is there an existing solution to this problem, or a library that would make implementing check_structure easier?

Danil Speransky
Thyrst'

10 Answers


You may use schema (PyPI link).

schema is a library for validating Python data structures, such as those obtained from config-files, forms, external services or command-line parsing, converted from JSON/YAML (or something else) to Python data-types.

from schema import Schema, And, Use, Optional, SchemaError

def check(conf_schema, conf):
    try:
        conf_schema.validate(conf)
        return True
    except SchemaError:
        return False

conf_schema = Schema({
    'version': And(Use(int)),
    'info': {
        'conf_one': And(Use(float)),
        'conf_two': And(Use(str)),
        'conf_three': And(Use(bool)),
        Optional('optional_conf'): And(Use(str))
    }
})

conf = {
    'version': 1,
    'info': {
        'conf_one': 2.5,
        'conf_two': 'foo',
        'conf_three': False,
        'optional_conf': 'bar'
    }
}

print(check(conf_schema, conf))
Danil Speransky
  • Looks great! Thanks :) – Thyrst' Aug 22 '17 at 08:25
  • This is a copy paste from the docs. How exactly would this help OP? Can you provide a concrete example showing how? This isn't much better than a link-only answer as it stands. – cs95 Aug 22 '17 at 08:29
  • Hello, a question: what if one had a list? For example, if the 'info' field had a field 'changes' which is a list and can have 0 or more elements, what would the schema look like? – pa1 Mar 12 '20 at 10:18
  • A caveat: `conf_schema.validate(conf)` won't throw an error if it manages to convert a given type to the correct one! (The expression would return a corrected version of a given dict.) – m_ocean Aug 10 '21 at 11:53
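
The caveat in the last comment comes from `Use(...)`, which calls the given function on the value and accepts whatever it returns. The difference between coercing a value and strictly checking its type can be shown with plain Python. This is only an illustrative sketch with hypothetical helper names (`coerces_to`, `has_type`), not the schema library's implementation:

```python
def coerces_to(type_, value):
    """Accept the value if type_(value) succeeds (conversion-style check)."""
    try:
        type_(value)
        return True
    except (TypeError, ValueError):
        return False


def has_type(type_, value):
    """Accept the value only if it already is an instance of type_."""
    return isinstance(value, type_)


print(coerces_to(int, '1'))    # True: the string can be converted to 1
print(has_type(int, '1'))      # False: it is a str, not an int
print(coerces_to(bool, 'no'))  # True: bool() accepts any object
print(has_type(bool, 'no'))    # False
```

With the schema library, writing the bare type (for example `'version': int`) instead of `Use(int)` should give the strict, isinstance-style behaviour.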

Advice for the future: use Pydantic!

Pydantic enforces type hints at runtime and provides user-friendly errors when data is invalid. Define how the data should look in pure, canonical Python and validate it with pydantic; it is as simple as that:

from pydantic import BaseModel


class Info(BaseModel):
    conf_one: float
    conf_two: str
    conf_three: bool

    class Config:
        extra = 'forbid'


class ConfStructure(BaseModel):
    version: int
    info: Info

If validation fails, pydantic will raise an error with a breakdown of what was wrong:

my_conf_wrong = {
    'version': 1,

    'info': {
        'conf_one': 2.5,
        'conf_two': 'foo',
        'conf_three': False,
        'optional_conf': 'bar'
    }
}

my_conf_right = {
    'version': 10,

    'info': {
        'conf_one': 14.5,
        'conf_two': 'something',
        'conf_three': False
    }
}

model = ConfStructure(**my_conf_right)
print(model.dict())
# {'version': 10, 'info': {'conf_one': 14.5, 'conf_two': 'something', 'conf_three': False}}

res = ConfStructure(**my_conf_wrong)
# pydantic.error_wrappers.ValidationError: 1 validation error for ConfStructure
#     info -> optional_conf
# extra fields not permitted (type=value_error.extra)
funnydman

Without using libraries, you could also define a simple recursive function like this:

def check_structure(struct, conf):
    if isinstance(struct, dict) and isinstance(conf, dict):
        # struct is a dict of types or other dicts
        return all(k in conf and check_structure(struct[k], conf[k]) for k in struct)
    elif isinstance(struct, list) and isinstance(conf, list):
        # struct is a list in the form [type or dict]
        return all(check_structure(struct[0], c) for c in conf)
    elif isinstance(struct, type):
        # struct is the type of conf
        return isinstance(conf, struct)
    else:
        # struct is neither a dict, nor a list, nor a type
        return False

This assumes that the config can have keys that are not in your structure, as in your example.


Update: The new version also supports lists, e.g. 'foo': [{'bar': int}]
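
As a usage sketch, here is how such a function could be applied to the question's config and to a list-valued field. The function below restates the approach with the type check written as `isinstance(conf, struct)`; the `changes` key is a made-up example, not part of the original question.

```python
def check_structure(struct, conf):
    if isinstance(struct, dict) and isinstance(conf, dict):
        # every key of the structure must exist in the config and match
        return all(k in conf and check_structure(struct[k], conf[k]) for k in struct)
    if isinstance(struct, list) and isinstance(conf, list):
        # struct has the form [type or dict]; each element must match it
        return all(check_structure(struct[0], c) for c in conf)
    if isinstance(struct, type):
        # struct is the expected type of conf
        return isinstance(conf, struct)
    return False


conf_structure = {
    'version': int,
    'info': {'conf_one': float, 'conf_two': str, 'conf_three': bool},
    'changes': [{'id': int}],   # hypothetical list-valued field
}

my_conf = {
    'version': 1,
    'info': {'conf_one': 2.5, 'conf_two': 'foo', 'conf_three': False,
             'optional_conf': 'bar'},   # extra key is tolerated
    'changes': [{'id': 1}, {'id': 2}],
}

print(check_structure(conf_structure, my_conf))                      # True
print(check_structure(conf_structure, {**my_conf, 'version': '1'}))  # False
print(check_structure({'version': int}, {'version': True}))          # True
```

The last call returns True because `bool` is a subclass of `int` in Python, so a boolean passes an `int` check; a stricter variant would need to compare `type(conf) is struct` instead.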

kinton
tobias_k
  • I think this line `elif isinstance(struct, type):` should be `isinstance(conf, type):` instead – Bouni Nov 07 '19 at 08:26
  • Great answer. Note that you can validate `None` type as well, but in your structure you have to do `'version': type(None)`, unlike the other types (`int`, `str`, etc.) which you can write directly. – supermitch Mar 06 '23 at 20:06

You can build the structure using recursion:

def get_type(value):
    if isinstance(value, dict):
        return {key: get_type(value[key]) for key in value}
    else:
        return str(type(value))

And then compare the required structure with your dictionary:

get_type(current_conf) == get_type(required_conf)

Example:

required_conf = {
    'version': 1,
    'info': {
        'conf_one': 2.5,
        'conf_two': 'foo',
        'conf_three': False,
        'optional_conf': 'bar'
    }
}

get_type(required_conf)

{'info': {'conf_two': "<type 'str'>", 'conf_one': "<type 'float'>", 'optional_conf': "<type 'str'>", 'conf_three': "<type 'bool'>"}, 'version': "<type 'int'>"}
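
Two things are worth knowing about this approach, sketched below under the assumption of Python 3: the type strings read `<class 'int'>` rather than `<type 'int'>`, and the comparison demands exactly the same keys on both sides, so an optional key makes it fail. The variable names `same_shape` and `extra_key` are made up for the illustration.

```python
def get_type(value):
    if isinstance(value, dict):
        return {key: get_type(value[key]) for key in value}
    return str(type(value))


required_conf = {'version': 1, 'info': {'conf_one': 2.5}}
same_shape = {'version': 7, 'info': {'conf_one': 0.1}}
extra_key = {'version': 7, 'info': {'conf_one': 0.1, 'optional_conf': 'bar'}}

print(get_type(required_conf))
# {'version': "<class 'int'>", 'info': {'conf_one': "<class 'float'>"}}
print(get_type(same_shape) == get_type(required_conf))  # True
print(get_type(extra_key) == get_type(required_conf))   # False
```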
Eugene Soldatov

Looks like the dict-schema-validator package does exactly what you need:

Here is a simple schema representing a Customer:

{
  "_id":          "ObjectId",
  "created":      "date",
  "is_active":    "bool",
  "fullname":     "string",
  "age":          ["int", "null"],
  "contact": {
    "phone":      "string",
    "email":      "string"
  },
  "cards": [{
    "type":       "string",
    "expires":    "date"
  }]
}

Validation:

from datetime import datetime
import json
from dict_schema_validator import validator


with open('models/customer.json', 'r') as j:
    schema = json.loads(j.read())

customer = {
    "_id":          123,
    "created":      datetime.now(),
    "is_active":    True,
    "fullname":     "Jorge York",
    "age":          32,
    "contact": {
        "phone":    "559-940-1435",
        "email":    "york@example.com",
        "skype":    "j.york123"
    },
    "cards": [
        {"type": "visa", "expires": "12/2029"},
        {"type": "visa"},
    ]
}

errors = validator.validate(schema, customer)
for err in errors:
    print(err['msg'])

Output:

[*] "_id" has wrong type. Expected: "ObjectId", found: "int"
[+] Extra field: "contact.skype" having type: "str"
[*] "cards[0].expires" has wrong type. Expected: "date", found: "str"
[-] Missing field: "cards[1].expires"
Jean DuPont

There is a standard for validating JSON files called JSON Schema.

Validators have been implemented in many languages, including Python. See also the documentation for more details. In the following example I will use jsonschema (docs), a Python package that I am familiar with.


Given the config data

my_conf = {
    'version': 1,
    'info': {
        'conf_one': 2.5,
        'conf_two': 'foo',
        'conf_three': False,
        'optional_conf': 'bar',
    },
}

and the corresponding config schema

conf_structure = {
    'type': 'object',
    'properties': {
        'version': {'type': 'integer'},
        'info': {
            'type': 'object',
            'properties': {
                'conf_one': {'type': 'number'},
                'conf_two': {'type': 'string'},
                'conf_three': {'type': 'boolean'},
                'optional_conf': {'type': 'string'},
            },
            'required': ['conf_one', 'conf_two', 'conf_three'],
        },
    },
}

the actual code to validate this data is then as simple as this:

import jsonschema

jsonschema.validate(my_conf, schema=conf_structure)

A big advantage of this approach is that you can store both data and schema as JSON-formatted files.
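
To illustrate that last point, the sketch below writes a small schema and config to JSON files and reads them back using only the standard library. The file names `conf_schema.json` and `my_conf.json` are placeholders; the loaded objects are plain dicts, which is the form passed to `jsonschema.validate` in the example above.

```python
import json
import tempfile
from pathlib import Path

conf_structure = {
    'type': 'object',
    'properties': {'version': {'type': 'integer'}},
    'required': ['version'],
}
my_conf = {'version': 1}

with tempfile.TemporaryDirectory() as tmp:
    schema_path = Path(tmp) / 'conf_schema.json'   # placeholder file names
    conf_path = Path(tmp) / 'my_conf.json'

    schema_path.write_text(json.dumps(conf_structure, indent=2))
    conf_path.write_text(json.dumps(my_conf, indent=2))

    loaded_schema = json.loads(schema_path.read_text())
    loaded_conf = json.loads(conf_path.read_text())

# The round trip gives back equal dicts, ready for a validator.
print(loaded_schema == conf_structure, loaded_conf == my_conf)  # True True
```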

Jeyekomon

You can also use the dataclasses_json library. Here is how I would normally do it:

from dataclasses import dataclass
from dataclasses_json import dataclass_json, Undefined
from dataclasses_json.undefined import UndefinedParameterError
from typing import Optional


#### define schema #######
@dataclass_json(undefined=Undefined.RAISE)
@dataclass
class Info:
  conf_one: float
  # conf_two: str
  conf_three: bool
  optional_conf: Optional[str]

@dataclass_json
@dataclass
class ConfStructure:
  version: int
  info: Info

####### test for compliance####
try:
  ConfStructure.from_dict(my_conf).to_dict()
except KeyError as e:
  print("there's a missing parameter")
except UndefinedParameterError as e:
  print('extra parameters')


Nic Wanavit

You can use dictify from https://pypi.org/project/dictify/.

Read docs here https://dictify.readthedocs.io/en/latest/index.html

This is how it can be done.

from dictify import Field, Model

class Info(Model):
    conf_one = Field(required=True).instance(float)
    conf_two = Field(required=True).instance(str)
    conf_three = Field(required=True).instance(bool)
    optional_conf = Field().instance(str)

class MyConf(Model):
    version = Field(required=True).instance(int)
    info = Field().model(Info)

my_conf = MyConf() # Invalid without required fields

# Valid
my_conf = MyConf({
    'version': 1,
    'info': {
        'conf_one': 2.5,
        'conf_two': 'foo',
        'conf_three': False,
        'optional_conf': 'bar'
    }
})

my_conf['info']['conf_one'] = 'hi' # Invalid, won't be assigned
nitipit

@tobias_k beat me to it (both in time and quality probably) but here is another recursive function for the task that might be a bit easier for you (and me) to follow:

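def check_dict(my_dict, check_against):
    for k, v in check_against.items():
        if k not in my_dict:
            # a required key is missing
            return False
        if isinstance(v, dict):
            # recurse, but keep checking the remaining keys afterwards
            if not isinstance(my_dict[k], dict) or not check_dict(my_dict[k], v):
                return False
        elif not isinstance(my_dict[k], v):
            return False
    return True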
Ma0

Dictionaries used within Python, as opposed to being exported as JSON, do not need a fixed key order; values are looked up by key (hence a dictionary).

In either case, these functions should provide what you're looking for at the level of nesting present in the samples you provided.

# assuming identical order of keys is required

def check_structure(conf_structure, my_conf):
    # compare as lists: dict key views compare like sets and ignore order
    if list(my_conf) != list(conf_structure):
        return False

    for key in my_conf:
        if isinstance(my_conf[key], dict):
            if not isinstance(conf_structure[key], dict):
                return False
            if list(my_conf[key]) != list(conf_structure[key]):
                return False

    return True

# assuming identical order of keys is not required

def check_structure(conf_structure, my_conf):
    if sorted(my_conf.keys()) != sorted(conf_structure.keys()):
        return False

    for key in my_conf.keys():
        # only nested dictionaries need a second-level key comparison
        if type(my_conf[key]) == dict:
            if type(conf_structure[key]) != dict:
                return False
            if sorted(my_conf[key].keys()) != sorted(conf_structure[key].keys()):
                return False

    return True

This solution would obviously need to be changed if the level of nesting were greater (i.e. it assesses the similarity in structure of dictionaries that have some values as dictionaries, but not of dictionaries where some values of these nested dictionaries are themselves dictionaries).
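
For deeper nesting, one option is to make the key comparison recursive. A minimal sketch, assuming key order does not matter and using the hypothetical name `same_keys`; like the functions above, it compares only key sets, not value types:

```python
def same_keys(conf_structure, my_conf):
    """Return True if both dicts have the same keys at every nesting level."""
    if set(conf_structure) != set(my_conf):
        return False
    for key, expected in conf_structure.items():
        actual = my_conf[key]
        if isinstance(expected, dict) or isinstance(actual, dict):
            # both sides must be dicts, and their keys must match recursively
            if not (isinstance(expected, dict) and isinstance(actual, dict)):
                return False
            if not same_keys(expected, actual):
                return False
    return True


structure = {'version': int, 'info': {'conf_one': float, 'deep': {'x': int}}}
good = {'version': 1, 'info': {'conf_one': 2.5, 'deep': {'x': 3}}}
bad = {'version': 1, 'info': {'conf_one': 2.5, 'deep': {'y': 3}}}

print(same_keys(structure, good))  # True
print(same_keys(structure, bad))   # False
```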

Will