In Python 3.8, I'm trying to mock up a validation JSON schema for the structure below:
{
    # some other key/value pairs
    "data_checks": {
        "check_name": {
            "sql": "SELECT col FROM blah",
            "expectations": {
                "expect_column_values_to_be_unique": {
                    "column": "col",
                },
                # additional items as required
            }
        },
        # additional items as required
    }
}
The requirements I'm trying to enforce include:
- At least one item in data_checks, which can have a dynamic name. Item keys should be unique.
- sql and expectations keys must be present.
- sql should be a text string.
- At least one item in expectations. Item keys should be unique.
- Within expectations, item keys must be equal to available methods provided by dir(class_name).
More advanced capability would include:
- Enforcing expectations method items to only include kwargs for that method
I currently have the following JSON schema for the data_checks portion:
"data_checks": {
    "description": "Data quality checks against provided sources.",
    "minProperties": 1,
    "type": "object",
    "patternProperties": {
        ".+": {
            "required": ["expectations", "sql"],
            "sql": {
                "description": "SQL for data quality check.",
                "minLength": 1,
                "type": "string",
            },
            "expectations": {
                "description": "Great Expectations function name.",
                "minProperties": 1,
                "type": "object",
                "anyOf": [
                    {
                        "type": "string",
                        "minLength": 1,
                        "pattern": [e for e in dir(SqlAlchemyDataset) if e.startswith("expect_")],
                    }
                ],
            },
        },
    },
},
This JSON schema does not enforce expectations to have at least one item, nor does it enforce valid method names for the nested keys as expected from [e for e in dir(SqlAlchemyDataset) if e.startswith("expect_")]. I haven't really looked into enforcing kwargs for the selected method (is that even possible?).
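On the kwargs question: it looks possible in principle, because the schema is an ordinary Python dict and can be generated from each method's signature. The following is a minimal sketch, not a tested solution against Great Expectations. It assumes inspect.signature reports the real parameters (decorated methods can hide them unless the decorator uses functools.wraps), and FakeDataset, expectation_kwargs_schema and build_expectations_schema are hypothetical names used only for illustration.

```python
import inspect


class FakeDataset:
    """Hypothetical stand-in for SqlAlchemyDataset (illustration only)."""

    def expect_column_values_to_be_unique(self, column, mostly=None):
        pass

    def expect_column_values_to_be_between(self, column, min_value=None, max_value=None):
        pass


def expectation_kwargs_schema(method):
    """Describe the keyword arguments one expectation method accepts."""
    params = [
        p for name, p in inspect.signature(method).parameters.items()
        if name != "self"
    ]
    # A **kwargs parameter means unknown keys cannot be rejected safely.
    accepts_var_kwargs = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params)
    named = [
        p for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    ]
    return {
        "type": "object",
        # An empty schema ({}) accepts any value for that argument.
        "properties": {p.name: {} for p in named},
        # Parameters without a default must be supplied.
        "required": [p.name for p in named if p.default is inspect.Parameter.empty],
        "additionalProperties": accepts_var_kwargs,
    }


def build_expectations_schema(dataset_cls):
    """One sub-schema per expect_* method; any other key is rejected."""
    methods = {
        name: getattr(dataset_cls, name)
        for name in dir(dataset_cls)
        if name.startswith("expect_") and callable(getattr(dataset_cls, name))
    }
    return {
        "type": "object",
        "minProperties": 1,
        "properties": {
            name: expectation_kwargs_schema(method)
            for name, method in methods.items()
        },
        "additionalProperties": False,
    }


EXPECTATIONS_SCHEMA = build_expectations_schema(FakeDataset)
```

With this shape, listing each method under properties and setting additionalProperties to False covers both the valid-method-name check and the per-method kwargs check in one place.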
I don't know if this is related to things being nested, but how would I enforce the proper validation requirements?
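The nesting itself is probably not the cause; the placement of the keywords is. In the schema above, sql and expectations sit directly under the ".+" pattern rather than under a properties keyword, so a validator treats them as unknown keywords and ignores them. Separately, anyOf with "type": "string" would apply to the expectations value (an object), not to its keys, and pattern expects a single regex string rather than a list. A minimal sketch of one way to restructure it, assuming a draft 6 or later validator (for propertyNames) and using FakeDataset as a hypothetical stand-in for SqlAlchemyDataset:

```python
class FakeDataset:
    """Hypothetical stand-in for SqlAlchemyDataset (illustration only)."""

    def expect_column_values_to_be_unique(self, column):
        pass

    def expect_column_values_to_not_be_null(self, column):
        pass

    def some_other_method(self):
        pass


def build_data_checks_schema(dataset_cls):
    """Build the data_checks schema from the expect_* names on dataset_cls."""
    expectation_names = sorted(
        name for name in dir(dataset_cls) if name.startswith("expect_")
    )
    return {
        "description": "Data quality checks against provided sources.",
        "type": "object",
        "minProperties": 1,
        # Any check name is allowed; every value must match this sub-schema.
        "additionalProperties": {
            "type": "object",
            "required": ["expectations", "sql"],
            "properties": {
                "sql": {
                    "description": "SQL for data quality check.",
                    "type": "string",
                    "minLength": 1,
                },
                "expectations": {
                    "description": "Great Expectations function name.",
                    "type": "object",
                    "minProperties": 1,
                    # propertyNames constrains the keys, not the values.
                    "propertyNames": {"enum": expectation_names},
                },
            },
        },
    }


def validate_data_checks(data_checks, dataset_cls=FakeDataset):
    """Validate with the third-party jsonschema package (assumed installed)."""
    import jsonschema  # lazy import: building the schema needs no dependency

    jsonschema.validate(
        instance=data_checks, schema=build_data_checks_schema(dataset_cls)
    )


DATA_CHECKS_SCHEMA = build_data_checks_schema(FakeDataset)
```

Key uniqueness is not something the schema can check: once the document is parsed into a Python dict, the keys are unique by construction.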
Thanks!