0

In Python 3.8, I'm trying to mock up a validation JSON schema for the structure below:

{
    # some other key/value pairs
    "data_checks": {
        "check_name": {
            "sql": "SELECT col FROM blah",
            "expectations": {
                "expect_column_values_to_be_unique": {
                    "column": "col",
                },
                # additional items as required
            }
        },
        # additional items as required
    }
}

The requirements I'm trying to enforce include:

  • At least one item in data_checks that can have a dynamic name. Item keys should be unique.
  • sql and expectations keys must be present
  • sql should be a text string
  • At least one item in expectations. Item keys should be unique.
  • Within expectations, item keys must be equal to available methods provided by dir(class_name)

More advanced capability would include:

  • Enforcing expectations method items to only include kwargs for that method

I currently have the following JSON schema for the data_checks portion:

"data_checks": {
    "description": "Data quality checks against provided sources.",
    "minProperties": 1,
    "type": "object",
    "patternProperties": {
        ".+": {
            "required": ["expectations", "sql"],
            "sql": {
                "description": "SQL for data quality check.",
                "minLength": 1,
                "type": "string",
            },
            "expectations": {
                "description": "Great Expectations function name.",
                "minProperties": 1,
                "type": "object",
                "anyOf": [
                    {
                        "type": "string",
                        "minLength": 1,
                        "pattern": [e for e in dir(SqlAlchemyDataset) if e.startswith("expect_")],
                    }
                ],
            },
        },
    },
},

This JSON schema does not enforce expectations to have at least one item nor does it enforce valid method names for the nested keys as expected from [e for e in dir(SqlAlchemyDataset) if e.startswith("expect_")]. I haven't really looked into enforcing kwargs for the selected method (is that even possible?).

I don't know if this is related to things being nested, but how would I enforce the proper validation requirements?

Thanks!

alpacafondue
  • 353
  • 3
  • 16
  • It would help if you can post some sort of an attempt so we can take it ahead from there. You should be able to do the unique key checks just by `json.loads` since duplicate keys are invalid JSON, unless you want a pure string parsing JSON schema validator. – Diptangsu Goswami Apr 15 '22 at 20:34
  • JSON schema doesn't seem to be catching things as expected, so I don't know what to share. For example, if I put a method key name I know doesn't exist for a class, `jsonschema.validate` doesn't complain and the error is raised from the class object itself. – alpacafondue Apr 15 '22 at 20:45
  • "duplicate keys are invalid JSON" - That's simply not true. See https://stackoverflow.com/a/23195243/89211 - It's not good or advisable, but possible. json.loads will not throw an error. – Relequestual Apr 20 '22 at 13:40
  • It would be helpful if you could update your question with your current situaiton and problem. – Relequestual Apr 20 '22 at 13:41

0 Answers0