I have just started with Great Expectations and am using the Rule-Based Profiler to create an expectation suite, following this doc.

The document does not explain how or where to save the expectation suite, or how I can use the suite to validate data that arrives in the future. Could you please give me some pointers? I appreciate your input.

Alex Ott
Heether

1 Answer

I will assume that all steps up to the result of the Rule-Based Profiler are clear, and start from that point (doc):

result: RuleBasedProfilerResult = rule_based_profiler.run(batch_request=batch_request)

From the result you can extract the created expectation_configurations:

expectation_configurations: List[ExpectationConfiguration] = result.expectation_configurations
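Before adding the configurations to a suite, it can help to see what the profiler actually produced. A minimal sketch, assuming each item exposes the `expectation_type` attribute of `ExpectationConfiguration`; the helper name `summarize_configurations` is made up for illustration:

```python
from collections import Counter


def summarize_configurations(expectation_configurations):
    """Count the profiler-generated expectations by expectation type."""
    # Only the expectation_type attribute of each configuration is read here.
    return Counter(
        configuration.expectation_type
        for configuration in expectation_configurations
    )
```

Calling `summarize_configurations(expectation_configurations)` on the list above returns a mapping from expectation type to the number of configurations of that type.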

When you have your list of expectation_configurations, you can add them to a suite. Suites can be created/loaded like this:

from typing import List
from ruamel import yaml

from great_expectations import DataContext
from great_expectations.core import ExpectationConfiguration
from great_expectations.rule_based_profiler import RuleBasedProfilerResult
from great_expectations.core.batch import BatchRequest
from great_expectations.rule_based_profiler.rule_based_profiler import RuleBasedProfiler
from great_expectations.checkpoint import SimpleCheckpoint
from great_expectations.data_context.types.resource_identifiers import ExpectationSuiteIdentifier
from great_expectations.exceptions import DataContextError

context = DataContext()

expectation_suite_name = 'my_suite'

try:
    suite = context.get_expectation_suite(expectation_suite_name=expectation_suite_name)
    print(f'Loaded ExpectationSuite "{suite.expectation_suite_name}" containing {len(suite.expectations)} expectations.')
except DataContextError:
    suite = context.add_expectation_suite(expectation_suite_name=expectation_suite_name)
    print(f'Created ExpectationSuite "{suite.expectation_suite_name}".')

When you have a suitable suite, you can add the expectation_configurations like this:

for expectation_configuration in expectation_configurations:
    suite.add_expectation(expectation_configuration=expectation_configuration)

Update your context so the added expectations are available:

context.add_or_update_expectation_suite(expectation_suite=suite)
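As for where the suite ends up: with a file-backed context and the default expectations store, the suite should be written as JSON under the project's great_expectations/expectations/ folder. The sketch below reads that file back with the standard library only. The helper name `read_suite_file` is hypothetical, and the path layout is an assumption: it presumes the default store configuration and a suite name without dots, so check the stores section of great_expectations.yml if your project differs.

```python
import json
from pathlib import Path


def read_suite_file(project_root, expectation_suite_name):
    """Load a stored suite's JSON from the assumed default store location."""
    # Assumed layout:
    # <project_root>/great_expectations/expectations/<suite_name>.json
    path = (
        Path(project_root)
        / "great_expectations"
        / "expectations"
        / f"{expectation_suite_name}.json"
    )
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

The returned dictionary should contain an "expectations" list, so `len(read_suite_file(".", "my_suite")["expectations"])` is a quick way to confirm that the profiler's expectations were persisted.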

The expectations are now available in your suite. To validate data against them, run a checkpoint, just as you would with any other suite:

# fill in the values for your specific datasource
batch_request = {
    'datasource_name': 'my_datasource',
    'data_connector_name': 'default_inferred_data_connector_name',
    'data_asset_name': 'my_data_asset_name',
    'limit': 1000
}

# get the validator from the context
validator = context.get_validator(
    batch_request=BatchRequest(**batch_request),
    expectation_suite_name=expectation_suite_name
)

print(validator.get_expectation_suite(discard_failed_expectations=False))
validator.save_expectation_suite(discard_failed_expectations=False)

# configure a checkpoint
checkpoint_config = {
    "class_name": "SimpleCheckpoint",
    "validations": [
        {
            "batch_request": batch_request,
            "expectation_suite_name": expectation_suite_name
        }
    ]
}
checkpoint = SimpleCheckpoint(
    f"{validator.active_batch_definition.data_asset_name}_{expectation_suite_name}",
    context,
    **checkpoint_config
)
checkpoint_result = checkpoint.run()

context.build_data_docs()

validation_result_identifier = checkpoint_result.list_validation_result_identifiers()[0]
context.open_data_docs(resource_identifier=validation_result_identifier)
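To validate data that arrives later, the same saved suite can be reused with a new batch request. Here is a rough sketch, assuming your Great Expectations version provides `context.add_or_update_checkpoint` (it appeared in roughly the same releases as `add_or_update_expectation_suite`) and that the checkpoint result exposes a `success` attribute. The function name `validate_future_batch` and the default checkpoint name are made up for illustration:

```python
def validate_future_batch(
    context,
    batch_request,
    expectation_suite_name,
    checkpoint_name="my_suite_checkpoint",  # hypothetical name
):
    """Run a stored suite against one batch and raise if validation fails."""
    # Store (or overwrite) a checkpoint that pairs the batch with the suite.
    checkpoint = context.add_or_update_checkpoint(
        name=checkpoint_name,
        class_name="SimpleCheckpoint",
        validations=[
            {
                "batch_request": batch_request,
                "expectation_suite_name": expectation_suite_name,
            }
        ],
    )
    result = checkpoint.run()
    # Assumption: the result object exposes an overall success flag.
    if not result.success:
        raise ValueError(
            f'Validation against suite "{expectation_suite_name}" failed.'
        )
    return result
```

In a scheduled job you would build a batch request that points at the new data and call `validate_future_batch(context, batch_request, 'my_suite')`. Raising on failure lets the pipeline stop before bad data moves downstream.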
REH