
I am starting to use Great Expectations for a project, and I am trying to create an expectation suite programmatically. I have a GCS datasource (consisting of 2 CSV files) defined in great_expectations.yml as follows:

datasources:
  GCS_Data:
    class_name: Datasource
    data_connectors:
      default_inferred_data_connector_name:
        class_name: InferredAssetFilesystemDataConnector
        default_regex:
          group_names:
            - data_asset_name
          pattern: (.*)
        base_directory: gs://mybucket/GCS_datasource
        module_name: great_expectations.datasource.data_connector
      default_runtime_data_connector_name:
        class_name: RuntimeDataConnector
        module_name: great_expectations.datasource.data_connector
        assets:
          my_runtime_asset_name:
            class_name: Asset
            module_name: great_expectations.datasource.data_connector.asset
            batch_identifiers:
              - runtime_batch_identifier_name
    execution_engine:
      class_name: PandasExecutionEngine
      module_name: great_expectations.execution_engine
    module_name: great_expectations.datasource
config_variables_file_path: uncommitted/config_variables.yml
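
For reference, here is a minimal sketch of how I understand the `default_regex` section: the pattern is matched against each file path the connector finds, and the captured groups are paired with `group_names`. This is only a simplified illustration using Python's `re` module, not Great Expectations' actual implementation, and the helper name `map_path_to_groups` is hypothetical.

```python
import re

# Values copied from the default_regex section of the config above.
PATTERN = r"(.*)"
GROUP_NAMES = ["data_asset_name"]


def map_path_to_groups(relative_path, pattern=PATTERN, group_names=GROUP_NAMES):
    """Return {group_name: captured_text} for a path, or None if nothing matches."""
    match = re.match(pattern, relative_path)
    if match is None:
        return None
    # Pair each configured group name with the corresponding capture group.
    return dict(zip(group_names, match.groups()))


print(map_path_to_groups("yellow_tripdata_sample_2019-01.csv"))
# → {'data_asset_name': 'yellow_tripdata_sample_2019-01.csv'}
```

Under this reading the pattern itself would accept the file name, so the regex alone would not explain a file going unmatched, provided the connector actually lists the file.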

When I try to create the expectation suite I run:

   
import great_expectations as ge
from great_expectations.core.batch import BatchRequest
from great_expectations.checkpoint import SimpleCheckpoint #needed?
from great_expectations.exceptions import DataContextError

context = ge.data_context.DataContext()

# Note that if you modify this batch request, you may save the new version as a .json file
#  to pass in later via the --batch-request option
batch_request = {
    "datasource_name": "GCS_Data",
    "data_connector_name": "default_inferred_data_connector_name",
    "data_asset_name": "yellow_tripdata_sample_2019-01.csv",
    "limit": 1000,
}


suite = context.create_expectation_suite(expectation_suite_name='my_second_expectation_suite')

validator = context.get_validator(
    batch_request=BatchRequest(**batch_request),
    expectation_suite_name='my_second_expectation_suite')

But the 'get_validator' step throws the following error:

---------------------------------------------------------------------------
InvalidBatchRequestError                  Traceback (most recent call last)
/tmp/ipykernel_27667/3237782419.py in <module>
     35 validator = context.get_validator(
     36     batch_request=BatchRequest(**batch_request),
---> 37     expectation_suite_name='my_second_expectation_suite')
     38 
     39 validator.expect_column_max_to_be_between(column = 'passenger_count', min = 4, max = 10)

/opt/conda/lib/python3.7/site-packages/great_expectations/data_context/data_context/abstract_data_context.py in get_validator(self, datasource_name, data_connector_name, data_asset_name, batch, batch_list, batch_request, batch_request_list, batch_data, data_connector_query, batch_identifiers, limit, index, custom_filter_function, sampling_method, sampling_kwargs, splitter_method, splitter_kwargs, runtime_parameters, query, path, batch_filter_parameters, expectation_suite_ge_cloud_id, batch_spec_passthrough, expectation_suite_name, expectation_suite, create_expectation_suite_with_name, include_rendered_content, **kwargs)
   1393             expectation_suite=expectation_suite,  # type: ignore[arg-type]
   1394             batch_list=batch_list,
-> 1395             include_rendered_content=include_rendered_content,
   1396         )
   1397 

/opt/conda/lib/python3.7/site-packages/great_expectations/data_context/data_context/abstract_data_context.py in get_validator_using_batch_list(self, expectation_suite, batch_list, include_rendered_content, **kwargs)
   1418             raise ge_exceptions.InvalidBatchRequestError(
   1419                 """Validator could not be created because BatchRequest returned an empty batch_list.
-> 1420                 Please check your parameters and try again."""
   1421             )
   1422 

InvalidBatchRequestError: Validator could not be created because BatchRequest returned an empty batch_list.
                Please check your parameters and try again.

I don't understand this, because my batch_request object is not empty. Does anybody have an idea of what might be happening?
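
One check that might narrow this down, as a hedged sketch: `context.get_available_data_asset_names()` returns a nested mapping of datasource name to data connector name to a list of asset names, so it can show whether the connector sees the file at all. The helper `asset_is_listed` is a hypothetical name, and the `sample` dict is made-up stand-in data for illustration, not output from my project.

```python
def asset_is_listed(available, datasource_name, data_connector_name, data_asset_name):
    """Check a nested {datasource: {connector: [assets]}} mapping for an asset."""
    connectors = available.get(datasource_name, {})
    assets = connectors.get(data_connector_name, [])
    return data_asset_name in assets


# Illustrative stand-in for the result of context.get_available_data_asset_names().
# With a real project one would pass that result instead of this dict.
sample = {
    "GCS_Data": {
        "default_inferred_data_connector_name": [],
        "default_runtime_data_connector_name": ["my_runtime_asset_name"],
    }
}

print(
    asset_is_listed(
        sample,
        "GCS_Data",
        "default_inferred_data_connector_name",
        "yellow_tripdata_sample_2019-01.csv",
    )
)
# → False
```

If the real mapping looked like this sample, with an empty list for the inferred connector, the batch request would have nothing to match even though the request dict itself is populated.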

Thanks in advance

I have also tried to follow the steps from here: https://legacy.docs.greatexpectations.io/en/stable/guides/how_to_guides/creating_and_editing_expectations/how_to_create_a_new_expectation_suite_without_the_cli.html

But in the step:

batch = context.get_batch(batch_kwargs, suite)

I also get this error:

AttributeError: 'Datasource' object has no attribute 'get_batch'
Ariadna