I am starting to use Great Expectations for a project and I am trying to create an expectation suite programmatically. I have a GCS datasource (consisting of two CSV files) defined in great_expectations.yml as follows:
datasources:
  GCS_Data:
    class_name: Datasource
    data_connectors:
      default_inferred_data_connector_name:
        class_name: InferredAssetFilesystemDataConnector
        default_regex:
          group_names:
            - data_asset_name
          pattern: (.*)
        base_directory: gs://mybucket/GCS_datasource
        module_name: great_expectations.datasource.data_connector
      default_runtime_data_connector_name:
        class_name: RuntimeDataConnector
        module_name: great_expectations.datasource.data_connector
        assets:
          my_runtime_asset_name:
            class_name: Asset
            module_name: great_expectations.datasource.data_connector.asset
            batch_identifiers:
              - runtime_batch_identifier_name
    execution_engine:
      class_name: PandasExecutionEngine
      module_name: great_expectations.execution_engine
    module_name: great_expectations.datasource
config_variables_file_path: uncommitted/config_variables.yml
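As a reference point, here is a simplified, plain-Python model of how an inferred data connector with this `default_regex` could turn file names into data asset names: each name is matched against `pattern`, and the capture group listed as `data_asset_name` in `group_names` becomes the asset name. This is a hypothetical sketch (the function `infer_asset_names` and the second file name are made up for illustration), not Great Expectations' own implementation. It only shows that with `(.*)` the asset name is the entire matched string, so if the string being matched includes a directory prefix, that prefix ends up in the asset name too.

```python
import re
from typing import Dict, List


def infer_asset_names(keys: List[str], pattern: str, group_names: List[str]) -> Dict[str, List[str]]:
    """Hypothetical, simplified model of regex-based asset inference.

    Each key is matched against `pattern`; the capture group that lines up
    with "data_asset_name" in `group_names` becomes the asset name.
    Keys that do not match the pattern are skipped.
    """
    regex = re.compile(pattern)
    assets: Dict[str, List[str]] = {}
    for key in keys:
        match = regex.match(key)
        if match is None:
            continue  # a file that does not match simply yields no asset
        groups = dict(zip(group_names, match.groups()))
        assets.setdefault(groups["data_asset_name"], []).append(key)
    return assets


# The first file name comes from the question; the second is a made-up example.
flat_keys = ["yellow_tripdata_sample_2019-01.csv", "yellow_tripdata_sample_2019-02.csv"]
print(sorted(infer_asset_names(flat_keys, r"(.*)", ["data_asset_name"])))

# With `(.*)`, a prefix in the matched string becomes part of the asset name.
prefixed_keys = ["GCS_datasource/yellow_tripdata_sample_2019-01.csv"]
print(sorted(infer_asset_names(prefixed_keys, r"(.*)", ["data_asset_name"])))
```

In this model, the `data_asset_name` in a batch request has to equal one of these inferred names exactly for any file to be selected.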
When I try to create the expectation suite, I run:
import great_expectations as ge
from great_expectations.core.batch import BatchRequest
from great_expectations.checkpoint import SimpleCheckpoint  # needed?
from great_expectations.exceptions import DataContextError
context = ge.data_context.DataContext()
# Note that if you modify this batch request, you may save the new version as a .json file
# to pass in later via the --batch-request option
batch_request = {
    "datasource_name": "GCS_Data",
    "data_connector_name": "default_inferred_data_connector_name",
    "data_asset_name": "yellow_tripdata_sample_2019-01.csv",
    "limit": 1000,
}
suite = context.create_expectation_suite(expectation_suite_name='my_second_expectation_suite')
validator = context.get_validator(
    batch_request=BatchRequest(**batch_request),
    expectation_suite_name='my_second_expectation_suite')
But the 'get_validator' step throws the following error:
---------------------------------------------------------------------------
InvalidBatchRequestError Traceback (most recent call last)
/tmp/ipykernel_27667/3237782419.py in <module>
35 validator = context.get_validator(
36 batch_request=BatchRequest(**batch_request),
---> 37 expectation_suite_name='my_second_expectation_suite')
38
39 validator.expect_column_max_to_be_between(column = 'passenger_count', min = 4, max = 10)
/opt/conda/lib/python3.7/site-packages/great_expectations/data_context/data_context/abstract_data_context.py in get_validator(self, datasource_name, data_connector_name, data_asset_name, batch, batch_list, batch_request, batch_request_list, batch_data, data_connector_query, batch_identifiers, limit, index, custom_filter_function, sampling_method, sampling_kwargs, splitter_method, splitter_kwargs, runtime_parameters, query, path, batch_filter_parameters, expectation_suite_ge_cloud_id, batch_spec_passthrough, expectation_suite_name, expectation_suite, create_expectation_suite_with_name, include_rendered_content, **kwargs)
1393 expectation_suite=expectation_suite, # type: ignore[arg-type]
1394 batch_list=batch_list,
-> 1395 include_rendered_content=include_rendered_content,
1396 )
1397
/opt/conda/lib/python3.7/site-packages/great_expectations/data_context/data_context/abstract_data_context.py in get_validator_using_batch_list(self, expectation_suite, batch_list, include_rendered_content, **kwargs)
1418 raise ge_exceptions.InvalidBatchRequestError(
1419 """Validator could not be created because BatchRequest returned an empty batch_list.
-> 1420 Please check your parameters and try again."""
1421 )
1422
InvalidBatchRequestError: Validator could not be created because BatchRequest returned an empty batch_list.
Please check your parameters and try again.
I don't understand this, because my batch_request object is not empty. Does anybody have an idea of what could be happening?
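A non-empty batch request can still resolve to an empty batch list: the request only filters the data references that the data connector discovered, so if the connector found no files, or found them under different asset names, nothing matches. Below is a hypothetical pre-flight check in plain Python that compares the request against a nested mapping of available asset names. In Great Expectations 0.15.x that mapping would presumably come from the data context's `get_available_data_asset_names()`; the exact shape assumed here (`{datasource: {data_connector: [asset names]}}`) and the helper name `check_batch_request` are assumptions to verify against your installed version.

```python
from typing import List, Mapping, Sequence


def check_batch_request(
    request: Mapping[str, object],
    available: Mapping[str, Mapping[str, Sequence[str]]],
) -> List[str]:
    """Return reasons why `request` would select no data (hypothetical helper).

    `available` is assumed to look like {datasource: {data_connector: [asset names]}}.
    An empty result means every name in the request was found.
    """
    datasource = request.get("datasource_name")
    connector = request.get("data_connector_name")
    asset = request.get("data_asset_name")

    if datasource not in available:
        return [f"unknown datasource_name {datasource!r}; known: {sorted(available)}"]
    connectors = available[datasource]
    if connector not in connectors:
        return [f"unknown data_connector_name {connector!r}; known: {sorted(connectors)}"]
    assets = connectors[connector]
    if asset not in assets:
        return [f"data_asset_name {asset!r} not found; connector sees: {sorted(assets)}"]
    return []


batch_request = {
    "datasource_name": "GCS_Data",
    "data_connector_name": "default_inferred_data_connector_name",
    "data_asset_name": "yellow_tripdata_sample_2019-01.csv",
    "limit": 1000,
}

# Hypothetical listing in which the inferred connector discovered no files at all.
available_names = {
    "GCS_Data": {
        "default_inferred_data_connector_name": [],
        "default_runtime_data_connector_name": ["my_runtime_asset_name"],
    }
}
for problem in check_batch_request(batch_request, available_names):
    print(problem)
```

If the inferred connector's list turns out to be empty in a real listing, that would point at how the connector discovers files under `base_directory` rather than at the contents of the batch request.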
Thanks in advance
I have also tried following the steps in this guide: https://legacy.docs.greatexpectations.io/en/stable/guides/how_to_guides/creating_and_editing_expectations/how_to_create_a_new_expectation_suite_without_the_cli.html
But at the step:
batch = context.get_batch(batch_kwargs, suite)
I also get this error:
AttributeError: 'Datasource' object has no attribute 'get_batch'