As stated in the title, I want to generate so-called 'assertions' with Great Expectations. I have already done this the normal way, by creating a connection to a datasource. Now I want to combine it with Pandas Profiling, i.e. create an Expectation Suite based on a Profiling Report. According to the documentation, it should look something like the code below. However, it does not work, as the error that follows shows.
import great_expectations as ge
import pandas as pd
from pandas_profiling import ProfileReport
import os
# Build the path to data/cars.csv relative to the current working directory
p = os.path.join(os.getcwd(), "data", "cars.csv")
df = pd.read_csv(p)
profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)
# Example 1
# Obtain expectation suite, this includes profiling the dataset, saving the expectation suite, validating the
# dataframe, and building data docs
suite = profile.to_expectation_suite(suite_name="cars_expectations")
That throws the following error:
Summarize dataset: 100%
81/81 [00:37<00:00, 3.01it/s, Completed]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\great_expectations\data_context\data_context\base_data_context.py in run_validation_operator(self, validation_operator_name, assets_to_validate, run_id, evaluation_parameters, run_name, run_time, result_format, **kwargs)
510 try:
--> 511 validation_operator = self.validation_operators[validation_operator_name]
512 except KeyError:
KeyError: 'action_list_operator'
During handling of the above exception, another exception occurred:
DataContextError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_4484/2792258824.py in <module>
16 # Obtain expectation suite, this includes profiling the dataset, saving the expectation suite, validating the
17 # dataframe, and building data docs
---> 18 suite = profile.to_expectation_suite(suite_name="cars_expectations")
C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\expectations_report.py in to_expectation_suite(self, suite_name, data_context, save_suite, run_validation, build_data_docs, handler)
101 batch = ge.dataset.PandasDataset(self.df, expectation_suite=suite)
102
--> 103 results = data_context.run_validation_operator(
104 "action_list_operator", assets_to_validate=[batch]
105 )
C:\ProgramData\Anaconda3\lib\site-packages\great_expectations\core\usage_statistics\usage_statistics.py in usage_statistics_wrapped_method(*args, **kwargs)
302 nested_update(event_payload, args_payload_fn(*args, **kwargs))
303
--> 304 result = func(*args, **kwargs)
305 message["success"] = True
306 except Exception:
C:\ProgramData\Anaconda3\lib\site-packages\great_expectations\data_context\data_context\base_data_context.py in run_validation_operator(self, validation_operator_name, assets_to_validate, run_id, evaluation_parameters, run_name, run_time, result_format, **kwargs)
511 validation_operator = self.validation_operators[validation_operator_name]
512 except KeyError:
--> 513 raise ge_exceptions.DataContextError(
514 f"No validation operator `{validation_operator_name}` was found in your project. Please verify this in your great_expectations.yml"
515 )
DataContextError: No validation operator `action_list_operator` was found in your project. Please verify this in your great_expectations.yml
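As far as I can tell from the traceback, the failure comes down to a dictionary lookup: the data context has no entry named `action_list_operator` among its validation operators, so the `KeyError` is re-raised as a `DataContextError`. Below is a minimal stand-alone sketch of that mechanism. The exception class, the function, and the empty `configured_operators` mapping are stand-ins written for illustration; this is not Great Expectations code.

```python
# Stand-in for great_expectations' DataContextError; not the real class.
class DataContextError(Exception):
    pass


def run_validation_operator(validation_operators, validation_operator_name):
    """Mimic the lookup shown in the traceback: dict access wrapped in try/except."""
    try:
        return validation_operators[validation_operator_name]
    except KeyError:
        raise DataContextError(
            f"No validation operator `{validation_operator_name}` was found in your project. "
            "Please verify this in your great_expectations.yml"
        )


# Hypothetical project configuration in which no validation operators are defined.
configured_operators = {}

try:
    run_validation_operator(configured_operators, "action_list_operator")
except DataContextError as err:
    message = str(err)
    print(message)
```

The signature shown in the traceback also lists `run_validation` and `build_data_docs` parameters on `to_expectation_suite`, which may control whether this lookup is reached at all.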
I am using: Pandas-Profiling 3.4.0, Great Expectations 0.15.32
Thanks for your help in advance.