As the title says, I want to generate so-called 'assertions' with Great Expectations. I have already done this the usual way, by creating a connection to a datasource. Now I want to combine it with Pandas Profiling, i.e. create an Expectation Suite from a Profiling Report. According to the documentation, it should look something like the code below. However, it does not work, as the error further down shows.

import great_expectations as ge
import pandas as pd

from pandas_profiling import ProfileReport
import os

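# Build the CSV path portably; "\d" and "\c" in a plain string literal are invalid escape sequences
p = os.path.join(os.getcwd(), "data", "cars.csv")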

df = pd.read_csv(p)

profile = ProfileReport(df, title="Pandas Profiling Report", explorative=True)

# Example 1
# Obtain expectation suite, this includes profiling the dataset, saving the expectation suite, validating the
# dataframe, and building data docs
suite = profile.to_expectation_suite(suite_name="cars_expectations")

That throws the following error:

Summarize dataset: 100%
81/81 [00:37<00:00, 3.01it/s, Completed]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\great_expectations\data_context\data_context\base_data_context.py in run_validation_operator(self, validation_operator_name, assets_to_validate, run_id, evaluation_parameters, run_name, run_time, result_format, **kwargs)
    510         try:
--> 511             validation_operator = self.validation_operators[validation_operator_name]
    512         except KeyError:

KeyError: 'action_list_operator'

During handling of the above exception, another exception occurred:

DataContextError                          Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_4484/2792258824.py in <module>
     16 # Obtain expectation suite, this includes profiling the dataset, saving the expectation suite, validating the
     17 # dataframe, and building data docs
---> 18 suite = profile.to_expectation_suite(suite_name="cars_expectations")

C:\ProgramData\Anaconda3\lib\site-packages\pandas_profiling\expectations_report.py in to_expectation_suite(self, suite_name, data_context, save_suite, run_validation, build_data_docs, handler)
    101             batch = ge.dataset.PandasDataset(self.df, expectation_suite=suite)
    102 
--> 103             results = data_context.run_validation_operator(
    104                 "action_list_operator", assets_to_validate=[batch]
    105             )

C:\ProgramData\Anaconda3\lib\site-packages\great_expectations\core\usage_statistics\usage_statistics.py in usage_statistics_wrapped_method(*args, **kwargs)
    302                     nested_update(event_payload, args_payload_fn(*args, **kwargs))
    303 
--> 304                 result = func(*args, **kwargs)
    305                 message["success"] = True
    306             except Exception:

C:\ProgramData\Anaconda3\lib\site-packages\great_expectations\data_context\data_context\base_data_context.py in run_validation_operator(self, validation_operator_name, assets_to_validate, run_id, evaluation_parameters, run_name, run_time, result_format, **kwargs)
    511             validation_operator = self.validation_operators[validation_operator_name]
    512         except KeyError:
--> 513             raise ge_exceptions.DataContextError(
    514                 f"No validation operator `{validation_operator_name}` was found in your project. Please verify this in your great_expectations.yml"
    515             )

DataContextError: No validation operator `action_list_operator` was found in your project. Please verify this in your great_expectations.yml

I am using: Pandas-Profiling 3.4.0, Great Expectations 0.15.32
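
Judging from the signature shown in the traceback, `to_expectation_suite` accepts `run_validation` and `build_data_docs` arguments, and the failing call to `run_validation_operator` sits in the validation step. One thing that might be worth trying is to skip that step. This is only a sketch under assumptions: that both flags are booleans, and that setting them to `False` bypasses the `action_list_operator` lookup; I have not confirmed either. The helper name `build_suite_without_validation` is made up for illustration.

```python
def build_suite_without_validation(profile, suite_name):
    """Ask a profile report for an expectation suite, skipping validation.

    Assumption: `run_validation=False` and `build_data_docs=False` are accepted
    as booleans (the parameter names come from the traceback, the values are a guess)
    and skip the `run_validation_operator("action_list_operator", ...)` call.
    """
    return profile.to_expectation_suite(
        suite_name=suite_name,
        run_validation=False,   # skip the step that raises DataContextError
        build_data_docs=False,  # data docs presumably depend on a validation result
    )
```

With the `profile` object from the code above, this would be called as `suite = build_suite_without_validation(profile, "cars_expectations")`. It would not explain why the operator is missing from `great_expectations.yml`; it would only avoid needing it.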

Thanks for your help in advance.

HenrikS