Great Expectations is open-source software that helps teams promote analytic integrity by offering a unique approach to data pipeline testing. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. In addition to pipeline testing, GE also provides data documentation and profiling.
Questions tagged [great-expectations]
131 questions
6 votes, 4 answers
Great Expectations expect column to contain only integers fails for all rows when only one is bad
I want to use the great expectations package to validate that a column in a .csv file only contains integers.
The file I am using has only integers in the age column except for one row which has a '`' character instead. This is what I want the…
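Independent of Great Expectations, one way to see which rows are actually at fault is to test each value on its own. The following is a minimal plain-Python sketch, not the Great Expectations API; the helper name `find_non_integer_rows` and the sample data are made up for illustration.

```python
import csv
import io


def find_non_integer_rows(csv_text, column):
    """Return (line_number, raw_value) pairs for values in `column` that are not integers."""
    bad_rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line numbers start at 2 because line 1 of the file is the header.
    for line_number, row in enumerate(reader, start=2):
        raw_value = row[column]
        try:
            int(raw_value.strip())
        except ValueError:
            bad_rows.append((line_number, raw_value))
    return bad_rows


# Hypothetical sample data: every age is an integer except the '`' on line 4.
SAMPLE_CSV = "name,age\nalice,34\nbob,27\ncarol,`\ndave,41\n"

print(find_non_integer_rows(SAMPLE_CSV, "age"))  # [(4, '`')]
```

Checking value by value reports only the offending row, which is the behaviour the question is asking for.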

Dan
5 votes, 1 answer
Using Great Expectations with index of pandas data frame
If I have a data frame
df = pd.DataFrame({'A': [1.1, 2.2, 3.3], 'B': [4.4, 5.5, 6.6]})
I can use Great Expectations to check the name and dtypes of the columns like so:
import great_expectations as ge
df_asset = ge.from_pandas(df)
# List of…

Elis
5 votes, 0 answers
Create expectation suite without CLI
I am starting to use Great Expectations for a project. I am trying to create an expectation suite programmatically with Great Expectations. I have a GCS datasource (consisting of 2 CSV files) defined in great_expectations.yml as follows:
datasources:
…

Ariadna
5 votes, 2 answers
Data testing framework for data streaming (deequ vs Great Expectations)
I want to introduce data quality testing (empty fields, max/min values, regex, etc.) into my pipeline, which will essentially consume Kafka topics, testing the data before it is logged into the DB.
I am having a hard time choosing between the Deequ and…

Andy MGF
4 votes, 1 answer
How can I force a partition filter to be added to a Great Expectations dataset?
I have two (web) event tables in BigQuery which are partitioned by a DATE column named _date.
One of the tables does not require the partition filter (tableA), the other does (tableB).
When I configure my Great Expectations datasource config I do…

dlamblin
4 votes, 0 answers
Batch Request returns empty list with InferredAssetAzureDataConnector in Great Expectations
I am intending to set up an Azure Blob storage data source for great expectations. The setup is done with the following string and seems to work, given it lists some files in my blob storage.
example_yaml = f"""
name: {datasource_name}
class_name:…

DoubleSteakHouse
4 votes, 1 answer
Great_Expectations Conditional Expectation in Spark 3.2.1 with Pandas API in DataBricks
We want to implement Great_Expectations in DataBricks with Conditional Expectation. According to GE's documentation https://docs.greatexpectations.io/docs/reference/expectations/conditional_expectations is only available for Pandas this argument…

fullysane
4 votes, 1 answer
Great Expectations: base_directory must be an absolute path if root_directory is not provided
This is about the Great Expectations module in Python, primarily used for data quality checks (I found their documentation to be inadequate). So I've been trying to set up the data context on my notebook (using a local datasource) - as mentioned…

Debapratim Chakraborty
4 votes, 2 answers
How do you convert a dataframe to a great_expectations dataset?
I have a pandas or pyspark dataframe df that I want to run an expectation against.
I already have my dataframe in memory. How can I convert my dataframe to a great_expectations dataset?
so that I can do for…

Vincent Claes
3 votes, 0 answers
How to solve great expectations "MetricResolutionError: Cannot compile Column object until its 'name' is assigned." Error?
I am trying to use Great Expectations. The function I want to use is "expect_compound_columns_to_be_unique".
This is the code (main code - template):
import datetime
import pandas as pd
import great_expectations as ge
import…

Sevval Kahraman
3 votes, 0 answers
How to create a Great Expectations Suite from a Pandas Profiling Report
As already stated in the title, I want to generate so-called 'assertions' via Great Expectations. I've done it the normal way by creating a connection to a datasource. Now I want to combine it with Pandas Profiling, i.e. creating an Expectation Suite…

HenrikS
3 votes, 1 answer
Check column names and column types in Great Expectations
Currently, I am validating the table schema with expect_table_columns_to_match_set by feeding in a list of columns. However, I want to validate the schema associated with each column such as string. The only available Great Expectations rule…
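To make the idea of checking both column names and per-column types concrete, here is a minimal plain-Python sketch. It is not the Great Expectations API; the helper `check_schema`, the schema, and the records are hypothetical and exist only for illustration.

```python
def check_schema(rows, expected_types):
    """Compare column names and per-column value types against `expected_types`.

    Returns a list of human-readable problems; an empty list means every row matches.
    """
    problems = []
    for index, row in enumerate(rows):
        # Column names must match the expected set exactly.
        if set(row) != set(expected_types):
            problems.append(
                f"row {index}: columns {sorted(row)} != {sorted(expected_types)}"
            )
            continue
        # Each value must be an instance of the type expected for its column.
        for column, expected_type in expected_types.items():
            if not isinstance(row[column], expected_type):
                problems.append(
                    f"row {index}: column {column!r} has type "
                    f"{type(row[column]).__name__}, expected {expected_type.__name__}"
                )
    return problems


# Hypothetical schema and records, made up for illustration.
EXPECTED = {"id": int, "name": str}
ROWS = [
    {"id": 1, "name": "alice"},
    {"id": "2", "name": "bob"},  # wrong type for "id"
]

print(check_schema(ROWS, EXPECTED))
```

Keeping the expected schema as a single mapping from column name to type means the name check and the type check are driven by the same definition.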

THIS USER NEEDS HELP
3 votes, 0 answers
How do I disable SSL for a Great Expectations connection configuration?
I'd like to use Great Expectations to validate some of our data and data pipelines via Trino.
I'm reasonably sure I have the right configuration, which I'll paste in below. It doesn't work because (I think) it's trying to connect via SSL. For the…

josephkibe
3 votes, 1 answer
Creating Expectation suite using Rule based profilers in Great Expectation
I have just started with Great Expectations and I am using Rule based profiler to create an expectation suite following this doc.
The document does not have any information on how and where to save the expectation suite and how can I use this…

Heether
3 votes, 1 answer
How to Save Great_Expectations suite locally on Databricks (Community Edition)
I'm able to save a Great_Expectations suite to the tmp folder on my Databricks Community Edition as follows:
ge_partdf.save_expectation_suite('/tmp/myexpectation_suite.json',discard_failed_expectations=False)
But the problem is, when I restart the…

Patterson