Questions tagged [great-expectations]

Great Expectations is open-source software that helps teams promote analytic integrity by offering a unique approach to data pipeline testing. Pipeline tests are applied to data (instead of code) and at batch time (instead of compile or deploy time). Pipeline tests are like unit tests for datasets: they help you guard against upstream data changes and monitor data quality. In addition to pipeline testing, GE also provides data documentation and profiling.
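To make the "unit tests for datasets" idea concrete, here is a minimal sketch of a pipeline test that runs against a batch of data rather than against code. It deliberately uses only the Python standard library and is not the Great Expectations API: the helper `check_column_is_integer`, the result keys, and the sample batch are all hypothetical names chosen for illustration.

```python
import csv
import io


def check_column_is_integer(rows, column):
    """Check every value in `column` and report the ones that are not integers.

    Hypothetical helper for illustration only; not part of Great Expectations.
    """
    unexpected = []
    for row in rows:
        value = row[column]
        try:
            int(value)
        except (TypeError, ValueError):
            # Record the offending value instead of failing the whole batch.
            unexpected.append(value)
    return {
        "success": not unexpected,
        "unexpected_count": len(unexpected),
        "unexpected_values": unexpected,
    }


# Hypothetical batch: one row has a stray character in the age column.
batch = io.StringIO("name,age\nAnn,34\nBob,`\nCy,51\n")
rows = list(csv.DictReader(batch))

result = check_column_is_integer(rows, "age")
print(result)
```

The check is evaluated per value at batch time, so a single bad row is reported on its own while the remaining rows still count as valid.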

131 questions
6
votes
4 answers

Great Expectations expect column to contain only integers fails for all rows when only one is bad

I want to use the great expectations package to validate that a column in a .csv file only contains integers. The file I am using has only integers in the age column except for one row which has a '`' character instead. This is what I want the…
Dan
  • 45,079
  • 17
  • 88
  • 157
5
votes
1 answer

Using Great Expectations with index of pandas data frame

If I have a data frame df = pd.DataFrame({'A': [1.1, 2.2, 3.3], 'B': [4.4, 5.5, 6.6]}) I can use Great Expectations to check the name and dtypes of the columns like so: import great_expectations as ge df_asset = ge.from_pandas(df) # List of…
Elis
  • 70
  • 10
5
votes
0 answers

Create expectation suite without CLI

I am starting to use Great Expectations for a project. I am trying to create an expectation suite programmatically with Great Expectations. I have a GCS datasource (consisting of 2 CSV files) defined in great_expectations.yml as follows: datasources: …
Ariadna
  • 51
  • 1
5
votes
2 answers

Data testing framework for data streaming (deequ vs Great Expectations)

I want to introduce data quality testing (empty fields/max-min values/regex/etc.) into my pipeline, which will essentially consume Kafka topics, testing the data before it is logged into the DB. I am having a hard time choosing between the Deequ and…
Andy MGF
  • 133
  • 1
  • 7
4
votes
1 answer

How can I force a partition filter to be added to a Great Expectations dataset?

I have two (web) event tables in BigQuery which are partitioned by a DATE column named _date. One of the tables does not require the partition filter (tableA), the other does (tableB). When I configure my Great Expectations datasource config I do…
dlamblin
  • 43,965
  • 20
  • 101
  • 140
4
votes
0 answers

Batch Request returns empty list with InferredAssetAzureDataConnector in Great Expectations

I am intending to set up an Azure Blob storage data source for great expectations. The setup is done with the following string and seems to work, given it lists some files in my blob storage. example_yaml = f""" name: {datasource_name} class_name:…
4
votes
1 answer

Great_Expectations Conditional Expectation in Spark 3.2.1 with Pandas API in DataBricks

We want to implement Great_Expectations in Databricks with Conditional Expectation. According to GE's documentation https://docs.greatexpectations.io/docs/reference/expectations/conditional_expectations is only available for Pandas this argument…
4
votes
1 answer

Great Expectations: base_directory must be an absolute path if root_directory is not provided

This is about the Great Expectations module in Python, primarily used for data quality checks (I found their documentation to be inadequate). So I've been trying to set up the data context on my notebook (using a local datasource) - as mentioned…
4
votes
2 answers

How do you convert a dataframe to a great_expectations dataset?

I have a pandas or pyspark dataframe df that I want to run an expectation against. I already have my dataframe in memory. How can I convert my dataframe to a great_expectations dataset, so that I can do for…
Vincent Claes
  • 3,960
  • 3
  • 44
  • 62
3
votes
0 answers

How to solve great expectations "MetricResolutionError: Cannot compile Column object until its 'name' is assigned." Error?

I am trying to use Great Expectations. The function I want to use is "expect_compound_columns_to_be_unique". This is the code (main code - template): import datetime import pandas as pd import great_expectations as ge import…
Sevval Kahraman
  • 1,185
  • 3
  • 10
  • 37
3
votes
0 answers

How to create a Great Expectations Suite from a Pandas Profiling Report

As already stated in the title, I want to generate so-called 'assertions' via Great Expectations. I've done it the normal way by creating a connection to a datasource. Now I want to combine it with Pandas Profiling, i.e. creating an Expectation Suite…
HenrikS
  • 33
  • 2
3
votes
1 answer

Check column names and column types in Great Expectations

Currently, I am validating the table schema with expect_table_columns_to_match_set by feeding in a list of columns. However, I want to validate the schema associated with each column such as string. The only available Great Expectations rule…
THIS USER NEEDS HELP
  • 3,136
  • 4
  • 30
  • 55
3
votes
0 answers

How do I disable SSL for a Great Expectations connection configuration?

I'd like to use Great Expectations to validate some of our data and data pipelines via Trino. I'm reasonably sure I have the right configuration, which I'll paste in below. It doesn't work because (I think) it's trying to connect via SSL. For the…
josephkibe
  • 1,281
  • 14
  • 28
3
votes
1 answer

Creating Expectation suite using Rule based profilers in Great Expectation

I have just started with Great Expectations and I am using a rule-based profiler to create an expectation suite following this doc. The document does not have any information on how and where to save the expectation suite and how I can use this…
Heether
  • 152
  • 1
  • 1
  • 6
3
votes
1 answer

How to Save Great_Expectations suite locally on Databricks (Community Edition)

I'm able to save a Great_Expectations suite to the tmp folder on my Databricks Community Edition as follows: ge_partdf.save_expectation_suite('/tmp/myexpectation_suite.json',discard_failed_expectations=False) But the problem is, when I restart the…