Questions tagged [amazon-deequ]
57 questions
8
votes
1 answer
Uniqueness check in Deequ
I'm currently exploring the Deequ library and trying to understand whether it's possible to check the uniqueness of a combination of columns.
This code
.hasUniqueness(Seq("col1", "col2"), Check.IsOne))
seems to calculate uniqueness for each…

Dawid
5
votes
2 answers
Data testing framework for data streaming (deequ vs Great Expectations)
I want to introduce data quality testing (empty fields/max-min values/regex/etc...) into my pipeline, which will essentially consume Kafka topics, testing the data before it is logged into the DB.
I am having a hard time choosing between the Deequ and…

Andy MGF
3
votes
0 answers
Pydeequ - Datatype check for datetime column
We are implementing a Pydeequ process to run certain validations, including a data type check (using the verification suite).
However, based on our understanding, the hasDataType function can be used only for validating against the following - any thoughts…

Murugan S
3
votes
2 answers
how to run all suggested checks in pydeequ
I have just started with pydeequ and want to create checks for a Spark DataFrame that has ~1800 features. To find out which checks I should perform, I do the following:
suggestionResult = ConstraintSuggestionRunner(spark) \
.onData(df) \
…

Shoaibkhanz
2
votes
0 answers
deequ rule to check for gap in numeric sequence
With Deequ, is there a way to check that a sequence has no gaps in it? Similar to this SQL

oluies
2
votes
1 answer
How to Store Failed Status Records of Amazon Deequ in a Separate Spark DataFrame
I have a requirement to run data quality tests, so I am using Amazon Deequ for this.
I am able to find the data quality success/failure status using the code below, but next I want to get all the rows that failed the check and store them in another…

Anu Shivangi
2
votes
1 answer
How to check if values of a DateType column are within a range of specified dates?
So, I'm using Amazon Deequ in Spark, and I have a dataframe df with a column publish_date which is of type DateType. I simply want to check the following:
publish_date <= current_date(minus)x AND publish_date >= current_date(minus)y
where x and y…

Debapratim Chakraborty
2
votes
0 answers
Parsing Deequ Rules from a csv/table dynamically
I'm using the Amazon Deequ library and trying to pass the rules from a CSV file or a MySQL table.
My CSV file will have a column with values like…

Riyan Mohammed
2
votes
2 answers
Load constraints from csv-file (amazon deequ)
I'm checking out Deequ which seems like a really nice library. I was wondering if it is possible to load constraints from a csv file or an orc-table in HDFS?
Let's say I have a table with these types
case class Item(
id: Long,
productName:…

Afshin Yavari
1
vote
1 answer
Pydeequ throwing Py4JJavaError
I have the following installation of Pydeequ:
In an Anaconda environment, I have installed pyspark 3.0.0, the latest release of pydeequ, and the latest release of sagemaker_pyspark.
from pyspark.sql import SparkSession
import os
os.environ["SPARK_VERSION"] =…

Norhther
1
vote
1 answer
closing pydeequ callback server
I'm using pydeequ with Spark 3.0.1 to perform some constraint checks on data.
As for testing with the VerificationSuite, after calling VerificationResult.checkResultsAsDataFrame(spark, result), it seems that the callback server which gets started by…

dataviews
1
vote
0 answers
Data Quality Framework in AWS
I am trying to implement a data quality framework for an application which ingests data from various systems (batch, near real time, real time). A few items that I want to highlight here:
The data pipelines vary widely and ingest very high volumes…

jtp
1
vote
1 answer
PyDeequ hasPattern fails with 'PatternMatch' object has no attribute '_Check'
I'm trying to run the sample code for the pattern check "hasPattern()" with PyDeequ, and it fails with an exception.
The code:
import pydeequ
from pyspark.sql import SparkSession, Row
spark = (SparkSession
.builder
…

Gleb Kalinin
1
vote
0 answers
AWS Deequ Checks Error: isGreaterThanOrEqualTo is not a member of com.amazon.deequ.VerificationRunBuilder
I ran the following command in a Databricks notebook with the com.amazon.deequ:deequ:2.0.0-spark-3.1 library to check data quality on input data, and I got error messages saying that certain functions are not a member of com.amazon.deequ.VerificationRunBuilder. Where…

fullysane
1
vote
1 answer
Type arguments do not conform to trait type parameter bounds
I am using a library written by Amazon in Scala here
The trait looks like this:
trait Analyzer[S <: State[_], +M <: Metric[_]]
I am trying to make a case object to store some information, and an instance of the above Analyzer is a part of…

Shiv