PyDeequ is a Python API for Deequ, a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Questions tagged [pydeequ]
9 questions
2
votes
1 answer
Error importing PyDeequ package on Databricks
I want to run some data quality tests, and for that I intend to use PyDeequ in a Databricks notebook. Keep in mind that I'm very new to Databricks and Spark.
First, I created a cluster with the Runtime version "10.4 LTS (includes Apache…

Humberto Santos
- 21
- 3
1
vote
1 answer
PyDeequ throwing Py4JJavaError
I have the following installation of PyDeequ:
In an Anaconda environment, I have installed pyspark 3.0.0 and the latest releases of pydeequ and sagemaker_pyspark.
from pyspark.sql import SparkSession
import os
os.environ["SPARK_VERSION"] =…

Norhther
- 545
- 3
- 15
- 35
1
vote
1 answer
closing pydeequ callback server
I'm using pydeequ with Spark 3.0.1 to perform some constraint checks on data.
When testing with the VerificationSuite, after calling VerificationResult.checkResultsAsDataFrame(spark, result), it seems that the callback server which gets started by…

dataviews
- 2,466
- 7
- 31
- 64
0
votes
0 answers
Error using PyDeequ Profile in Databricks
I am new to Python, Databricks, and PyDeequ. I'm trying to use PyDeequ in Databricks. I installed the library via Maven using "com.amazon.deequ:deequ:2.0.4-spark-3.3". The analyzers work, but the profile runner does not.
I am trying to run this…

Azul Selser
- 1
- 1
0
votes
0 answers
Pydeequ satisfy custom expression
Most of the checks in the examples or docs involve just two columns and simple, strongly typed functions (isGreaterThanOrEqualTo, etc.).
Is there a way to introduce checks like: columnA + columnB <= columnC - columnD etc. Any way to add a lambda…

trequartista
- 167
- 10
0
votes
0 answers
How do I import PyDeequ in Glue Jupyter notebooks?
I have been trying to import PyDeequ to develop tests in AWS Glue's notebook environment. I have downloaded the pydeequ.zip file and the JAR file (deequ-2.0.0-spark-3.1.jar). Both of them are in an S3 bucket. I am using Glue 3.0 which…

Jonathan
- 46
- 3
0
votes
1 answer
Error importing PyDeequ package on Glue 3.0
I am trying to import the pydeequ library in an AWS environment while building a job with Glue. I put a zip file of pydeequ in the Python library path and the JAR file in the Dependent JARs path. My script is the following:
import sys
from awsglue.transforms import *
from…
0
votes
1 answer
How to set dynamic assert conditions for Deequ verification checks in Scala
I am using the Deequ VerificationSuite to validate my SQL tables, but I am unable to implement dynamic assert conditions for checks:
val verificationResult: VerificationResult = { VerificationSuite()
.onData(dataset)
.addCheck(
…

vibhor Gupta
- 103
- 11
0
votes
1 answer
Validation using pydeequ within a Glue job will prevent the job from completing
I am attempting to follow the AWS Big Data Blog article to create a job in AWS Glue Studio and use PyDeequ to validate the data.
I was successful in running PyDeequ in the job, but when using some of the Check methods, the job kept running even after…

trgs
- 1
- 1