I'm trying to run the sample code for the pattern check "hasPattern()" with PyDeequ, and it fails with an exception.
The code:
```python
import pydeequ
from pyspark.sql import SparkSession, Row

# Start a Spark session with the Deequ JAR on the classpath
spark = (SparkSession
    .builder
    .config("spark.jars.packages", pydeequ.deequ_maven_coord)
    .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
    .getOrCreate())

# Three sample rows; only the "baz" row has an email ending in @baz.com
df = spark.sparkContext.parallelize([
    Row(a="foo", creditCard="5130566665286573", email="foo@example.com", ssn="123-45-6789",
        URL="http://userid@example.com:8080"),
    Row(a="bar", creditCard="4532677117740914", email="bar@example.com", ssn="123456789",
        URL="http://example.com/(something)?after=parens"),
    Row(a="baz", creditCard="3401453245217421", email="foobar@baz.com", ssn="000-00-0000",
        URL="http://userid@example.com:8080")]).toDF()

from pydeequ.checks import *
from pydeequ.verification import *

check = Check(spark, CheckLevel.Error, "Integrity checks")

# Expect one third of the emails to match the pattern
checkResult = VerificationSuite(spark) \
    .onData(df) \
    .addCheck(
        check.hasPattern(column='email',
                         pattern=r".*@baz.com",
                         assertion=lambda x: x == 1 / 3)) \
    .run()

checkResult_df = VerificationResult.checkResultsAsDataFrame(spark, checkResult)
checkResult_df.show()
```
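To double-check the value the assertion expects, here is a plain-Python sketch (no Spark involved) that computes the fraction of the sample emails matching the pattern. It assumes, without checking Deequ's source, that a row counts as compliant when the regex is found in the value; for this particular pattern a full match gives the same result.

```python
import re

# Email values copied from the sample DataFrame above
emails = ["foo@example.com", "bar@example.com", "foobar@baz.com"]
pattern = re.compile(r".*@baz.com")

# Assumption: a row is compliant if the pattern is found anywhere in the value
matches = sum(1 for e in emails if pattern.search(e))
fraction = matches / len(emails)

print(matches, fraction)  # 1 0.3333333333333333
print(fraction == 1 / 3)  # True
```

So the lambda `x == 1 / 3` should be satisfied by this data if the check itself is built correctly.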
When I run it, I receive:

```
AttributeError: 'NoneType' object has no attribute '_Check'
```

on the line:

```python
check.hasPattern(column='email',
                 pattern=r".*@baz.com",
                 assertion=lambda x: x == 1 / 3)
```
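The message itself says that something expected to be a check object is `None` at the point where its `_Check` attribute is read. One way that happens is a builder-style method that forgets to `return self`. The classes below are hypothetical toy stand-ins, not PyDeequ's actual code; they only illustrate how that pattern produces this exact error.

```python
class ToyCheck:
    """Hypothetical stand-in for a builder-style check object (not PyDeequ)."""

    def __init__(self):
        self._Check = "jvm-check-handle"  # placeholder for a wrapped JVM object

    def isComplete(self, column):
        return self  # well-behaved builder method: returns the check for chaining

    def hasPattern(self, column, pattern):
        pass  # illustrated bug: no `return self`, so the call evaluates to None


class ToySuite:
    """Hypothetical stand-in for a suite that unwraps the check it is given."""

    def addCheck(self, check):
        return check._Check  # raises AttributeError if `check` is None


suite = ToySuite()
print(suite.addCheck(ToyCheck().isComplete("email")))  # jvm-check-handle

try:
    suite.addCheck(ToyCheck().hasPattern("email", r".*@baz.com"))
except AttributeError as err:
    print(err)  # 'NoneType' object has no attribute '_Check'
```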
PyDeequ version: 1.0.1
Python version: 3.7.9