I have been trying to import PyDeequ to develop tests in the AWS Glue notebook environment. I have downloaded the pydeequ.zip file and the jar file (deequ-2.0.0-spark-3.1.jar), and both are in an S3 bucket. I am using Glue 3.0, which uses Spark 3.1.1.
I have tried many different variations of the following configuration:
%magics
- %extra_jars s3://path/dependencies/deequ-2.0.0-spark-3.1.jar
- %additional_python_modules pydeequ
- %extra_py_files s3://path/dependencies/pydeequ.zip
As mentioned, I have tried every combination of these. The import seems to work fine, and evaluating "pydeequ" in a cell shows that it points to '/home/spark/.local/lib/python3.7/site-packages/pydeequ/__init__.py'.
I then try to run a very basic PyDeequ operation:
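from pydeequ.analyzers import AnalysisRunner, AnalyzerContext, Size

# Compute the row count of df with the Size analyzer
analysisResult = AnalysisRunner(spark) \
    .onData(df) \
    .addAnalyzer(Size()) \
    .run()

# Convert the computed metrics to a DataFrame and print them
analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
analysisResult_df.show()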
This results in the following error:
Py4JJavaError: An error occurred while calling None.com.amazon.deequ.analyzers.Size.
: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
at com.amazon.deequ.analyzers.Size.<init>(Size.scala:37)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)
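A NoSuchMethodError on scala.Product.$init$ is commonly a sign that the jar was built against a different Scala version than the one the session is running, so it may be worth confirming what the notebook session actually uses. The following is only a rough sketch: the helper names spark_minor_from_jar and jar_matches_spark are hypothetical, and the check assumes the "spark-X.Y" suffix in the jar name reflects the Spark line it was built for. The commented lines assume the usual `spark` session object provided by the notebook.

```python
import re


def spark_minor_from_jar(jar_name):
    """Return the 'major.minor' Spark version embedded in a jar name, or None.

    Hypothetical helper: assumes a name such as 'deequ-2.0.0-spark-3.1.jar'.
    """
    match = re.search(r"spark-(\d+\.\d+)", jar_name)
    return match.group(1) if match else None


def jar_matches_spark(jar_name, spark_version):
    """Return True when the jar's Spark suffix equals the running major.minor."""
    running_minor = ".".join(spark_version.split(".")[:2])
    return spark_minor_from_jar(jar_name) == running_minor


# In the notebook, with the session's `spark` object available:
# print(spark.version)  # Spark version of the running session
# print(spark.sparkContext._jvm.scala.util.Properties.versionString())  # Scala version
# print(jar_matches_spark("deequ-2.0.0-spark-3.1.jar", spark.version))
```

If the reported Spark or Scala version is older than expected, the session may not be running the Glue version that was intended.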
Has anyone else been successful in running pydeequ on Glue notebooks?