
I have been trying to import PyDeequ to develop tests in the AWS Glue notebook environment. I have downloaded the pydeequ.zip file and the jar file (deequ-2.0.0-spark-3.1.jar), and both are in an S3 bucket. I am using Glue 3.0, which runs Spark 3.1.1.
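To rule out an obvious mismatch on my side, here is a minimal sketch of how the jar name can be compared against the Spark version. The helper names `deequ_jar_spark_line` and `jar_matches_runtime` are made up for illustration. The check only looks at the Spark line encoded in the file name, so it says nothing about the Scala version the jar was built for.

```python
import re


def deequ_jar_spark_line(jar_name: str) -> str:
    """Return the Spark major.minor line encoded in a Deequ jar file name."""
    match = re.search(r"spark-(\d+\.\d+)", jar_name)
    if match is None:
        raise ValueError(f"no Spark version found in {jar_name!r}")
    return match.group(1)


def jar_matches_runtime(jar_name: str, spark_version: str) -> bool:
    """True when the jar's Spark line equals the runtime's major.minor."""
    runtime_line = ".".join(spark_version.split(".")[:2])
    return deequ_jar_spark_line(jar_name) == runtime_line


# In the notebook, spark.version would be passed instead of the literal "3.1.1".
print(jar_matches_runtime("deequ-2.0.0-spark-3.1.jar", "3.1.1"))
```

By this check the jar name and the Glue 3.0 Spark version line up, so the file name alone does not explain the problem.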

I have tried several variations of the following configuration:

%magics

  • %extra_jars s3://path/dependencies/deequ-2.0.0-spark-3.1.jar
  • %additional_python_modules pydeequ
  • %extra_py_files s3://path/dependencies/pydeequ.zip

As mentioned, I've tried every combination of these. The import seems to work fine, and evaluating "pydeequ" in a cell shows that it points to /home/spark/.local/lib/python3.7/site-packages/pydeequ/__init__.py.

I then try to run the most basic PyDeequ operation:

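from pydeequ.analyzers import AnalysisRunner, AnalyzerContext, Size

# Compute the row count of df with a single Size analyzer
analysisResult = AnalysisRunner(spark) \
                    .onData(df) \
                    .addAnalyzer(Size()) \
                    .run()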

analysisResult_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
analysisResult_df.show()

This results in the following error:

Py4JJavaError: An error occurred while calling None.com.amazon.deequ.analyzers.Size.
: java.lang.NoSuchMethodError: scala.Product.$init$(Lscala/Product;)V
    at com.amazon.deequ.analyzers.Size.<init>(Size.scala:37)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:750)

Has anyone else been successful in running pydeequ on Glue notebooks?

Jonathan