I am currently working with EMR 6.4.0 and facing an issue while running a PySpark application. The code was working fine, but it suddenly started failing, and I am now stuck with two errors that I have no clue how to resolve.
The objective of the code is to read data from Snowflake, save temporary data on S3, and write the results back to a different Snowflake table at the end.
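For context, the read/write flow looks roughly like the sketch below. This is a minimal outline, not my actual code: the connection options, table names, and S3 path are placeholders, and the real job takes them from its command-line arguments.

```python
# Minimal sketch of the job's data flow; all option values below are placeholders.
from pyspark.sql import SparkSession

SNOWFLAKE_SOURCE = "net.snowflake.spark.snowflake"

spark = SparkSession.builder.appName("weibull").getOrCreate()

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# Read the source table from Snowflake via the spark-snowflake connector.
df = (
    spark.read.format(SNOWFLAKE_SOURCE)
    .options(**sf_options)
    .option("dbtable", "SOURCE_TABLE")
    .load()
)

# Stage intermediate results on S3.
df.write.mode("overwrite").parquet("s3://<bucket>/tmp/weibull/")

# Write the final output to a different Snowflake table.
(
    df.write.format(SNOWFLAKE_SOURCE)
    .options(**sf_options)
    .option("dbtable", "TARGET_TABLE")
    .mode("overwrite")
    .save()
)
```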
1) NoClassDefFoundError: I am getting the error below on my EMR Spark step. I have looked into many posts, but I am still not clear on how to fix this:
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
**Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/javax/ws/rs/core/NoContentException**
at org.apache.hadoop.yarn.util.timeline.TimelineUtils.<clinit>(TimelineUtils.java:60)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:200)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:191)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1327)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1764)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
**Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.javax.ws.rs.core.NoContentException**
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 13 more
Command exiting with ret '1'
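In case it is relevant, below is the kind of quick check I can run on the master node to see whether the missing shaded class exists in any Hadoop jar on the cluster. This is a rough sketch; /usr/lib/hadoop is where I believe EMR installs the Hadoop jars, so the path is an assumption.

```python
# Rough diagnostic: scan the cluster's Hadoop jars for the missing shaded class.
# Assumes EMR keeps its Hadoop jars under /usr/lib/hadoop*; adjust if not.
import glob
import zipfile

TARGET = "org/apache/hadoop/shaded/javax/ws/rs/core/NoContentException.class"

for jar in glob.glob("/usr/lib/hadoop*/**/*.jar", recursive=True):
    try:
        with zipfile.ZipFile(jar) as zf:
            if TARGET in zf.namelist():
                print("found in", jar)
    except (zipfile.BadZipFile, OSError):
        continue  # skip unreadable or non-zip files
```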
- I submit my PySpark code using the command below as an EMR step, on an m4.large instance for testing (in PROD, I have a bigger instance type, m5.8xlarge).
```
spark-submit --deploy-mode cluster --master yarn \
  --driver-memory 4g --executor-memory 1g --executor-cores 1 --num-executors 1 \
  --conf spark.rpc.message.maxSize=100 \
  --jars /home/hadoop/configure_cluster/snowflake-jdbc-3.13.8.jar,/home/hadoop/configure_cluster/spark-snowflake_2.12-2.9.1-spark_3.1.jar \
  --py-files /home/hadoop/spark_utils.zip \
  /home/hadoop/weibull_2.py dev dafehv-dse-weibull-processing-dev
```
As shown in the command above, I am trying to cap memory usage by specifying limits in the spark-submit command, but I can see the following error in the logs:
diagnostics: Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request! Cannot allocate containers as requested resource is greater than maximum allowed allocation. Requested resource type=[memory-mb], Requested resource=<memory:35789, max memory:2147483647, vCores:2, max vCores:2147483647>, maximum allowed allocation=<memory:6144, vCores:4>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:6144, vCores:128>
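To check whether something on the cluster (e.g. spark-defaults.conf) is overriding my spark-submit flags, I can print the configuration the driver actually receives. A minimal sketch:

```python
# Print the effective memory/cores settings to see where the ~35 GB request originates.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    if "memory" in key.lower() or "cores" in key.lower():
        print(key, "=", value)
```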
Why does Spark try to allocate containers with resources that were never specified in my spark-submit command? I am lost here; I have been trying to figure out how to fix these two issues for the past week, but to no avail. I haven't worked much with Spark, so can anyone please guide me on how to proceed?