
I am facing an issue where my Spark jobs get stuck when running locally from IntelliJ IDEA. A job runs until a point like 199 of 200 tasks completed, or 1 of 3 tasks completed, and gets stuck there.

I tried to see what is happening using Evaluate Expression in the IDE and noticed a weird problem. If I call myDf.rdd.map(r => r).cache() I get

java.io.IOException: Class not found
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:40)
at org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:81)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2067)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:324)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:323)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.map(RDD.scala:323)
at org.apache.spark.sql.DataFrame.map(DataFrame.scala:1410)
at com.marin.jobcoordinator.spark.extractor.PoExtractorBase$GeneratedEvaluatorClass$18$1.invoke(FileToCompile.scala:66)

At the same time, if I use myDf.rdd.collect I do not see this issue, and I can invoke myDf.show without any problems as well. It is only when I use map with an anonymous identity function that I hit this problem. From the exception, my understanding is that Spark is trying to load the class of the anonymous function and failing, which seems very strange.
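For reference, a minimal sketch of the three invocations side by side (myDf is an arbitrary DataFrame; only the last line fails):

    myDf.show()                    // works
    myDf.rdd.collect()             // works
    myDf.rdd.map(r => r).cache()   // java.io.IOException: Class not found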

I am using Spark version 1.6.0.

Thanks,

Sriram

  • Can you share your code? – morfious902002 Mar 15 '18 at 21:29
  • I gave a simplified version of the code. It looks like it has more to do with IntelliJ IDEA class loading when working with Spark programs; I can reproduce the same issue in my other projects under a similar invocation. – sriram Mar 20 '18 at 00:53
  • @sriram did you figure it out? – Ion Freeman Sep 29 '19 at 00:39
  • @IonFreeman I never resolved it. I just resorted to building the jar and running it on the cluster instead of debugging the issue locally, since identifying the root cause seemed to take forever. – sriram Dec 16 '19 at 22:12

1 Answer


For your first problem, completing 199 tasks and then getting stuck on the last one: most of the time this is a skew problem, meaning your data is badly partitioned. Computing the number of elements in each partition and printing it before the suspicious operation might give you a hint (see the sketch below). Repartitioning your data beforehand, and filtering it if possible, can help.
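A minimal sketch of such a check, assuming myDf is the DataFrame from the question:

    // Count the rows in each partition and print the counts,
    // so a heavily skewed partition stands out before the job hangs.
    val counts = myDf.rdd
      .mapPartitionsWithIndex((idx, iter) => Iterator((idx, iter.size)))
      .collect()
    counts.foreach { case (idx, n) => println(s"partition $idx: $n rows") }

If one partition dominates, repartitioning (myDf.repartition(n)) before the expensive step spreads the work more evenly.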

Also, 200 partitions most likely means you are using the default number of shuffle partitions (spark.sql.shuffle.partitions, which defaults to 200). See this post for more info. You can try changing that to fit the number of cores on your machine.
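For example, a sketch assuming Spark 1.6 with a SQLContext named sqlContext (the value 8 is a placeholder for your actual core count):

    // spark.sql.shuffle.partitions defaults to 200; for a local run,
    // something close to the machine's core count is usually more sensible.
    sqlContext.setConf("spark.sql.shuffle.partitions", "8")

    // Or set it when constructing the context:
    val conf = new org.apache.spark.SparkConf()
      .setAppName("local-debug")
      .setMaster("local[8]")
      .set("spark.sql.shuffle.partitions", "8")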

For your second problem, I can't help you, sorry.

Florent Moiny