I am facing an issue where my Spark jobs get stuck when running in local mode from IntelliJ IDEA. A job runs until a late point in a stage, e.g. "199 of 200 tasks completed" or "1 of 3 tasks completed", and then hangs there.
I tried to see what is happening using Evaluate Expression in the IDE and noticed a weird problem. If I evaluate myDf.rdd.map(r => r).cache(), I get the trace below (a simplified sketch of my setup follows it):
java.io.IOException: Class not found
at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:40)
at org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:81)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2067)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:324)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:323)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.map(RDD.scala:323)
at org.apache.spark.sql.DataFrame.map(DataFrame.scala:1410)
at com.marin.jobcoordinator.spark.extractor.PoExtractorBase$GeneratedEvaluatorClass$18$1.invoke(FileToCompile.scala:66)
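In case it helps, here is roughly how I am driving this. This is a simplified sketch; ReproJob and the sample data are placeholders, and the real job is larger:

    // Simplified sketch of my setup (names and data are placeholders).
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object ReproJob {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("repro").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        val myDf = sc.parallelize(Seq(("a", 1), ("b", 2))).toDF("k", "v")

        // Stopped on a breakpoint here and ran these in Evaluate Expression:
        myDf.rdd.collect()            // fine
        myDf.show()                   // fine
        myDf.rdd.map(r => r).cache()  // throws the IOException above
      }
    }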
At the same time, if I use myDf.rdd.collect I do not see this issue, and I can invoke myDf.show without any problems as well. It is only when I use the map function with an anonymous identity function that I hit this. From the exception, my reading is that Spark is trying to load the class of the anonymous function and failing, which seems very strange.
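If I am reading the trace correctly, ClosureCleaner looks up the closure's .class bytes as a classloader resource, roughly like this (my paraphrase of the 1.6 source, so possibly off in the details):

    // My paraphrase of ClosureCleaner.getClassReader in Spark 1.6 (not exact):
    import java.io.InputStream
    import org.apache.xbean.asm5.ClassReader

    def getClassReader(cls: Class[_]): ClassReader = {
      val className = cls.getName.replaceFirst("^.*\\.", "") + ".class"
      val resourceStream: InputStream = cls.getResourceAsStream(className)
      // ASM's ClassReader(InputStream) throws IOException("Class not found")
      // when it is handed a null stream, which would match the trace above.
      new ClassReader(resourceStream)
    }

So my guess is that the class IntelliJ generates for Evaluate Expression (the GeneratedEvaluatorClass in the last frame) is not visible as a resource to that lookup, so the stream comes back null. Does that sound plausible?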
I am using Spark version 1.6.0.
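For completeness, my dependencies amount to roughly the following, written here in sbt notation (the actual project may differ in scope and packaging):

    // Hypothetical sketch of the dependencies, in sbt notation:
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.6.0",
      "org.apache.spark" %% "spark-sql"  % "1.6.0"
    )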
Thanks,
Sriram