
I have a Spark Scala 1.6.1 (Scala 2.10) project with two modules that are not dependent on each other at compile time. The first module initiates the Spark driver app. In the first module, inside one of the `rdd.map {}` operations, I am trying to load a class from the second module using reflection: `Class.forName("second.module.function.MapOperation")`.
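A simplified sketch of the pattern (the class names and the surrounding logic are placeholders, not my real code):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReflectiveMapJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("reflective-map"))

    val result = sc.parallelize(Seq("a", "b", "c")).map { record =>
      // The mapper lives in the second module, which is not a compile-time
      // dependency of this module, so it is loaded by name on the executor.
      val clazz = Class.forName("second.module.function.MapOperation")
      val op = clazz.newInstance() // hypothetical no-arg constructor
      // ... invoke the operation on `record` via reflection ...
      record
    }.collect()

    result.foreach(println)
    sc.stop()
  }
}
```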

My spark-submit passes both jars: the first module's jar as the primary application jar and the second module's jar via the --jars option.
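Roughly like this (the jar paths and main class below are placeholders, not my real ones):

```
spark-submit \
  --class first.module.DriverApp \
  --master yarn \
  --jars /path/to/second-module.jar \
  /path/to/first-module.jar
```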

This code runs fine locally in IntelliJ, but on the cluster it fails with ClassNotFoundException for second.module.function.MapOperation. It also fails with ClassNotFoundException in functional test cases when I test the same class.

Is there an issue with classloaders and using Class.forName inside a Spark job/operation?

RockSolid
  • A possible reason is that the jar is missing on the cluster... have a look at my [answer](https://stackoverflow.com/a/43720970/647053) – Ram Ghadiyaram Jun 10 '17 at 11:15
  • Nope, the rest of the classes that are called directly (without reflection) are invoked without any issue. – RockSolid Jun 12 '17 at 12:47
  • Can you check like this inside the map method: `val cl = ClassLoader.getSystemClassLoader; cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)` It will tell you whether your jar is present or not. – Ram Ghadiyaram Jun 12 '17 at 13:18
  • How are you launching the job, and in which mode: yarn-cluster or yarn-client? Can you paste your spark-submit command here? – Ram Ghadiyaram Jun 12 '17 at 13:27

1 Answer


You need to put the jars in HDFS and provide that path to spark-submit.

This way, all of the Spark processes (driver and executors) will have access to the class.
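For example, something along these lines (the paths, main class, and master below are only illustrative):

```
spark-submit \
  --class first.module.DriverApp \
  --master yarn-cluster \
  --jars hdfs:///apps/jars/second-module.jar \
  hdfs:///apps/jars/first-module.jar
```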

Chitral Verma