
I have a Spark 1.4.0 project in which I'm trying to parse several JSON records containing a timestamp field and store them in ZonedDateTime objects, using Jackson and the JSR-310 module. If I run the driver program from the IDE (IntelliJ IDEA 14.0), it works correctly, but if I build it with sbt assembly and run it with spark-submit, I get the following exception:

15/07/16 14:13:03 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
java.lang.AbstractMethodError: com.mycompany.input.EventParser$$anonfun$1$$anon$1.com$fasterxml$jackson$module$scala$experimental$ScalaObjectMapper$_setter_$com$fasterxml$jackson$module$scala$experimental$ScalaObjectMapper$$typeCache_$eq(Lorg/spark-project/guava/cache/LoadingCache;)V
    at com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper$class.$init$(ScalaObjectMapper.scala:50)
    at com.mycompany.input.EventParser$$anonfun$1$$anon$1.<init>(EventParser.scala:27)
    at com.mycompany.input.EventParser$$anonfun$1.apply(EventParser.scala:27)
    at com.mycompany.input.EventParser$$anonfun$1.apply(EventParser.scala:24)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:686)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I have tried several versions of sbt-assembly, Jackson, and Spark, but no luck. I suspect this is related to a dependency conflict between Spark and my project; judging by the org/spark-project/guava package in the stack trace, the Guava library is involved. Any ideas?
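
For reference, the kind of mapper setup I mean looks roughly like this (a sketch only; the Event class and the sample JSON are illustrative, not my actual code):

import java.time.ZonedDateTime
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.datatype.jsr310.JSR310Module
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

// Illustrative record type; the timestamp is the field that needs JSR-310
case class Event(id: String, timestamp: ZonedDateTime)

val mapper = new ObjectMapper() with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
mapper.registerModule(new JSR310Module) // java.time support in Jackson 2.4.x
val event = mapper.readValue[Event]("""{"id":"a","timestamp":"2015-07-16T14:13:03Z"}""")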

Thanks!

EDIT: example project to reproduce the issue here.

ale64bit
  • Check the versions of any jars in spark's own lib directory (e.g. jackson). Then ensure that you build against those exact same versions. – lmm Jul 16 '15 at 15:44
  • @lmm It seems that Spark 1.4.0 uses Jackson 2.4.4, which is the same version I'm using in my project. Still, no luck; the same problem remains. – ale64bit Jul 16 '15 at 18:20
  • You are using the same version of Scala, right? – lmm Jul 17 '15 at 08:56
  • @lmm Yes, I tried with the same version of Scala (2.10.4). I also compiled Spark myself with Scala 2.11, and I made an equivalent Maven project to try to shade the offending libraries. But still, no luck :( – ale64bit Jul 17 '15 at 09:02
  • I can post a minimal example project for reproducing the issue, if it helps. – ale64bit Jul 17 '15 at 09:03
  • All I can suggest is: find out which library com.mycompany.input.EventParser is in and which library com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper is in. Then look at mvn dependency:tree and check very carefully that neither version has been changed, that everything that uses Scala uses the same Scala version (and likewise Java), and that the versions of the libraries in the Maven dependencies match those installed in the Spark library directory on the cluster where you're running this. Your example will probably work fine on other people's clusters. – lmm Jul 17 '15 at 09:12
  • Not sure if this helps, but the Jackson Scala module does rely on Guava, so it has its own idea of which Guava to bring in. I think you should explicitly specify the Guava version to use (see the sketch after these comments), although figuring out the proper version may take time. You could start with the highest of the differing versions that the components require. – StaxMan Jul 17 '15 at 19:26
  • @StaxMan I tried to bring Guava 15.0 and 14.0.1 as well, but none of them makes any difference. The error persists. Any other idea? Thanks for the help! – ale64bit Jul 17 '15 at 19:52
  • 1
    Unfortunately all that is clear is that some part of the system is having trouble with Guava version different than what it was compiled to use. My other suggestion would be to try to upgrade to Jackson 2.5 (see the answer); that would eliminate one of Guava dependencies, as 2.5 of Scala module does NOT depend on Guava any more. – StaxMan Jul 17 '15 at 21:38
  • @kaktusito - Are you able to solve this? I am having the save issue. I am using `libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.3.5"` – sag Jul 31 '15 at 11:39
  • @SamuelAlexander no luck yet. If you happen to find out the answer, please add it to this post. – ale64bit Aug 04 '15 at 17:48
  • @kaktusito - There was an issue with the JSON I've been using. After correcting that it worked. – sag Aug 05 '15 at 04:39
  • @SamuelAlexander Have you solved this issue? – Allen Xudong Cheng Aug 29 '15 at 10:05
  • @AllenXudongCheng - I just solved it by using version 2.3.5 of jackson-module-scala – sag Aug 29 '15 at 10:38
  • Any update on this issue? I'm having the same problem. I tried version 2.3.5 and it didn't solve it for me; I also tried the newest version, 2.6.1, and that didn't work either. Thanks – Avihoo Mamka Sep 01 '15 at 15:04
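
Following up on StaxMan's suggestion above about pinning Guava, a minimal sbt sketch (the version number is an illustrative choice, not a verified fix):

// build.sbt -- force a single Guava version across the whole dependency tree
// (15.0 is one of the versions tried in the comments; purely illustrative)
dependencyOverrides ++= Set("com.google.guava" % "guava" % "15.0")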

2 Answers


I had a similar issue and solved it by changing 2 things:

1) I used ObjectMapper instead of ScalaObjectMapper, as suggested in a comment on this SO question: Error in running job on Spark 1.4.0 with Jackson module with ScalaObjectMapper

2) I needed to define the mapper inside the map operation.

import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule

val alertsData = sc.textFile(rawlines).map(alertStr => {
  // Build the mapper inside the closure so it is created on the executor,
  // not serialized from the driver
  val mapper = new ObjectMapper()
  mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
  mapper.registerModule(DefaultScalaModule)
  mapper.readValue(alertStr, classOf[Alert])
})

If the mapper was defined outside the closure, I got a NullPointerException. I also tried broadcasting it, and that didn't work either.
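
If creating a mapper per record is a concern (see the comments below), one possible refinement is to create one mapper per partition with mapPartitions. A sketch reusing the Alert class and rawlines path from the snippet above:

import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// One mapper per partition instead of one per record
val alertsData = sc.textFile(rawlines).mapPartitions { lines =>
  val mapper = new ObjectMapper()
  mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false)
  mapper.registerModule(DefaultScalaModule)
  lines.map(line => mapper.readValue(line, classOf[Alert]))
}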

Also, there is no need to explicitly add Jackson as a dependency, since Spark already provides it.
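
That said, if your own code needs Jackson at compile time, one way to keep it out of the assembled jar, so that the version Spark ships is the only one on the executor classpath, is "provided" scope in sbt. A sketch (2.4.4 matches what the comments report Spark 1.4.0 bundling, but treat the version as an assumption):

// build.sbt -- compile against Jackson, but exclude it from the fat jar
libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.4.4" % "provided"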

Hope this helps.

Aliza

  • Similar to the `Learning Spark` example, but don't use the ScalaObjectMapper https://github.com/databricks/learning-spark/blob/master/src/main/scala/com/oreilly/learningsparkexamples/scala/BasicParseJsonWithJackson.scala – Joseph Lust Sep 23 '15 at 21:25
  • @Aliza But wouldn't this create an `ObjectMapper` for each element of the RDD? That sounds like overkill. Correct me if I'm wrong. – ale64bit Sep 24 '15 at 05:50
  • @kaktusito You are right. See the solution I came up with here: http://stackoverflow.com/questions/32495891/spark-broadcasting-jackson-objectmapper/32500040#32500040 – Aliza Oct 06 '15 at 04:45
  • @Aliza Thanks for the solution. I figured it out before seeing this. I was translating a Scalding program to Spark and ran into this problem. It seems Spark doesn't serialize and distribute the whole class the way Scalding does for its Job class. It still feels weird to manually broadcast a JSON mapper. – piggybox Oct 28 '15 at 07:59

One thing that could help would be to upgrade to Jackson 2.5. While the Jackson Scala module depended on Guava up to version 2.4, that dependency was removed in 2.5 (a test-scoped dependency remains, but nothing at runtime). This would at least eliminate the transitive dependency conflict.
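
A minimal sketch of what that upgrade might look like in sbt (2.5.3 is an illustrative 2.5.x release, not a verified fix):

// build.sbt -- from 2.5 on, jackson-module-scala no longer depends on Guava at runtime
libraryDependencies += "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.5.3"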

StaxMan
  • I tried this too, as you suggested, but it didn't help either. I uploaded a very simple project that reproduces the issue to GitHub; see the edit in my question. Thanks for the help. – ale64bit Jul 20 '15 at 18:13