
I've encountered rather weird behaviour of closures when using Spark 2.2.0. This program:

import org.apache.spark.sql.SparkSession

object SparkConst extends App {
  val spark = SparkSession.builder()
    .appName("spark_const")
    .enableHiveSupport()
    .getOrCreate()


  val ctx = spark.sparkContext
  ctx.setLogLevel("WARN")

  val a = 2

  val xs = ctx.parallelize(Seq(1))

  val addValMap = xs.map(_ + a)

  println(s"Result ${addValMap.collect().mkString(",")}")
}

prints "Result 1" so a was equal to zero (default value) when map was evaluated. What am I doing wrong? How am I supposed to pass various constants to RDD transformations?

P.S. The application is executed on a YARN cluster in client mode.

synapse
    This is a common mistake. As the answer below states (quoting the official documentation), **applications should define a main() method instead of extending scala.App. Subclasses of scala.App may not work correctly.** – eliasah Jan 10 '19 at 12:58

1 Answer

As explained in the Quick Start Guide

applications should define a main() method instead of extending scala.App. Subclasses of scala.App may not work correctly.

In practice, the lazy nature of App (it mixes in DelayedInit, so the object's body only runs when its main method is invoked) interacts badly with many features of Spark, including closure serialization and Accumulators. Here the executors load the SparkConst object afresh, and since its body never executes there, a still holds the default value 0, which is why the result is 1.
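
To see the App behaviour in isolation, here is a minimal, Spark-free sketch. It assumes Scala 2.x (the version line Spark 2.2 targets), where App still relies on DelayedInit; the object names are made up for illustration.

object Defaults extends App {
  val a = 2
}

object DelayedInitDemo {
  def main(args: Array[String]): Unit = {
    // The body of Defaults has not executed yet: App (via DelayedInit)
    // defers it until Defaults.main runs, so `a` still holds the JVM default.
    println(Defaults.a)                // prints 0
    Defaults.main(Array.empty[String]) // runs the deferred initializers
    println(Defaults.a)                // prints 2
  }
}

Spark executors are in the same position as DelayedInitDemo before the call to main: they load the object without ever running its deferred body, so every field keeps its default value.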

So to fix this, just rewrite your code to use a standard main method:

object SparkConst {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
    ...
  }
}
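
For reference, here is the question's program rewritten in that style; it is simply the original code moved into main, with a return type and a final spark.stop() added.

import org.apache.spark.sql.SparkSession

object SparkConst {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark_const")
      .enableHiveSupport()
      .getOrCreate()

    val ctx = spark.sparkContext
    ctx.setLogLevel("WARN")

    val a = 2 // now a local val, so its value is captured by the closure

    val xs = ctx.parallelize(Seq(1))
    val addValMap = xs.map(_ + a) // the serialized closure carries a = 2

    println(s"Result ${addValMap.collect().mkString(",")}") // prints "Result 3"

    spark.stop()
  }
}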
10465355