127

I have a Spark app which runs with no problem in local mode, but I have some problems when submitting it to a Spark cluster.

The error messages are as follows:

16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, cluster-node-02): java.lang.ExceptionInInitializerError
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: A master URL must be set in your configuration
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
    at GroupEvolutionES$.<init>(GroupEvolutionES.scala:37)
    at GroupEvolutionES$.<clinit>(GroupEvolutionES.scala)
    ... 14 more

16/06/24 15:42:06 WARN scheduler.TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, cluster-node-02): java.lang.NoClassDefFoundError: Could not initialize class GroupEvolutionES$
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at GroupEvolutionES$$anonfun$6.apply(GroupEvolutionES.scala:579)
    at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:390)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1157)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

In the above trace, GroupEvolutionES is the main class. The error message says "A master URL must be set in your configuration", but I have provided the "--master" parameter to spark-submit.

Does anyone know how to fix this problem?

Spark version: 1.6.1

Shuai Zhang
  • Could you please paste the command here that you are using to submit the script. – Shivansh Jun 24 '16 at 08:18
  • Have you provided the spark master URL ? – Kshitij Kulshrestha Jun 24 '16 at 08:35
  • @ShivanshSrivastava spark-submit --class GroupEvolutionES --master spark://cluster-node-nn1:7077 --jars $mypath myapp.jar – Shuai Zhang Jun 24 '16 at 08:59
  • @KSHITIJKULSHRESTHA Yes. – Shuai Zhang Jun 24 '16 at 08:59
  • I ran into this in my `Spark` project's **unit-tests** ([`DataFrameSuiteBase`](https://github.com/holdenk/spark-testing-base/wiki/DataFrameSuiteBase)). From **@Dazzler**'s answer, I understood that I must move `DataFrame`-creation inside `test(..) { .. }` suites. But also just **declaring `DataFrame`s to be `lazy`** fixes it (love `Scala`!). This has been pointed out by **@gyuseong** in [his answer](https://stackoverflow.com/a/51376513/3679900) below – y2k-shubham Aug 02 '18 at 13:52

16 Answers

210

The TLDR:

.config("spark.master", "local")

a list of the options for spark.master in spark 2.2.1

I ended up on this page after trying to run a simple Spark SQL Java program in local mode. To do this, I found that I could set spark.master using:

SparkSession spark = SparkSession
.builder()
.appName("Java Spark SQL basic example")
.config("spark.master", "local")
.getOrCreate();

An update to my answer:

To be clear, this is not what you should do in a production environment. In a production environment, spark.master should be specified in one of a couple of other places: either in $SPARK_HOME/conf/spark-defaults.conf (this is where Cloudera Manager will put it), or on the command line when you submit the app (e.g. spark-submit --master yarn).
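
For example, either of these would set the master outside the code (the class name, host, port, and jar below are taken from the question's comments and are purely illustrative):

# in $SPARK_HOME/conf/spark-defaults.conf
spark.master    spark://cluster-node-nn1:7077

# or on the command line
spark-submit --class GroupEvolutionES --master spark://cluster-node-nn1:7077 myapp.jar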

If you specify spark.master to be 'local' in this way, Spark will try to run in a single JVM, as indicated by the comments below. If you then try to specify --deploy-mode cluster, you will get the error 'Cluster deploy mode is not compatible with master "local"'. This is because setting spark.master=local means that you are NOT running in cluster mode.

Instead, for a production app, within your main function (or in functions called by your main function), you should simply use:

SparkSession
.builder()
.appName("Java Spark SQL basic example")
.getOrCreate();

This will use the configurations specified on the command line/in config files.

Also, to be clear on this too: --master and "spark.master" are the exact same parameter, just specified in different ways. Setting spark.master in code, as in my answer above, will override attempts to set --master, and will override values in spark-defaults.conf, so don't do it in production. It's great for tests, though.
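
If you want "local" only as a convenient fallback for tests while still honouring --master when it is given, one possible sketch (in Scala, matching the question's code; this fallback pattern is an illustration, not something from this answer or from Spark's API):

import org.apache.spark.sql.SparkSession

// spark-submit copies --master into the "spark.master" system property on the driver,
// so only fall back to local[*] when that property is absent (e.g. running from an IDE).
val builder = SparkSession.builder().appName("Java Spark SQL basic example")
val spark = sys.props.get("spark.master") match {
  case Some(_) => builder.getOrCreate()                      // respect --master / spark-defaults.conf
  case None    => builder.master("local[*]").getOrCreate()   // local fallback for tests
}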

Also, see this answer, which links to a list of the options for spark.master in Spark 2.2.1 and what each one actually does.

Jack Davidson
  • Yes, adding ".config("spark.master", "local")" worked for me also. – Ashutosh Shukla Jan 21 '17 at 07:28
  • Thanks this worked for me - but could someone explain to a newbie (me) what the .config("spark.master", "local") is doing? Will my code still be fine to compile into a jar and run in production? – Reddspark Sep 03 '17 at 20:06
  • @user1761806 While many of the answers report this as a fix, it fundamentally changes the way Spark processes, only using a single JVM. Local is used for local testing and is not the correct solution to fix this problem if you intend to deploy to a cluster. I had similar issues and the accepted answer was the correct solution to my problem. – Nathaniel Wendt Sep 21 '17 at 18:41
65

Worked for me after replacing

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME");

with

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[2]").set("spark.executor.memory","1g");

Found this solution on some other thread on Stack Overflow.

Sachin
  • Does this solve the OP's question? This creates a local cluster in this JVM, rather than attaching to a standalone cluster elsewhere. – Azeroth2b Mar 08 '17 at 23:53
  • This does solve the issue. I don't know (yet) about the implications of `setMaster("local[2]")` (would be nice to have an explanation), but this answer can be considered the solution for the issue. – Rick Mar 23 '17 at 15:34
  • I just edited the answer to include this information :) – Rick Mar 23 '17 at 16:05
  • What should be the `master` value for databricks AWS cluster? – insanely_sin Nov 04 '22 at 14:21
45

Where is the sparkContext object defined? Is it inside the main function?

I too faced the same problem. The mistake I made was that I initialized the sparkContext outside the main function, directly inside the class.

When I initialized it inside the main function, it worked fine.
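
A minimal sketch of the two layouts this answer describes (the object names and bodies here are hypothetical, not the OP's actual code):

import org.apache.spark.{SparkConf, SparkContext}

// Problematic layout: the context lives in the object body. The first time an executor
// touches this object (here, because the closure captures `threshold`), the object's
// static initializer runs on the executor, where no master URL is set, producing the
// ExceptionInInitializerError / "A master URL must be set" seen in the question.
object BadLayout {
  val sc = new SparkContext(new SparkConf().setAppName("GroupEvolutionES"))
  val threshold = 5

  def main(args: Array[String]): Unit = {
    println(sc.parallelize(1 to 10).filter(_ > threshold).count())
  }
}

// Fixed layout: the context is created inside main(), which runs only on the driver,
// where spark-submit has already set spark.master from --master.
object GoodLayout {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("GroupEvolutionES"))
    val threshold = 5
    println(sc.parallelize(1 to 10).filter(_ > threshold).count())
  }
}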

Dazzler
  • Spark really needs to improve: it just shows very confusing and uninformative error messages when something goes wrong – Shuai Zhang Jun 24 '16 at 15:04
  • This is a workaround, not a solution. What if I want to create a singleton context and create a separate layer of context apart from the main function for multiple applications? – Murtaza Kanchwala Nov 15 '16 at 14:04
  • "Note that applications should define a `main()` method instead of extending `scala.App`. Subclasses of `scala.App` may not work correctly." [Spark 2.1.0 Manual](http://spark.apache.org/docs/latest/quick-start.html#self-contained-applications) – ruhong Mar 16 '17 at 07:49
  • Pay attention to where you try to `getOrCreate()`: a context should be created at driver level and passed on to executor level as needed. – reim Feb 20 '18 at 10:25
30

If "spark.master" is not set explicitly, the following code expects a master URL of the form spark://HOST:PORT in the Spark config file, and tries to get a session from the standalone cluster running at HOST:PORT.

SparkSession spark = SparkSession
    .builder()
    .appName("SomeAppName")
    .getOrCreate();

"org.apache.spark.SparkException: A master URL must be set in your configuration" states that HOST:PORT is not set in the spark configuration file.

If you do not want to bother about the value of "HOST:PORT", set spark.master to local:

SparkSession spark = SparkSession
    .builder()
    .appName("SomeAppName")
    .config("spark.master", "local")
    .getOrCreate();

Here is the link to the list of formats in which the master URL can be passed to spark.master.

Reference : Spark Tutorial - Setup Spark Ecosystem

arjun
11

Just add .setMaster("local") to your code, as shown below:

val conf = new SparkConf().setAppName("Second").setMaster("local") 

It worked for me! Happy coding!

kumar sanu
7

If you are running a standalone application, then you have to use SparkContext instead of SparkSession:

val conf = new SparkConf().setAppName("Samples").setMaster("local")
val sc = new SparkContext(conf)
val textData = sc.textFile("sample.txt").cache()
Sasikumar Murugesan
  • `.setMaster("local")` is the key to solving the issue for me – tom10271 Jul 20 '18 at 09:03
  • What if I have it set but still have this error? @tom10271 – Anna Leonenko Jun 09 '20 at 16:12
  • @AnnaLeonenko I am sorry, but I stopped developing Spark applications a year ago, so I cannot recall. But I guess your master node is not local (managed by Spark) but YARN? – tom10271 Jun 11 '20 at 01:43
  • @AnnaLeonenko I have checked my settings. When I was running it locally for development and only using Spark to manage the master node, I set it to `local` or `local[*]`. When I deploy to AWS EMR, which uses YARN for coordination, I set the master to `yarn`. – tom10271 Jun 11 '20 at 01:53
4

Replacing:

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME");

with:

SparkConf sparkConf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[2]").set("spark.executor.memory","1g");

did the trick.

Nazima
4

I had the same problem. Here is my code before the modification:

package com.asagaama

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD

/**
  * Created by asagaama on 16/02/2017.
  */
object Word {

  def countWords(sc: SparkContext) = {
    // Load our input data
    val input = sc.textFile("/Users/Documents/spark/testscase/test/test.txt")
    // Split it up into words
    val words = input.flatMap(line => line.split(" "))
    // Transform into pairs and count
    val counts = words.map(word => (word, 1)).reduceByKey { case (x, y) => x + y }
    // Save the word count back out to a text file, causing evaluation.
    counts.saveAsTextFile("/Users/Documents/spark/testscase/test/result.txt")
  }

  def main(args: Array[String]) = {
    val conf = new SparkConf().setAppName("wordCount")
    val sc = new SparkContext(conf)
    countWords(sc)
  }

}

And after replacing:

val conf = new SparkConf().setAppName("wordCount")

with:

val conf = new SparkConf().setAppName("wordCount").setMaster("local[*]")

it worked fine!

3

How does the Spark context in your application pick the value for the Spark master?

  • You either provide it explicitly within SparkConf while creating the SparkContext.
  • Or it picks it from System.getProperties (where SparkSubmit earlier put it after reading your --master argument).

Now, SparkSubmit runs on the driver -- which in your case is the machine from where you're executing the spark-submit script. And this is probably working as expected for you too.

However, from the information you've posted it looks like you are creating a spark context in the code that is sent to the executor -- and given that there is no spark.master system property available there, it fails. (And you shouldn't really be doing so, if this is the case.)
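
As a rough illustration of that property lookup (MasterLookupDemo is a hypothetical name, not code from the question):

import org.apache.spark.{SparkConf, SparkContext}

object MasterLookupDemo {
  def main(args: Array[String]): Unit = {
    // On the driver, spark-submit has already copied --master into this JVM's system
    // properties, so a default SparkConf picks it up. On an executor the property is
    // absent, which is why constructing a SparkContext there fails with
    // "A master URL must be set in your configuration".
    println(s"spark.master = ${sys.props.get("spark.master")}")

    val sc = new SparkContext(new SparkConf().setAppName("MasterLookupDemo"))
    sc.stop()
  }
}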

Can you please post the GroupEvolutionES code (specifically where you're creating the SparkContext(s))?

Sachin Tyagi
  • Yes. I should have created the SparkContext in the `main` function of GroupEvolutionES (which I didn't). – Shuai Zhang Jun 24 '16 at 15:09
  • This is a workaround, not a solution. What if I want to create a singleton context and create a separate layer of context apart from the main function for multiple applications? Any comments on how I can achieve it? – Murtaza Kanchwala Nov 15 '16 at 14:04
2
val appName: String = "test"
val conf = new SparkConf().setAppName(appName).setMaster("local[*]").set("spark.executor.memory", "1g")
val sc = SparkContext.getOrCreate(conf)
sc.setLogLevel("WARN")
rio
2

Try this:

Make a trait:

import org.apache.spark.sql.SparkSession

trait SparkSessionWrapper {
  lazy val spark: SparkSession = {
    SparkSession
      .builder()
      .getOrCreate()
  }
}

Extend it:

object Preprocess extends SparkSessionWrapper
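
A minimal sketch of how the wrapped session might then be used (the body of Preprocess below is hypothetical, just to show where the lazy session gets touched):

object Preprocess extends SparkSessionWrapper {
  def run(path: String): Long = {
    // `spark` is lazy, so the session is only created on first use here on the driver,
    // not when the object is merely class-loaded on an executor.
    spark.read.textFile(path).count()
  }
}
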
gyuseong
2

I used this SparkContext constructor instead, and errors were gone:

val sc = new SparkContext("local[*]", "MyApp")
remondo
1

We were missing setMaster("local[*]"). Once we added it, the problem was resolved.

Problem:

val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()

Solution:

val spark = SparkSession
      .builder()
      .appName("Spark Hive Example")
      .config("spark.sql.warehouse.dir", warehouseLocation)
      .enableHiveSupport()
      .master("local[*]")
      .getOrCreate()
KARTHIKEYAN.A
1

I tried this option while learning Spark processing, setting up the Spark context on a local machine. Requisites:

  • Keep the Spark session running locally
  • Add the Spark Maven dependency
  • Keep the input file in the root\input folder
  • Output will be placed in the \output folder

The job gets the maximum share value per year. Download any CSV from Yahoo Finance https://in.finance.yahoo.com/quote/CAPPL.BO/history/. The Maven dependency and Scala code are below:

<dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.4.3</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>   

import org.apache.spark.{SparkConf, SparkContext}

object MaxEquityPriceForYear {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("ShareMaxPrice").setMaster("local[2]").set("spark.executor.memory", "1g")
    val sc = new SparkContext(sparkConf)
    val input = "./input/CAPPL.BO.csv"
    val output = "./output"
    // note: the downloaded CSV has a header row; remove or filter it out before the toFloat conversion
    sc.textFile(input)
      .map(_.split(","))
      .map(rec => ((rec(0).split("-"))(0).toInt, rec(1).toFloat))
      .reduceByKey((a, b) => Math.max(a, b))
      .saveAsTextFile(output)
  }
}
0

If you are using the following code

 val sc = new SparkContext(master, "WordCount", System.getenv("SPARK_HOME"))

then replace it with the following lines:

  val jobName = "WordCount";
  val conf = new SparkConf().setAppName(jobName);
  val sc = new SparkContext(conf)

In Spark 2.0 you can use the following code:

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .master("local[*]")// need to add
  .getOrCreate()

You need to add .master("local[*]") if running locally. Here * means use all available cores; instead of * you can also specify an explicit number such as 1, 2, or 8.

You need to set the master URL if running on a cluster.

vaquar khan
0

If you don't provide the Spark configuration to JavaSparkContext, then you get this error. That is: JavaSparkContext sc = new JavaSparkContext();

Solution: provide the configuration: JavaSparkContext sc = new JavaSparkContext(conf); where conf is a SparkConf with the app name and master URL set.