I am having a strange issue running an application off the Spark master URL: the web UI reports a STATE of "WAITING" indefinitely, and 0 cores show up under the RUNNING APPLICATIONS table no matter what I configure the core count to be.

I've configured my app with spark.cores.max = 2, spark.deploy.defaultCores = 2, and memory set to 3 GB. The machine is an enterprise-class server with over 24 cores.

    SparkConf conf = new SparkConf()
        .setAppName(Properties.getString("SparkAppName"))
        .setMaster(Properties.getString("SparkMasterUrl"))
        .set("spark.executor.memory", Properties.getString("SparkExecMem"))
        .set("spark.cores.max", Properties.getString("SparkCores"))
        .set("spark.driver.memory", Properties.getString("SparkDriverMem"))
        .set("spark.eventLog.enabled", "true")
        .set("spark.deploy.defaultCores", Properties.getString("SparkDefaultCores"));

    // Set up the Spark context and a streaming context with a 5-second batch interval
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaStreamingContext jssc = new JavaStreamingContext(sc, new Duration(5000));

[Screenshot: Spark web UI showing the application in WAITING state with zero cores assigned]

The Spark web UI shows zero cores used and an indefinite wait, with no tasks running. The application also uses no memory or cores whatsoever at runtime, and it hits a status of WAITING immediately on startup.

spark-defaults.conf
spark.yarn.max_executor.failures         3
spark.yarn.applicationMaster.waitTries   10
spark.history.kerberos.keytab    none
spark.yarn.preserve.staging.files        False
spark.yarn.submit.file.replication       3
spark.history.kerberos.principal         none
spark.yarn.historyServer.address         {removed}.{removed}.com:18080
spark.yarn.scheduler.heartbeat.interval-ms       5000
spark.yarn.queue         default
spark.yarn.containerLauncherMaxThreads   25
spark.yarn.driver.memoryOverhead         384
spark.history.ui.port    18080
spark.yarn.services      org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.yarn.max.executor.failures         3
spark.driver.extraJavaOptions     -Dhdp.version=2.2.6.0-2800
spark.history.provider   org.apache.spark.deploy.yarn.history.YarnHistoryProvider
spark.yarn.am.extraJavaOptions    -Dhdp.version=2.2.6.0-2800
spark.yarn.executor.memoryOverhead       384

Submit script

spark-submit --class {removed}.{removed}.{removed}.sentiment.MainApp --deploy-mode client /path/to/jar

EDITED 2/3/2016: After running with --master yarn-cluster, I am receiving the error below in the YARN logs. I have also included my updated submit configuration.

Submit Configuration

spark-submit --class com.removed.removed.sentiment.MainApp \
    --master yarn-cluster --supervise \
    /data04/dev/removed/spark/twitternpi/npi.sentiment-1.0-SNAPSHOT-shaded.jar \
    --jars /usr/hdp/2.2.6.0-2800/spark/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.2.6.0-2800/spark/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.2.6.0-2800/spark/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/2.2.6.0-2800/spark/lib/spark-1.2.1.2.2.6.0-2800-yarn-shuffle.jar,/usr/hdp/2.2.6.0-2800/spark/lib/spark-assembly-1.2.1.2.2.6.0-2800-hadoop2.6.0.2.2.6.0-2800.jar
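Worth double-checking in the command above, as an aside: spark-submit treats anything placed after the application JAR as arguments to the application's main class, so the --jars list here may never reach Spark at all, which would leave the DataNucleus jars off the classpath. A sketch of the same command with --jars moved before the JAR:

spark-submit --class com.removed.removed.sentiment.MainApp \
    --master yarn-cluster --supervise \
    --jars /usr/hdp/2.2.6.0-2800/spark/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.2.6.0-2800/spark/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.2.6.0-2800/spark/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/2.2.6.0-2800/spark/lib/spark-1.2.1.2.2.6.0-2800-yarn-shuffle.jar,/usr/hdp/2.2.6.0-2800/spark/lib/spark-assembly-1.2.1.2.2.6.0-2800-hadoop2.6.0.2.2.6.0-2800.jar \
    /data04/dev/removed/spark/twitternpi/npi.sentiment-1.0-SNAPSHOT-shaded.jar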

Error Message

   ClassLoaderResolver for class "" gave error on creation : {1}
org.datanucleus.exceptions.NucleusUserException: ClassLoaderResolver for class "" gave error on creation : {1}
    at org.datanucleus.NucleusContext.getClassLoaderResolver(NucleusContext.java:1087)
    at org.datanucleus.PersistenceConfiguration.validatePropertyValue(PersistenceConfiguration.java:797)
    at org.datanucleus.PersistenceConfiguration.setProperty(PersistenceConfiguration.java:714)
    at org.datanucleus.PersistenceConfiguration.setPersistenceProperties(PersistenceConfiguration.java:693)
    at org.datanucleus.NucleusContext.<init>(NucleusContext.java:273)
    at org.datanucleus.NucleusContext.<init>(NucleusContext.java:247)
    at org.datanucleus.NucleusContext.<init>(NucleusContext.java:225)
  • I am getting a similar error - http://stackoverflow.com/questions/38359801/spark-job-submitted-waiting-taskschedulerimpl-initial-job-not-accepted - on a standalone cluster on AWS EC2, while submitting a PySpark application via the Spark REST API. Any clue how to solve it? – Chaitanya Bapat Jul 18 '16 at 08:06

2 Answers

I ran into this problem when the executor memory requested via spark.executor.memory in spark-defaults.conf was larger than the memory available on the AWS node. But since you set only 3.0 GB as your memory, I think there may be another cause in your case.
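A minimal sketch of the idea, with illustrative numbers that are assumptions rather than values from this question: the executor request, plus its overhead, has to fit on a single node, or the application sits in WAITING with no cores granted.

    // Hypothetical: a node with roughly 8 GB of memory usable for executors.
    // Request less than that, leaving headroom for the per-executor overhead
    // (spark.yarn.executor.memoryOverhead is 384 MB in the conf above).
    SparkConf conf = new SparkConf()
        .setAppName("MemoryBudgetSketch")      // placeholder app name
        .set("spark.executor.memory", "6g");   // must fit within one node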

– Xiangyu

If you're running on YARN, you need to tell your application to use YARN. Add --master yarn-cluster to your spark-submit command:

spark-submit --class your_class --master yarn-cluster /path/to/jar

EDIT:

spark.cores.max applies only to Mesos and standalone mode. On YARN, try setting this instead:

.set("spark.executor.cores","2")

And at runtime, add this to the submit command:

--num-executors=2
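Putting the two together, a minimal sketch of a YARN-oriented setup (the "3g" echoes the 3 GB from the question; the rest is illustrative):

    // Sketch only: on YARN, size each executor explicitly rather than
    // relying on spark.cores.max / spark.deploy.defaultCores.
    SparkConf conf = new SparkConf()
        .setAppName("SentimentApp")           // placeholder name
        .set("spark.executor.cores", "2")     // cores per executor
        .set("spark.executor.memory", "3g");  // memory per executor

and then request the executor count at submit time:

    spark-submit --class your_class --master yarn-cluster --num-executors=2 /path/to/jar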

I am curious, though, as it should default to 1 core per executor. Are the worker nodes registered with YARN? Have you successfully run Spark on this cluster at all in yarn-client or yarn-cluster mode?
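One quick way to answer the registration question, assuming the YARN CLI is available on the cluster:

    # Lists the NodeManagers the ResourceManager currently knows about;
    # an empty list would explain why no cores are ever granted.
    yarn node -list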

– Joe Widen