
Jobs run smoothly on YARN as long as the dynamic allocation feature is not enabled. I am using Spark 1.4.0.

This is what I am trying to do:

rdd = sc.parallelize(range(1000000))
rdd.first()

This is what I get in logs:

15/09/08 11:36:12 INFO SparkContext: Starting job: runJob at PythonRDD.scala:366
15/09/08 11:36:12 INFO DAGScheduler: Got job 0 (runJob at PythonRDD.scala:366) with 1 output partitions (allowLocal=true)
15/09/08 11:36:12 INFO DAGScheduler: Final stage: ResultStage 0(runJob at PythonRDD.scala:366)
15/09/08 11:36:12 INFO DAGScheduler: Parents of final stage: List()
15/09/08 11:36:12 INFO DAGScheduler: Missing parents: List()
15/09/08 11:36:12 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[1] at RDD at PythonRDD.scala:43), which has no missing parents
15/09/08 11:36:13 INFO MemoryStore: ensureFreeSpace(3560) called with curMem=0, maxMem=278302556
15/09/08 11:36:13 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.5 KB, free 265.4 MB)
15/09/08 11:36:13 INFO MemoryStore: ensureFreeSpace(2241) called with curMem=3560, maxMem=278302556
15/09/08 11:36:13 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.2 KB, free 265.4 MB)
15/09/08 11:36:13 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.1.5.212:50079 (size: 2.2 KB, free: 265.4 MB)
15/09/08 11:36:13 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874
15/09/08 11:36:13 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (PythonRDD[1] at RDD at PythonRDD.scala:43)
15/09/08 11:36:13 INFO YarnScheduler: Adding task set 0.0 with 1 tasks
15/09/08 11:36:14 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
15/09/08 11:36:28 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
15/09/08 11:36:43 WARN YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
...

Here is a screenshot from the cluster UI:

[cluster UI screenshot]

Can anyone provide me with a solution? Even a lead would be appreciated.


1 Answer


I solved the problem, and it turned out it was not directly related to resource availability. To use dynamic allocation, YARN needs to run Spark's external shuffle service instead of the MapReduce shuffle service. For a better understanding of dynamic allocation, I recommend reading this.
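For reference, here is a minimal sketch of the Spark-side settings involved (the executor counts are illustrative, and the property values should be adapted to your cluster):

```python
from pyspark import SparkConf, SparkContext

# Sketch of enabling dynamic allocation on YARN (Spark 1.4.x).
# These settings alone are not enough: the external shuffle service
# must also be registered as a YARN auxiliary service on every NodeManager.
conf = (SparkConf()
        .setMaster("yarn-client")
        .set("spark.dynamicAllocation.enabled", "true")
        .set("spark.shuffle.service.enabled", "true")          # use the external shuffle service
        .set("spark.dynamicAllocation.minExecutors", "1")      # illustrative values
        .set("spark.dynamicAllocation.maxExecutors", "4"))

sc = SparkContext(conf=conf)
```

On the YARN side, each NodeManager's yarn-site.xml needs spark_shuffle added to yarn.nodemanager.aux-services, yarn.nodemanager.aux-services.spark_shuffle.class set to org.apache.spark.network.yarn.YarnShuffleService, and the Spark YARN shuffle jar on the NodeManager classpath; the NodeManagers must then be restarted.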
