
How can Apache Spark or Hadoop MapReduce request a fixed number of containers?

In Spark's yarn-client mode, a fixed number of executors can be requested by setting the configuration spark.executor.instances, which directly determines the number of YARN containers Spark gets. How does Spark translate this setting into a YARN parameter that YARN understands?
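For instance, the setting can be passed on the spark-submit command line (the class name and jar below are placeholder examples, and `--master yarn-client` is the Spark 1.x-era syntax):

```shell
spark-submit \
  --master yarn-client \
  --conf spark.executor.instances=4 \
  --conf spark.executor.memory=2g \
  --class com.example.MyApp myapp.jar
```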

I know that, by default, the number of containers can depend on the number of input splits and on the configuration values yarn.scheduler.minimum-allocation-mb and yarn.scheduler.minimum-allocation-vcores. But Spark is able to request an exact, fixed number of containers. How can any ApplicationMaster (AM) do that?

Manjunath Ballur
nir

2 Answers


In Hadoop MapReduce, the number of containers for map tasks is decided by the number of input splits, which in turn depends on the size of the source files. One map container is requested per input split.
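The split math behind that can be sketched as below. This mirrors the shape of the logic in Hadoop's FileInputFormat (splitSize = max(minSize, min(maxSize, blockSize))), but it is a simplified illustration, not the actual Hadoop code — the real implementation also handles compression, small trailing remainders, and multi-file inputs.

```java
// Simplified sketch of FileInputFormat-style split counting.
public class SplitMath {
    // splitSize = max(minSize, min(maxSize, blockSize)), as in FileInputFormat.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // One map container per split: ceil(fileSize / splitSize).
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;                          // 128 MB HDFS block
        long splitSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        long fileSize  = 1024L * 1024 * 1024;                         // 1 GB input file
        System.out.println(numSplits(fileSize, splitSize));           // 1 GB / 128 MB = 8 map tasks
    }
}
```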

By default, the number of reducers per job is one. It can be customized via the mapreduce.job.reduces property (or Job.setNumReduceTasks() in code). Pig and Hive each have their own logic for deciding the number of reducers, which can also be customized.
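As an example of such logic, Hive roughly estimates reducers as ceil(totalInputBytes / hive.exec.reducers.bytes.per.reducer), capped at hive.exec.reducers.max. The sketch below is a simplified illustration of that kind of heuristic, not Hive's exact code, and the default values used are assumptions:

```java
// Simplified sketch of a Hive-style reducer estimate:
// reducers = min(maxReducers, ceil(totalInputBytes / bytesPerReducer)),
// with at least one reducer. Hive's real logic has more special cases.
public class ReducerEstimate {
    static int estimateReducers(long totalInputBytes, long bytesPerReducer, int maxReducers) {
        long wanted = (totalInputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1, Math.min(maxReducers, wanted));
    }

    public static void main(String[] args) {
        long input = 10L * 1024 * 1024 * 1024;   // 10 GB of input
        long perReducer = 256L * 1024 * 1024;    // 256 MB per reducer (assumed default)
        System.out.println(estimateReducers(input, perReducer, 1009)); // 10 GB / 256 MB = 40
    }
}
```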

One container (a reduce container, usually larger than a map container) will be requested per reducer.

The total number of mappers and reducers is written into the job configuration file during job submission.

Vijayanand
  • This does not answer the OP's question: `How can an AM request a fixed number of containers?` – YoungHobbit Oct 13 '15 at 16:03
  • The AM gets the total number of containers needed from the computed input splits (for mappers), which is already calculated and stored in the staging directory of the user who submits the job, in HDFS. – Vijayanand Oct 13 '15 at 16:06
  • `But Spark has the ability to request an exact, fixed number of containers. How can any AM do that?` But your answer has no context about Spark at all. – YoungHobbit Oct 13 '15 at 16:09
  • The question is about "Apache Spark or Hadoop Mapreduce"; my answer was only about the MapReduce framework. – Vijayanand Oct 13 '15 at 16:11

I think it is done through the AM-to-ResourceManager API that YARN provides. An AM can call rsrcRequest.setNumContainers(numContainers); see http://hadoop.apache.org/docs/r2.5.2/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html#Writing_a_simple_Client. I had a similar discussion on another question: Yarn container understanding and tuning
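A minimal sketch of that request path, using the AMRMClient helper from the same Writing YARN Applications guide. This only works inside an ApplicationMaster registered with a real YARN cluster, with the Hadoop YARN client libraries on the classpath, so treat it as an illustration of the API shape rather than standalone runnable code; the resource sizes and count are arbitrary examples:

```java
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Inside an ApplicationMaster:
AMRMClient<ContainerRequest> amRMClient = AMRMClient.createAMRMClient();
amRMClient.init(new YarnConfiguration());
amRMClient.start();
amRMClient.registerApplicationMaster("", 0, "");

// Request an exact number of containers by issuing that many requests
// (conceptually how spark.executor.instances maps onto YARN: one
// container request per desired executor).
int numContainers = 4;                                // e.g. spark.executor.instances=4
Resource capability = Resource.newInstance(1024, 1);  // 1024 MB, 1 vcore per container
Priority priority = Priority.newInstance(0);
for (int i = 0; i < numContainers; i++) {
    amRMClient.addContainerRequest(
        new ContainerRequest(capability, null, null, priority));
}
// Allocated containers then arrive via amRMClient.allocate(...) responses.
```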

nir