
I have 2 machines, each with 32 GB RAM and 8 cores. How can I configure YARN with Spark, and which properties should I use to tune resources to our dataset? The dataset is 8 GB. Can anyone suggest a YARN-with-Spark configuration for running jobs in parallel?

I'm using Hadoop 2.7.3, Spark 2.2.0, and Ubuntu 16. Here is the YARN configuration:

    yarn.scheduler.minimum-allocation-mb      2048
    yarn.scheduler.maximum-allocation-mb      5120
    yarn.nodemanager.resource.memory-mb       30720
    yarn.scheduler.minimum-allocation-vcores  1
    yarn.scheduler.maximum-allocation-vcores  6
    yarn.nodemanager.resource.cpu-vcores      6
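
For reference, these settings live in yarn-site.xml; a minimal sketch of the same values in XML form (standard Hadoop 2.7 property names, nothing beyond the list above) would be:

    <!-- yarn-site.xml: sketch of the settings listed above -->
    <configuration>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>30720</value>  <!-- memory YARN may allocate per node (30 of 32 GB) -->
      </property>
      <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>6</value>      <!-- vcores YARN may allocate per node (6 of 8 cores) -->
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>   <!-- smallest container YARN will hand out -->
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>5120</value>   <!-- largest single container request allowed -->
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>6</value>
      </property>
    </configuration>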

Here is the Spark configuration:

    spark.master                        spark://master:7077
    spark.yarn.am.memory                4g
    spark.yarn.am.cores                 4
    spark.yarn.am.memoryOverhead        412m
    spark.executor.instances            3
    spark.executor.cores                4
    spark.executor.memory               4g
    spark.yarn.executor.memoryOverhead  412m
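
Expressed as a spark-submit command, the same settings would look roughly like the sketch below (the class name and JAR are placeholders, not from my actual job; I'm assuming client deploy mode here, since that is where the spark.yarn.am.* settings would matter, and note that running on YARN means --master yarn, whereas the spark.master above points at the standalone master):

    # Sketch only: com.example.Main and app.jar are placeholder names.
    spark-submit \
      --master yarn \
      --deploy-mode client \
      --conf spark.yarn.am.memory=4g \
      --conf spark.yarn.am.cores=4 \
      --conf spark.yarn.am.memoryOverhead=412m \
      --num-executors 3 \
      --executor-cores 4 \
      --executor-memory 4g \
      --conf spark.yarn.executor.memoryOverhead=412m \
      --class com.example.Main app.jar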

But my question is: with 32 GB RAM and 8 cores per machine, how many applications can I run, and is this configuration correct? Currently only two applications run in parallel.
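
To show my reasoning, here is a back-of-the-envelope sketch (assuming client mode and the default CapacityScheduler with the memory-only DefaultResourceCalculator, which rounds each container request up toward a multiple of the 2048 MB minimum allocation, capped at the 5120 MB maximum):

    # Per application: 1 AM container + 3 executor containers
    #   AM:       4g + 412m overhead = 4508 MB
    #   Executor: 4g + 412m overhead = 4508 MB each
    # Each 4508 MB request normalizes to 5120 MB (rounded up, then capped).
    #   Per app:  4 containers x 5120 MB = 20480 MB (~20 GB)
    #   Cluster:  2 nodes x 30720 MB     = 61440 MB (~60 GB)
    #   => ~3 applications should fit by memory alone
    # (vcores: 4 + 3 x 4 = 16 per app vs. 12 in the cluster, but the
    #  DefaultResourceCalculator ignores vcores when scheduling)

By this math roughly three applications should fit, which is why seeing only two run in parallel confuses me.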

  • Which version of Hadoop? Which version of Spark? Linux OS Vendor and version? – John Hanley Sep 17 '18 at 05:32
  • Add the contents of the following files to your question (just the parameters section): yarn-site.xml, mapred-site.xml, hdfs-site.xml, core-site.xml, spark-defaults.conf. – John Hanley Sep 17 '18 at 05:42
  • I'm using Hadoop 2.7.3, Spark 2.2.0 and Ubuntu 16. yarn.scheduler.minimum-allocation-mb=2048, yarn.scheduler.maximum-allocation-mb=5120, yarn.nodemanager.resource.memory-mb=30720, yarn.scheduler.minimum-allocation-vcores=1, yarn.scheduler.maximum-allocation-vcores=6, yarn.nodemanager.resource.cpu-vcores=6 – vikram reddy Sep 17 '18 at 06:07
  • Here is the Spark conf: spark.master=spark://master:7077, spark.yarn.am.memory=4g, spark.yarn.am.cores=4, spark.yarn.am.memoryOverhead=412m, spark.executor.instances=3, spark.executor.cores=4, spark.executor.memory=4g, spark.yarn.executor.memoryOverhead=412m. But my question is: with 32 GB RAM and 8 cores per machine, how many applications can I run, and is this conf correct? Because only two applications run in parallel. – vikram reddy Sep 17 '18 at 06:14
  • Edit your question and include the file contents so that everything is readable. – John Hanley Sep 17 '18 at 06:16
  • I have a recent blog post on this (https://sujithjay.com/2018/07/24/Understanding-Apache-Spark-on-YARN/); I feel it covers the majority of your questions regarding the configuration of YARN and Spark. – suj1th Sep 18 '18 at 12:16
  • Hi suj1th, thanks for the reply. I have a doubt here: as I read it on the Apache site, spark.yarn.am.memory + spark.yarn.am.memoryOverhead is specified in cluster mode, not in client mode, but in your blog spark.yarn.am.memory + spark.yarn.am.memoryOverhead is described under client mode. Shouldn't it be under cluster mode? Please verify whether I'm correct. – vikram reddy Sep 20 '18 at 10:11
  • In cluster mode, if we give 4 GB of RAM to yarn.scheduler.maximum-allocation-mb, then what should the Spark driver and executor memory be? – vikram reddy Sep 20 '18 at 10:13

0 Answers