
I am executing a Spark/Scala job using the spark-submit command. I have written my code in Spark SQL, where I join two tables and load the result into a third Hive table. The code works fine, but sometimes I run into issues such as OutOfMemoryError (Java heap space) or timeout errors. I therefore want to control the job manually by passing the number of executors, cores, and memory. When I used 16 executors, 1 core, and 20 GB of executor memory, my Spark application got stuck. Can someone please suggest how I should control my Spark application manually by providing the correct parameters? Also, are there any other Hive- or Spark-specific parameters I can use for faster execution?

Below is the configuration of my cluster:



    Number of Nodes: 5
    Number of Cores per Node: 6
    RAM per Node: 125 GB

Spark-submit command:

    spark-submit --class org.apache.spark.examples.sparksc \
    --master yarn-client \
    --num-executors 16 \
    --executor-memory 20g \
    --executor-cores 1 \
    examples/jars/spark-examples.jar

1 Answer


It depends on the volume of your data, so you can make these parameters dynamic. This link has a very good explanation: How to tune spark executor number, cores and executor memory?
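
For the cluster described in the question (5 nodes, 6 cores and 125 GB RAM per node), a common rule of thumb is to leave about one core and a few GB per node for the OS and Hadoop daemons, give each executor only a handful of cores, and keep one slot free for the YARN ApplicationMaster. The command below is only an illustrative sizing along those lines (it uses --master yarn --deploy-mode client, the current form of the deprecated yarn-client); the exact numbers are assumptions and depend on your data volume and Spark version:

    # Illustrative sizing for 5 nodes x 6 cores x 125 GB per node, not a definitive setting:
    # - 2 cores per executor, 2 executors per node -> ~2 cores left per node for OS/daemons
    # - 9 executors = 5 nodes x 2 executors, minus 1 slot for the YARN ApplicationMaster
    # - ~50 GB heap per executor keeps 2 executors plus memory overhead well under 125 GB
    spark-submit --class org.apache.spark.examples.sparksc \
    --master yarn \
    --deploy-mode client \
    --num-executors 9 \
    --executor-cores 2 \
    --executor-memory 50g \
    --conf spark.executor.memoryOverhead=5g \
    examples/jars/spark-examples.jar

In general, a few cores per executor tends to work better than --executor-cores 1 (which gives up shared-JVM benefits such as broadcast variable reuse) or than very large heaps (which tend to cause long GC pauses).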

You can enable spark.shuffle.service.enabled and set the following properties:

    spark.sql.shuffle.partitions=400
    hive.exec.compress.intermediate=true
    hive.exec.reducers.bytes.per.reducer=536870912
    hive.exec.compress.output=true
    hive.output.codec=snappy
    mapred.output.compression.type=BLOCK

If your data is larger than about 700 MB, you can also enable the spark.speculation property.
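
Sticking with the spark-submit style from the question, one way to pass the settings above is via --conf flags; this is a hedged sketch, with the property names and values taken from this answer rather than verified against a specific Spark/Hive version. Spark's own properties are passed directly, while the Hive/MapReduce properties are prefixed with spark.hadoop. so Spark forwards them to the Hadoop configuration. Note that spark.shuffle.service.enabled=true also requires the external shuffle service to be configured on the YARN NodeManagers.

    # Hedged example only: property names/values are copied from this answer.
    spark-submit --class org.apache.spark.examples.sparksc \
    --master yarn \
    --deploy-mode client \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.sql.shuffle.partitions=400 \
    --conf spark.speculation=true \
    --conf spark.hadoop.hive.exec.compress.intermediate=true \
    --conf spark.hadoop.hive.exec.reducers.bytes.per.reducer=536870912 \
    --conf spark.hadoop.hive.exec.compress.output=true \
    --conf spark.hadoop.hive.output.codec=snappy \
    --conf spark.hadoop.mapred.output.compression.type=BLOCK \
    examples/jars/spark-examples.jar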