
I'm trying to measure Spark performance as a function of the number of executors and cores. The idea is to play with:

   spark.conf.set("spark.executor.instances", "x")
   spark.conf.set('spark.cores.max', 'x')

to test how Spark's performance improves when I change the number of executors and cores. The data is 1.66 GB of Twitter .json files. I'm working on an HP computer:

Processor: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz (2.11GHz) // 16 GB RAM

    import time
    st = time.time()
    print("start time: ", st)

    #### Code ####

    elapsed_time = time.time() - st
    print("...Elapsed time SPARK: %.2fs" % elapsed_time)

I found that the performance barely changes whether I use 1, 3, or 5 executors.

For example:

    import time
    from pyspark.sql import SparkSession

    st = time.time()
    print("start time: ", st)
    spark = SparkSession.builder.appName('Basics').getOrCreate()
    spark.conf.set("spark.executor.instances", "1")
    spark.conf.set("spark.cores.max", "1")
    df = spark.read.json(mount + '/*/*.json.bz2')
    elapsed_time = time.time() - st
    print("...Elapsed time SPARK: %.2fs" % elapsed_time)

1: 1 executor, 1 core - start time: 1549530285.584573 ... Elapsed time SPARK: 315.52s

2: 3 executors, 3 cores - start time: 1549528358.4399529 ... Elapsed time SPARK: 308.30s

3: 5 executors, 5 cores - start time: 1549528690.1516254 ... Elapsed time SPARK: 289.28s

Is that improvement normal? I was expecting something much more significant.
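
For reference, here is a quick sketch (plain PySpark calls on the `spark` session above, with an illustrative default for the missing key) to confirm which master and level of parallelism the session is actually running with. If the master turns out to be `local[*]`, the executor settings would not be expected to matter:

    # Check what the running session actually uses.
    print(spark.sparkContext.master)               # e.g. local[*], spark://..., yarn
    print(spark.sparkContext.defaultParallelism)   # effective default parallelism
    print(spark.conf.get("spark.executor.instances", "not set"))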

Enrique Benito Casado
  • Possible duplicate of [Spark: Inconsistent performance number in scaling number of cores](https://stackoverflow.com/questions/41090127/spark-inconsistent-performance-number-in-scaling-number-of-cores) – 10465355 Feb 07 '19 at 12:06
  • Additionally, is that `local` mode or standalone? If the former one, [these settings have no use at all](https://stackoverflow.com/q/39986507/10465355). – 10465355 Feb 07 '19 at 12:07

1 Answer


Spark performance depends on several factors, such as workload type, partitioning scheme, data skew, and memory consumption. You can check the Spark documentation for more information.
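
For example, here is a minimal sketch (assuming the `df` from the question; the partition count of 16 is only an illustrative value) showing how to check whether the input is split into enough partitions to keep the extra cores busy:

    # How many input partitions did Spark create for the .json.bz2 files?
    # bzip2 is splittable, but a handful of large files can still limit parallelism.
    print("input partitions:", df.rdd.getNumPartitions())

    # Fewer partitions than available cores means idle cores; repartitioning
    # can help, at the cost of a shuffle.
    df = df.repartition(16)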

Secondly, you cannot change the executor count on the fly. This is stated in the Spark documentation:

Spark properties mainly can be divided into two kinds: one is related to deploy, like “spark.driver.memory”, “spark.executor.instances”, this kind of properties may not be affected when setting programmatically through SparkConf in runtime, or the behavior is depending on which cluster manager and deploy mode you choose, so it would be suggested to set through configuration file or spark-submit command line options; another is mainly related to Spark runtime control, like “spark.task.maxFailures”, this kind of properties can be set in either way.
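
In other words, `spark.executor.instances` has to be in place before the SparkContext is created. A minimal sketch of what that could look like, using builder-time config instead of `spark.conf.set` (the values are placeholders, and in `local` mode they are ignored anyway):

    from pyspark.sql import SparkSession

    # Deploy-related properties must be supplied when the session is created,
    # e.g. via the builder, a config file, or spark-submit options.
    spark = (SparkSession.builder
             .appName('Basics')
             .config("spark.executor.instances", "3")
             .config("spark.cores.max", "3")
             .getOrCreate())

    # Roughly equivalent on the command line (--num-executors applies on YARN):
    # spark-submit --num-executors 3 --executor-cores 1 my_app.py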

sgungormus
  • Hi sgungormus, what do you mean by "you cannot change the executor count on the fly"? I'm running different projects, each with its own SparkSession. – Enrique Benito Casado Feb 07 '19 at 12:04
  • @Enrique I don't know your exact configuration, but once the SparkSession is initialized, the executor count cannot be changed by setting conf parameters. Those parameters should be set in the spark-submit or spark-shell command. – sgungormus Feb 07 '19 at 15:23