
I am using Apache Spark's ALS recommendation algorithm (standalone mode) with 60 GB of data. The problem is that the CPU spikes to 100% when the algorithm starts. How can I limit CPU usage in Spark? For example, to use only 50% of the CPU.

I have tried with fewer CPU cores, but it doesn't change anything regarding CPU usage.

I am running Spark in standalone mode on a server with the following configuration:

#System information :
OS Name:                   Microsoft Windows Server 2016 Standard
OS Version:                10.0.14393 N/A Build 14393
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
Cores:                     6
Total Physical Memory:     262,030 MB
Available Physical Memory: 178,164 MB
Virtual Memory: Max Size:  300,942 MB
Virtual Memory: Available: 215,377 MB

#Spark 
version 2.4.3

#Java
java version "10.0.2" 2018-07-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.2+13)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.2+13, mixed mode)

and I have set up my Spark session with the following configs:

from pyspark.sql import SparkSession

spark_session = SparkSession \
    .builder \
    .appName("ALSRecommendation") \
    .config("spark.driver.memory", "60G") \
    .config("spark.cores.max", 5) \
    .config("spark.driver.cores", 5) \
    .getOrCreate()
Arash

3 Answers


You don't seem to be running in standalone mode (which is actually a cluster mode) but in local mode, i.e. a single JVM.

To manage the number of cores used in local mode, you need to set the master to "local[max_number_of_cores]".

So in your case, this should work as expected:

from pyspark.sql import SparkSession

spark_session = SparkSession \
    .builder \
    .appName("ALSRecommendation") \
    .master("local[5]") \
    .config("spark.driver.memory", "60G") \
    .getOrCreate()
rluta

For CPU, spark.executor.cores is the number of concurrent tasks an executor can run. See the Spark Configuration documentation for more information.

spark.executor.cores : 1 in YARN mode, all the available cores on the worker in standalone and Mesos coarse-grained modes.
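
As an illustrative sketch only (not the asker's exact setup): capping each executor at 3 cores when connecting to a standalone master. The master URL spark://your-host:7077 is a placeholder.

from pyspark.sql import SparkSession

# Cap each executor at 3 concurrent task slots instead of letting it
# claim all worker cores (the standalone default).
spark_session = SparkSession \
    .builder \
    .appName("ALSRecommendation") \
    .master("spark://your-host:7077") \
    .config("spark.executor.cores", "3") \
    .getOrCreate()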

EDIT: In a standalone cluster, Spark only manages an application's predefined resource configs within the provided resource pool. (https://spark.apache.org/docs/latest/spark-standalone.html)

Also see this: How to tune spark executor number, cores and executor memory?

John Smith
since I am running it in standalone mode, there is no need to set spark.executor.cores; I have also tried that, and it still spikes. – Arash Jul 21 '19 at 07:19

You might be able to limit Spark's core usage with cgroups (on Linux), but I don't think you'd want to get into that.

Are you running in cluster deploy mode? 'spark.driver.cores' only takes effect in cluster deploy mode.

Try explicitly setting the number of cores for the driver and executor:

(spark.executor.cores=3)
(spark.driver.cores=2)

and get rid of the 'spark.cores.max' setting.
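
For instance, a sketch of those settings applied when building the session (using the suggested values; purely illustrative):

from pyspark.sql import SparkSession

# Suggested split: 3 cores per executor, 2 for the driver, and no
# spark.cores.max cap. Note that spark.driver.cores only applies in
# cluster deploy mode.
spark_session = SparkSession \
    .builder \
    .appName("ALSRecommendation") \
    .config("spark.driver.memory", "60G") \
    .config("spark.executor.cores", "3") \
    .config("spark.driver.cores", "2") \
    .getOrCreate()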

Assuming you're only using this host, you should end up with one free CPU core. This doesn't solve your issue the way you wanted, but that's just how Spark works.