
I am using Apache Spark's ALS recommendation algorithm (standalone mode) with 60 GB of data. The problem is that the CPU spikes to 100% when the algorithm starts. How can I limit CPU usage in Spark? For example, to use only 50% of the CPU.

I have tried with fewer CPU cores, but it doesn't change anything regarding CPU usage.

I am running Spark in standalone mode on a server with the following configuration:

#System information :
OS Name:                   Microsoft Windows Server 2016 Standard
OS Version:                10.0.14393 N/A Build 14393
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
Cores:                     6
Total Physical Memory:     262,030 MB
Available Physical Memory: 178,164 MB
Virtual Memory: Max Size:  300,942 MB
Virtual Memory: Available: 215,377 MB

#Spark 
version 2.4.3

#Java
java version "10.0.2" 2018-07-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.2+13)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.2+13, mixed mode)

and I have set up my Spark session with the following configs:

from pyspark.sql import SparkSession

spark_session = SparkSession \
    .builder \
    .appName("ALSRecommendation") \
    .config("spark.driver.memory", "60G") \
    .config("spark.cores.max", 5) \
    .config("spark.driver.cores", 5) \
    .getOrCreate()
Arash

3 Answers


You don't seem to be running in standalone mode (which is actually a cluster mode) but in local mode, i.e. a single JVM.

To manage the number of cores used in local mode, you need to set the master to "local[max_number_of_cores]".

So in your case, this should work as expected:

from pyspark.sql import SparkSession

spark_session = SparkSession \
    .builder \
    .appName("ALSRecommendation") \
    .master("local[5]") \
    .config("spark.driver.memory", "60G") \
    .getOrCreate()
rluta

For CPU, spark.executor.cores is the number of concurrent tasks an executor can run. See the Spark Configuration documentation for more information.

spark.executor.cores : 1 in YARN mode, all the available cores on the worker in standalone and Mesos coarse-grained modes.
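
As an illustrative sketch only (not the asker's exact setup): capping each executor at 3 cores when connecting to a standalone master. The master URL spark://your-host:7077 is a placeholder.

from pyspark.sql import SparkSession

# Cap each executor at 3 concurrent task slots instead of letting it
# claim all worker cores (the standalone default).
spark_session = SparkSession \
    .builder \
    .appName("ALSRecommendation") \
    .master("spark://your-host:7077") \
    .config("spark.executor.cores", "3") \
    .getOrCreate()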

EDIT: In a standalone cluster, Spark only manages an application's predefined resource configs within the provided resource pool. (https://spark.apache.org/docs/latest/spark-standalone.html)

Also see this: How to tune spark executor number, cores and executor memory?

John Smith
since I am running it in standalone mode, there is no need to set spark.executor.cores; I have also tried that, and it still spikes. – Arash Jul 21 '19 at 07:19

You might be able to limit Spark's core usage with cgroups (on Linux), but I don't think you'd want to get into that.

Are you running in cluster deploy mode? 'spark.driver.cores' only takes effect in cluster deploy mode.

Try explicitly setting the number of cores for the driver and executor:

(spark.executor.cores=3)
(spark.driver.cores=2)

and get rid of the 'spark.cores.max' setting.
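
For instance, a sketch of those settings applied when building the session (using the suggested values; purely illustrative):

from pyspark.sql import SparkSession

# Suggested split: 3 cores per executor, 2 for the driver, and no
# spark.cores.max cap. Note that spark.driver.cores only applies in
# cluster deploy mode.
spark_session = SparkSession \
    .builder \
    .appName("ALSRecommendation") \
    .config("spark.driver.memory", "60G") \
    .config("spark.executor.cores", "3") \
    .config("spark.driver.cores", "2") \
    .getOrCreate()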

Assuming you're only using this host, you should end up with one free CPU core. This doesn't solve your issue the way you wanted, but that's just how Spark works.