
I'm running a single-node Spark application on a machine with 32 GB of RAM. More than 12 GB of memory is available at the time I run the application.

But from the Spark UI and logs, I see that it is using 3.8 GB of RAM (which gradually decreases as the jobs run).

At the time this is logged, 5 GB more memory is available, whereas Spark is using only 3.8 GB.

UPDATE

I set these parameters in conf/spark-env.sh, but each time I run the application it still uses exactly 3.8 GB:

export SPARK_WORKER_MEMORY=6g
export SPARK_MEM=6g
export SPARK_DAEMON_MEMORY=6g

Log

2015-11-19 13:05:41,701 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering MapOutputTracker

2015-11-19 13:05:41,716 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering BlockManagerMaster

2015-11-19 13:05:41,735 INFO org.apache.spark.storage.DiskBlockManager.logInfo:59 - Created local directory at /usr/local/TC_SPARCDC_COM/temp/blockmgr-8513cd3b-ac03-4c0a-b291-65aba4cbc395

2015-11-19 13:05:41,746 INFO org.apache.spark.storage.MemoryStore.logInfo:59 - MemoryStore started with capacity 3.8 GB

2015-11-19 13:05:41,777 INFO org.apache.spark.HttpFileServer.logInfo:59 - HTTP File server directory is /usr/local/TC_SPARCDC_COM/temp/spark-b86380c2-4cbd-43d6-a3b7-aa03d9a05a84/httpd-ceaffbd0-eac4-447e-9d3f-c452627a28cb

2015-11-19 13:05:41,781 INFO org.apache.spark.HttpServer.logInfo:59 - Starting HTTP Server

2015-11-19 13:05:41,842 INFO org.spark-project.jetty.server.Server.doStart:272 - jetty-8.y.z-SNAPSHOT

2015-11-19 13:05:41,854 INFO org.spark-project.jetty.server.AbstractConnector.doStart:338 - Started SocketConnector@0.0.0.0:5279

2015-11-19 13:05:41,855 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'HTTP file server' on port 5279.

2015-11-19 13:05:41,867 INFO org.apache.spark.SparkEnv.logInfo:59 - Registering OutputCommitCoordinator

2015-11-19 13:05:42,013 INFO org.spark-project.jetty.server.Server.doStart:272 - jetty-8.y.z-SNAPSHOT

2015-11-19 13:05:42,039 INFO org.spark-project.jetty.server.AbstractConnector.doStart:338 - Started SelectChannelConnector@0.0.0.0:4040

2015-11-19 13:05:42,039 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'SparkUI' on port 4040.

2015-11-19 13:05:42,041 INFO org.apache.spark.ui.SparkUI.logInfo:59 - Started SparkUI at http://103.252.184.181:4040

2015-11-19 13:05:42,114 WARN org.apache.spark.metrics.MetricsSystem.logWarning:71 - Using default name DAGScheduler for source because spark.app.id is not set.

2015-11-19 13:05:42,117 INFO org.apache.spark.executor.Executor.logInfo:59 - Starting executor ID driver on host localhost

2015-11-19 13:05:42,307 INFO org.apache.spark.util.Utils.logInfo:59 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 31334.

2015-11-19 13:05:42,308 INFO org.apache.spark.network.netty.NettyBlockTransferService.logInfo:59 - Server created on 31334

2015-11-19 13:05:42,309 INFO org.apache.spark.storage.BlockManagerMaster.logInfo:59 - Trying to register BlockManager

2015-11-19 13:05:42,312 INFO org.apache.spark.storage.BlockManagerMasterEndpoint.logInfo:59 - Registering block manager localhost:31334 with 3.8 GB RAM, BlockManagerId(driver, localhost, 31334)

2015-11-19 13:05:42,313 INFO org.apache.spark.storage.BlockManagerMaster.logInfo:59 - Registered BlockManager
Anil
  • Did you try to set the executor-memory flag when running the Spark shell? https://stackoverflow.com/questions/24242060/how-to-change-memory-per-node-for-apache-spark-worker – serge_k Nov 19 '15 at 07:34
  • I'm running a Java Maven web project and I'm not sure whether I can set those parameters – Anil Nov 19 '15 at 07:45

2 Answers


If you are using spark-submit, you can use the --executor-memory and --driver-memory flags. Otherwise, set the spark.executor.memory and spark.driver.memory configurations, either directly in your program or in spark-defaults.conf.
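
For the programmatic route, here is a minimal sketch in Java (the app name, master URL, and memory value are placeholders; whether the setting takes effect depends on your deploy mode, as the other answer explains):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ExecutorMemorySketch {
    public static void main(String[] args) {
        // Set spark.executor.memory before the SparkContext is created so that
        // executors launched for this application get the requested heap size.
        // Note: in local mode there are no separate executor JVMs, so this does
        // not enlarge the single driver JVM's heap.
        SparkConf conf = new SparkConf()
                .setAppName("executor-memory-sketch")   // placeholder app name
                .setMaster("spark://127.0.0.1:7077")    // placeholder master URL
                .set("spark.executor.memory", "6g");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... run jobs ...
        sc.stop();
    }
}

The equivalent spark-defaults.conf entries are spark.executor.memory and spark.driver.memory, which are picked up when the application is launched.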

Note that you should not set the memory too high. As a rule of thumb, aim for ~75% of available memory (on a 32 GB machine that would be roughly 24 GB). That leaves enough memory for the other processes (like your OS) running on the machine.

Glennie Helles Sindholt
  • I'm running a Java Maven web project and I'm not sure whether I can set those parameters – Anil Nov 19 '15 at 07:44
  • Have a look here: http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkConf.html#set(java.lang.String,%20java.lang.String) – Tobi Nov 19 '15 at 08:00
  • @Glennie, can I set both values to 10G if I've got 10GB available? – Anil Nov 19 '15 at 08:26
  • @Anil Usually, your driver needs much less memory than your executors. If you are running on a single machine, then you need to divide the memory between driver and executor, like `--driver-memory 2G` and `--executor-memory 8G`. If you are running on a cluster in standalone mode, the driver is run on your master while your executors are run on your slaves and you could give them each 10GB of memory (provided of course that the machines have ~15GB of total memory). – Glennie Helles Sindholt Nov 19 '15 at 10:46
  • In latest `Spark` versions, one can also provide configurations with `spark-submit` like this: `spark-submit --conf spark.executor.memory=10g` as told [here](https://stackoverflow.com/a/49421101/3679900) – y2k-shubham Mar 22 '18 at 13:14
  • Also as told [here](https://stackoverflow.com/a/37871195/3679900), I believe `25%` memory for *OS & `Hadoop` daemons etc.* is a little too much. About ~ `1 GB` should be sufficient for these things – y2k-shubham Mar 22 '18 at 13:22

What @Glennie Helles Sindholt says is correct, but note that setting driver memory flags when submitting jobs on a standalone machine won't affect the usage, as the JVM has already been initialized. Check out this discussion:

How to set Apache Spark Executor memory
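
To illustrate the point from that discussion, here is a minimal sketch (class and app names are placeholders): in local/client mode the driver JVM is already running when your code builds the SparkConf, so a heap size set there comes too late.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class TooLateSketch {
    public static void main(String[] args) {
        // This main() already runs inside the driver JVM, whose maximum heap was
        // fixed at launch time. Setting spark.driver.memory here therefore has no
        // effect; it must be supplied before launch, e.g. via
        // spark-submit --driver-memory or spark-defaults.conf.
        SparkConf conf = new SparkConf()
                .setAppName("too-late-sketch")       // placeholder app name
                .set("spark.driver.memory", "6g");   // too late: driver JVM already started
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.stop();
    }
}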

If you are using the spark-submit command to submit a job, the following is an example of how to set parameters while submitting it:

spark-submit --master spark://127.0.0.1:7077 \
             --num-executors 2 \
             --executor-cores 8 \
             --executor-memory 3g \
             --class <Class name> \
             $JAR_FILE_NAME \
             /path-to-input \
             /path-to-output

By varying these parameters you can see and understand how the RAM usage changes. There is also a utility named htop on Linux; it shows the instantaneous usage of memory, CPU cores, and swap space, which helps you understand what is happening. To install htop, run:

sudo apt-get install htop

It will look something like this: [htop utility screenshot]

For more information, you can check out the following link:

https://spark.apache.org/docs/latest/configuration.html

  • As pointed out by **@Jacek Laskowski** [here](https://stackoverflow.com/questions/32621990/what-are-workers-executors-cores-in-spark-standalone-cluster#comment53178300_32628057), `Spark` no longer uses `--num-executors` with `--master yarn` mode. The alternative is [`spark.dynamicAllocation.initialExecutors`](https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation), but that is available only when [*Dynamic Allocation*](https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation) is enabled – y2k-shubham Mar 22 '18 at 13:28
  • @y2k-shubham But what if you want to run a jar file from one of the benchmarks and you want to specify the number of executors while submitting the job with spark-submit, because you cannot or don't want to edit the file? As far as I know, this is the only option for setting the executors when submitting a job. Correct me if I am wrong. – devangmotwani Mar 24 '18 at 05:09
  • **@devangmotwani** try using `spark.dynamicAllocation.initialExecutors` and provide it as `--conf` to `spark-submit` command as hinted [here](https://stackoverflow.com/a/49421101/3679900) – y2k-shubham Mar 24 '18 at 06:08