
I am testing my first Spark Streaming pipeline, which processes messages from Kafka. However, after several test runs I got the following error: `There is insufficient memory for the Java Runtime Environment to continue`.

My test data is really small, so this should not happen. After looking into the processes, I suspect that previously submitted Spark jobs were not removed completely.

I usually submit jobs as below (I am using Spark 2.2.1):

/usr/local/spark/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 ~/script/to/spark_streaming.py

And stop it using `Ctrl+C`.

The last few lines of the script look like this:

ssc.start()
ssc.awaitTermination()
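
For completeness, a minimal sketch of how such a script is set up (the app name and batch interval are placeholder values, and the graceful-shutdown setting is an optional extra, not something my script currently has):

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

# optional: ask Spark to stop the StreamingContext gracefully on JVM shutdown,
# finishing in-flight batches instead of dropping them when the driver is killed
conf = (SparkConf()
        .setAppName("kafka-streaming-test")  # placeholder app name
        .set("spark.streaming.stopGracefullyOnShutdown", "true"))
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, batchDuration=10)  # placeholder 10-second batches

# ... create the Kafka DStream and processing logic here ...

ssc.start()
ssc.awaitTermination()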

Update

After changing how I submit the Spark Streaming job (command below), I still ran into the same issue: after killing the job, the memory is not released. I only started Hadoop and Spark on those 4 EC2 nodes.

/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 --py-files ~/config.py --master spark://<master_IP>:7077 --deploy-mode client  ~/spark_kafka.py

2 Answers


When you press Ctrl-C, only the submitter process is interrupted; the job itself continues to run. Eventually your system runs out of memory, so no new JVM can be started.

Furthermore, even if you restart the cluster, all previously running jobs will be restarted.

Read how to stop a running Spark application properly.
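
For a standalone cluster, something like the following should work (a sketch; `<driver_ID>` and `<submission_ID>` are placeholders read off the Master UI or the `:6066/json` endpoint):

# kill a driver that was submitted in cluster mode (driver ID from the Master UI)
/usr/local/spark/bin/spark-class org.apache.spark.deploy.Client kill spark://<master_IP>:7077 <driver_ID>

# or via the standalone REST API on port 6066
curl -X POST http://<master_IP>:6066/v1/submissions/kill/<submission_ID>

Note that both of these only apply to drivers launched with `--deploy-mode cluster`; in the default client mode the driver is the `spark-submit` process itself and has to be killed on the submit host.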

rustyx
  • thanks for the suggestion. I tried both methods, `/usr/local/spark/bin/spark-class org.apache.spark.deploy.Client kill ` and sending a POST request to `":6066/v1/submissions/kill/"`. However, both ways failed to release memory... Maybe it is because I used the wrong command to start the Spark job on a cluster? This is what I used: `/usr/local/spark/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 ~/script/to/spark_streaming.py` – TTT May 08 '18 at 06:49
  • How do you know that memory isn't released? Did the commands succeed? Make sure there is nothing running: "`curl http://localhost:6066/json`" and/or [this](https://stackoverflow.com/questions/33495623/how-to-get-all-jobs-status-through-spark-rest-api). Also see [Spark REST API](http://spark.apache.org/docs/latest/monitoring.html#rest-api). – rustyx May 08 '18 at 07:55
  • I looked at `htop` sorted by memory usage to determine whether those Spark processes had been killed. I restarted my cluster (on EC2) and those processes are gone... Maybe all of this is because I did not submit the job correctly? I usually submit the job on my master node. Does that mean I should use client mode? Or do you suggest submitting through the REST API? – TTT May 08 '18 at 08:03
  • @TH339 please describe your procedure for running the app. My first suggestion is to check how the app works locally (Spark master `local[*]`) – wind May 08 '18 at 08:12
  • @wind, I ran the job on an EC2 cluster with 1 master and 3 workers. – TTT May 08 '18 at 23:39
  • @rustyx, actually I still ran into the same issue. Please see the update above. Appreciate your help! – TTT May 09 '18 at 00:25
  • @TH339 let's try `/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 --py-files ~/config.py --master local[*] --deploy-mode client ~/spark_kafka.py` – wind May 09 '18 at 05:03
  • @wind, thanks for the suggestion, but `--master local[*]` only runs locally, right? My setup is a 4-node EC2 cluster – TTT May 09 '18 at 09:10
  • It's only to check if the process works the same in a single-JVM environment. – wind May 09 '18 at 09:13

It might be a problem of leftover driver processes (spark-app-driver) still running on the host you use to submit the Spark job. Try something like

ps aux --forest

or similar, depending on your platform, to see which processes are currently running. Or have a look at the answers to Spark Streaming with Actor Never Terminates here on Stack Overflow; it might give you a clue about what is happening.
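
For example (a sketch; the PID is whatever `ps` reports for the leftover driver):

# list spark-related processes; the [s]park pattern keeps grep from matching itself
ps aux --forest | grep -i "[s]park"

# terminate a leftover driver by its PID; use SIGKILL (-9) only as a last resort
kill <PID>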

zubrabubra