I am testing my first Spark Streaming pipeline, which processes messages from Kafka. However, after several test runs, I got the following error message:
There is insufficient memory for the Java Runtime Environment to continue.
My test data is really small, so this should not happen. After looking into the processes, I suspect that previously submitted Spark jobs were not removed completely.
I usually submit jobs as below (I am using Spark 2.2.1):
/usr/local/spark/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 ~/script/to/spark_streaming.py
And I stop them using `Ctrl+C`.
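To see whether anything survives the Ctrl+C, I have been listing the JVMs on each node (this assumes `jps` from the JDK is on the PATH); leftover SparkSubmit or CoarseGrainedExecutorBackend entries after a kill would point to executors that were never cleaned up:

# list JVM processes on the node; look for SparkSubmit or
# CoarseGrainedExecutorBackend entries left over after the kill
jps -l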
The last few lines of the script look like:
ssc.start()
ssc.awaitTermination()
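For context, the whole script is roughly structured like this (a trimmed-down sketch; the app name, batch interval, topic name, and broker address are placeholders, not my real values):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="KafkaStreamingTest")
ssc = StreamingContext(sc, 10)  # 10-second batch interval (placeholder)

# direct stream through the kafka-0-8 connector pulled in via --packages
stream = KafkaUtils.createDirectStream(
    ssc,
    ["test_topic"],                              # placeholder topic
    {"metadata.broker.list": "localhost:9092"})  # placeholder broker
stream.pprint()

ssc.start()
ssc.awaitTermination()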
Update
After changing the way I submit the Spark Streaming job (command below), I still ran into the same issue: after killing the job, the memory is not released. I only started Hadoop and Spark on those 4 EC2 nodes.
/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 --py-files ~/config.py --master spark://<master_IP>:7077 --deploy-mode client ~/spark_kafka.py
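One thing I have not tried yet is asking Spark to stop the streaming context gracefully on shutdown, in case the abrupt Ctrl+C is what leaves the executors behind. The same command with that setting would be (a sketch only; I have not confirmed it helps):

/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.2 --conf spark.streaming.stopGracefullyOnShutdown=true --py-files ~/config.py --master spark://<master_IP>:7077 --deploy-mode client ~/spark_kafka.py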