
I am running a Spark job written in Scala, but it gets stuck and no tasks are executed by my worker nodes.

Currently I am submitting this to Livy, which submits it to our Spark cluster (8 cores, 12GB of RAM) with the following configuration:

data={
    'file': bar_jar.format(bucket_name),
    'className': 'com.bar.me',
    'jars': [
        common_jar.format(bucket_name),
    ],
    'args': [
        bucket_name,
        spark_master,
        data_folder
    ],
    'name': 'Foo',
    'driverMemory': '2g',
    'executorMemory': '9g',
    'driverCores': 1,
    'executorCores': 1,
    'conf': {
        'spark.driver.memoryOverhead': '200',
        'spark.executor.memoryOverhead': '200',
        'spark.submit.deployMode': 'cluster'
    }
}
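
For reference, this payload is POSTed to Livy's batch endpoint. Below is a minimal sketch of that submission step, assuming Livy's default /batches endpoint on port 8998; the host name is a placeholder and not part of the original setup.

import json
import requests

# Minimal sketch of submitting the payload above as a Livy batch.
# 'livy-host' is a placeholder; Livy listens on port 8998 by default.
livy_url = 'http://livy-host:8998/batches'
headers = {'Content-Type': 'application/json'}

response = requests.post(livy_url, data=json.dumps(data), headers=headers)
print(response.json())  # contains the batch id and its initial state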

The node logs are then endlessly filled with:

2019-03-29T22:24:32.119+0000: [GC (Allocation Failure) 2019-03-29T22:24:32.119+0000:
[ParNew: 68873K->20K(77440K), 0.0012329 secs] 257311K->188458K(349944K), 
0.0012892 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]

The issue is that the next stages and tasks never start executing, so the job simply hangs with no tasks running, which is quite unexpected.

You are allocating more resources than are available. Only 12GB is available, but you are allocating 2g + 9g = 11GB, plus the memory overhead on top of that, so YARN is being starved. Please try reducing executorMemory, e.g. to 5g. – maogautam Mar 30 '19 at 07:26
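
For illustration, the comment's suggestion applied to the payload above might look like the following sketch; the 5g figure comes from the comment, and everything else is unchanged.

# Sketch: same payload as above, with executorMemory reduced as suggested,
# so that 2g (driver) + 5g (executor) + the overheads fit within the 12GB node.
data['executorMemory'] = '5g'  # reduced from 9g per the comment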

2 Answers


It is apparently a normal GC event:

This ‘Allocation failure’ log is not an error but a totally normal case in the JVM. It is a typical GC event that causes the Java garbage collection process to be triggered. Garbage collection removes dead objects, compacts reclaimed memory, and thus helps free up memory for new object allocations.

Source: https://medium.com/@technospace/gc-allocation-failures-42c68e8e5e04

Edit: If the next stages are not executing, maybe you should check stderr instead of stdout.
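
If the job was submitted as a Livy batch, the driver output (including stderr) can be pulled back through Livy's log endpoint. A rough sketch, assuming the default port and a placeholder batch id:

import requests

# 'livy-host' and batch_id are placeholders; Livy returns driver log lines
# for a batch via GET /batches/{id}/log.
livy_host = 'http://livy-host:8998'
batch_id = 0

resp = requests.get(f'{livy_host}/batches/{batch_id}/log',
                    params={'from': 0, 'size': 100})
for line in resp.json().get('log', []):
    print(line)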


The following link describes how to allocate executor memory for Spark applications:

https://aws.amazon.com/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/

I found it very useful, but I also found that the following parameters

  1. spark.default.parallelism
  2. spark.sql.shuffle.partitions

need to be updated according to the application's requirements.
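
In this setup, those two parameters could be passed through the same Livy payload. A rough sketch with illustrative values only; they should be tuned to the cluster and data volume (a common rule of thumb is a few tasks per executor core):

# Sketch: adding the two parameters to the 'conf' block of the Livy payload above.
# The values are illustrative; tune them to the cluster and data size.
data['conf'].update({
    'spark.default.parallelism': '16',
    'spark.sql.shuffle.partitions': '16',
})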
