I've got an EMR cluster that I'm using to run large text-processing jobs. The jobs ran fine on a smaller cluster, but after resizing, the master keeps executing them locally and crashing due to memory issues.
This is the current configuration I have for my cluster:
[
  {
    "classification": "capacity-scheduler",
    "properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    },
    "configurations": []
  },
  {
    "classification": "spark",
    "properties": {
      "maximizeResourceAllocation": "true"
    },
    "configurations": []
  },
  {
    "classification": "spark-defaults",
    "properties": {
      "spark.executor.instances": "0",
      "spark.dynamicAllocation.enabled": "true"
    },
    "configurations": []
  }
]
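In case it helps, this is roughly how the generated config can be checked on the master after the resize; these are the standard EMR config locations, as far as I know:

# On the master node: confirm spark-defaults still targets YARN and kept the dynamic-allocation settings
grep -E 'spark\.master|dynamicAllocation|executor\.instances' /etc/spark/conf/spark-defaults.conf

# Confirm YARN itself still points at the ResourceManager on the master
grep -A 1 'yarn.resourcemanager.hostname' /etc/hadoop/conf/yarn-site.xml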
I took this configuration from a suggested solution in this question, and it did work before the resize.
Now, whenever I submit a Spark job with spark-submit mytask.py,
I see tons of log entries indicating the work never leaves the master host, like this one:
17/08/14 23:49:23 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0,localhost, executor driver, partition 0, PROCESS_LOCAL, 405141 bytes)
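The "localhost, executor driver" part of those entries is what makes me think the job is running in local mode rather than on YARN. While a job is running, the YARN side can be watched with the standard CLI (nothing custom here) to see whether the application ever registers and whether the core nodes get any containers:

# List the applications YARN knows about (a local-mode job won't show up here)
yarn application -list

# List the NodeManagers (the core nodes) and how many containers each is running
yarn node -list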
I've tried different parameters, such as --deploy-mode cluster and --master yarn (since YARN is running on the master node), but all the work is still being done by the master host while the core nodes sit idle.
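For completeness, the fullest variant of the submit command I've tried looks like this (still no luck):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  mytask.py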
Is there another configuration I'm missing, preferably one that doesn't require rebuilding the cluster?