I have a 9 node Hadoop cluster, running MapR with the following Yarn roles:
- 1 node is a Yarn ResourceManager
- 8 of the nodes are Yarn NodeManagers (two of which are also backup ResourceManagers)
Whenever I submit a Spark job via pyspark I see something like the following:
yarn node -list
17/04/26 23:40:26 INFO client.MapRZKBasedRMFailoverProxyProvider: Updated RM address to hadoop-n1/XXX.XXX.XX.XX:8032
Total Nodes:8
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
hadoop-d8:52118 RUNNING hadoop-d8:8042 4
hadoop-d5:39108 RUNNING hadoop-d5:8042 0
hadoop-d2:42471 RUNNING hadoop-d2:8042 0
hadoop-d4:36191 RUNNING hadoop-d4:8042 0
hadoop-d3:53476 RUNNING hadoop-d3:8042 0
hadoop-d6:52497 RUNNING hadoop-d6:8042 4
hadoop-d1:59887 RUNNING hadoop-d1:8042 0
hadoop-d7:51878 RUNNING hadoop-d7:8042 4
So d6
, d7
and d8
are always used (and always with 4 containers) and the other nodes are never used. I've set spark.dynamicAllocation.enabled
to True
which made no difference. I've tried restarting all Yarn processes, again with no luck. How can I get my jobs to run on all nodes?
I am running Hadoop version 2.7.0, Spark version 2.0.1 and MapR version 5.2 on Ubuntu 14.04. The Yarn resource calculator is set to DiskBasedResourceCalculator
which uses uses Memory, CPU and Disk.
EDIT: Based on this answer suggested in the comments I changed to the DominantResourceCalculator
and set spark.executor.instances
to 0. Still no changes. If I look at one of the unused nodes I see:
% yarn node -status hadoop-d7:35731
17/04/27 16:01:19 INFO client.MapRZKBasedRMFailoverProxyProvider: Updated RM address to hadoop-n1/ Node Report :
Node-Id : hadoop-d7:35731
Rack : /default-rack
Node-State : RUNNING
Node-Http-Address : hadoop-d7:8042
Last-Health-Update : Thu 27/Apr/17 03:59:29:475EDT
Health-Report :
Containers : 4
Memory-Used : 32768MB
Memory-Capacity : 97988MB
CPU-Used : 20 vcores
CPU-Capacity : 20 vcores
Node-Labels :
whereas the good nodes show
% yarn node -status hadoop-d2:37314
17/04/27 16:01:05 INFO client.MapRZKBasedRMFailoverProxyProvider: Updated RM address to hadoop-n1/
Node Report :
Node-Id : hadoop-d2:37314
Rack : /default-rack
Node-State : RUNNING
Node-Http-Address : hadoop-d2:8042
Last-Health-Update : Thu 27/Apr/17 03:59:32:255EDT
Health-Report :
Containers : 0
Memory-Used : 0MB
Memory-Capacity : 70189MB
CPU-Used : 0 vcores
CPU-Capacity : 4 vcores
Node-Labels :