I start the sparkling-shell with the following command.
./bin/sparkling-shell --num-executors 4 --executor-memory 4g --master yarn-client
I only ever get two executors. Is this an H2o problem, YARN problem, or Spark problem?
Mike
There can be multiple reasons for this behaviour.
YARN can only grant executors for which it has resources (memory, vcores) available. If you ask for more executors than the cluster's resources allow, it gives you the maximum it can.
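As a back-of-the-envelope check (the numbers here are hypothetical, not read from your cluster): with two NodeManagers each offering 8 GB to YARN containers, a 4g executor plus the default YARN memory overhead (roughly 10% of executor memory, at least 384 MB) fits only once per node, which would yield exactly two executors:

```shell
# Hypothetical numbers, for illustration only.
node_mem_mb=8192          # yarn.nodemanager.resource.memory-mb per node (assumed)
num_nodes=2               # assumed cluster size
executor_mem_mb=4096      # --executor-memory 4g
overhead_mb=410           # ~10% of executor memory, min 384 MB (Spark default)

# Executors that fit on one node, limited by memory:
per_node=$(( node_mem_mb / (executor_mem_mb + overhead_mb) ))
total=$(( per_node * num_nodes ))
echo "YARN can run at most $total executors"
```

Under these assumptions YARN caps you at 2 executors even though you asked for 4; the same arithmetic applies to vcores.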
It can also happen when you have dynamic allocation enabled. In that case Spark creates new executors only as they are needed.
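If dynamic allocation is the cause, you can pin the executor count for the session by switching it off (a sketch; `spark.dynamicAllocation.enabled` is a standard Spark property, and the sparkling-shell path is assumed from your command above):

```shell
# Disable dynamic allocation so Spark honours --num-executors as a fixed count
./bin/sparkling-shell \
  --master yarn-client \
  --num-executors 4 \
  --executor-memory 4g \
  --conf spark.dynamicAllocation.enabled=false
```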
In order to solve some technicalities in Sparkling Water, we need to discover all available executors at the start of the application by running an artificial computation that tries to utilise the whole cluster. This discovery can also report fewer executors than you asked for.
I would suggest looking at https://github.com/h2oai/sparkling-water/blob/master/doc/tutorials/backends.rst, where you can read more about the behaviour described above and how it can be avoided using the so-called external Sparkling Water backend.
You can also have a look at https://github.com/h2oai/sparkling-water/blob/master/doc/configuration/internal_backend_tuning.rst, the Sparkling Water guide for tuning the configuration.
Kuba
I got over the problem by changing the following values in Cloudera Manager:
Setting                                    Value
yarn.scheduler.maximum-allocation-vcores   8
yarn.nodemanager.resource.cpu-vcores       4
yarn.scheduler.maximum-allocation-mb       16 GB
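With those limits raised, the original request should fit. For completeness, here is a sketch of the same launch with the core count made explicit (`--executor-cores` is a standard spark-submit/spark-shell flag; the path and values are assumed from the thread above):

```shell
# Each executor asks for 4 vcores and 4g; both fit under the raised
# per-container caps (maximum-allocation-vcores=8, maximum-allocation-mb=16 GB)
./bin/sparkling-shell \
  --master yarn-client \
  --num-executors 4 \
  --executor-cores 4 \
  --executor-memory 4g
```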