
I am running multiple queries on Hive. I have a Hadoop cluster with 6 nodes, and the cluster has 21 vcores in total.

I need only 2 cores to be allocated to a Python process, so that the rest of the available cores can be used by another main process.

Code

from pyhive import hive
hive_host_name = "subdomain.domain.com"
hive_port = 20000
hive_user = "user"
hive_password = "password"
hive_database = "database"

conn = hive.Connection(host=hive_host_name, port=hive_port,username=hive_user, database=hive_database, configuration={})
cursor = conn.cursor()
cursor.execute('select count(distinct field) from somedata')
Vishnu
  • Your question title and text do not seem to be well aligned - are you asking how to limit the MR job resources or the driver (your python code)? – mazaneicha Nov 13 '19 at 15:56
  • @mazaneicha yes, total map and reduce resources should not exceed more than 2 combined – Vishnu Nov 13 '19 at 15:57

1 Answer


Try passing the following setting in the configuration map:

yarn.nodemanager.resource.cpu-vcores=2

The default value for this setting is 8.

Description: Number of CPU cores that can be allocated for containers.

Your updated code will look like this:

from pyhive import hive
hive_host_name = "subdomain.domain.com"
hive_port = 20000
hive_user = "user"
hive_password = "password"
hive_database = "database"
configuration = {
    # Session configuration is sent over Thrift as a map of strings,
    # so pass the value as a string rather than an integer.
    "yarn.nodemanager.resource.cpu-vcores": "2"
}

conn = hive.Connection(
    host=hive_host_name,
    port=hive_port,
    username=hive_user,
    database=hive_database,
    configuration=configuration,
)
cursor = conn.cursor()
cursor.execute('select count(distinct field) from somedata')
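Since the Thrift session expects string-valued settings, it can help to normalize the configuration map before opening the connection. Below is a minimal sketch of that idea; `build_hive_configuration` is a hypothetical helper, not part of pyhive:

```python
def build_hive_configuration(settings):
    """Convert a dict of Hive/YARN settings into the string-valued map
    expected by hive.Connection(configuration=...).

    Note: build_hive_configuration is a hypothetical helper for
    illustration; only the setting name comes from the answer above.
    """
    return {key: str(value) for key, value in settings.items()}

configuration = build_hive_configuration({
    "yarn.nodemanager.resource.cpu-vcores": 2,
})
print(configuration)
```

This way numeric values can be written naturally in Python and are cast to strings in one place.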

Reference URL

Ambrish