I'm running a complex query in Hive which, when run, uses a huge amount of local disk space under /tmp and eventually fails with a disk-space error once /tmp fills up completely with the query's intermediate map-reduce results (/tmp is on a separate partition with 100 GB of free space). While running, it prints:

Execution completed successfully

MapredLocal task succeeded

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Job running in-process (local Hadoop)

As the output above shows, Hive is somehow running in local mode. After some research online, I checked a few relevant parameters; the results are below:

hive> set hive.exec.mode.local.auto;

hive.exec.mode.local.auto=false

hive> set mapred.job.tracker;

mapred.job.tracker=local

hive> set mapred.local.dir;

mapred.local.dir=/tmp/hadoop-hive/mapred/local

So I have two questions regarding this:

  1. Can this be the reason why the map-reduce jobs are consuming space on the local disk instead of the HDFS /tmp folder, as is typically the case with Pig scripts?
  2. How do I make Hive run in distributed mode, given the current settings? Please note that I'm using MRv2 in the cluster, but the options above are confusing because they seem to apply to MRv1. I may be wrong here, being a newbie.

Any help will be much appreciated!

user5092078

1 Answer

It turns out that I was missing the bare essentials. After setting HADOOP_MAPRED_HOME to /usr/lib/hadoop-mapreduce on all the nodes, all the issues were fixed.
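
For anyone reproducing this, here is a minimal sketch of that change. It assumes the variable is exported from whatever environment file your Hadoop and Hive processes actually source on each node (which file that is depends on the distribution and is not specified here), and that the path matches your install. The final check is optional: it only asks Hive which value it resolves for mapreduce.framework.name, the MRv2 property that selects between local and yarn execution.

```shell
# Path taken from the answer above; adjust it if your distribution
# installs the MapReduce libraries elsewhere (assumption).
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce

# Sanity check: warn if the directory is missing on this node.
if [ -d "$HADOOP_MAPRED_HOME" ]; then
  echo "HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME"
else
  echo "warning: $HADOOP_MAPRED_HOME does not exist on this node" >&2
fi

# Optional: if the hive CLI is on the PATH, print the execution
# framework the session resolves (local vs. yarn on MRv2).
if command -v hive >/dev/null 2>&1; then
  hive -e 'set mapreduce.framework.name;'
fi
```

Repeat this on every node, then start a new Hive session so the updated environment is picked up.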

user5092078