
I have a table in Hive that is partitioned (on col1) and also bucketed (on col2, into 16 buckets). If I run a SELECT query against it, how many mapper and reducer tasks will be spawned?

Cœur
  • How many HDFS data files are present in the partitions/buckets in scope for your WHERE clause? How many HDFS blocks in these files? Or, when using a complex columnar format like ORC / Parquet, how many stripes/whatever in the files? These are the unit of parallelism for Mappers. As for Reducers, well, it depends!! – Samson Scharfrichter Apr 01 '17 at 11:42

1 Answer


For every input split of the input table, one mapper will be dispatched; by default, the size of an input split is the HDFS block size.

You can alter the number of mappers by modifying the mapreduce.input.fileinputformat.split.maxsize and mapreduce.input.fileinputformat.split.minsize properties.
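For example, these can be set at the session level in Hive; a minimal sketch, where the byte values are illustrative choices rather than recommendations:

```sql
-- Cap each split at 128 MB and set a floor of 64 MB.
-- Roughly, mappers ≈ total input bytes / split size, so about
-- 1 GB of input split at 128 MB would be read by ~8 mappers.
SET mapreduce.input.fileinputformat.split.maxsize=134217728;
SET mapreduce.input.fileinputformat.split.minsize=67108864;
```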

As for the number of reducers in Hive, by default it is calculated from the hive.exec.reducers.bytes.per.reducer property, whose default value is 1 GB.

You can change the number of reducers by modifying the above property. Alternatively, you can set a fixed number of reducers for a job with the mapred.reduce.tasks property.
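Both approaches can be sketched as session-level settings (the values below are illustrative):

```sql
-- Option 1: let Hive derive the count as roughly
-- input bytes / bytes-per-reducer. With 10 GB of input and
-- 256 MB per reducer, Hive would start about 40 reducers.
SET hive.exec.reducers.bytes.per.reducer=268435456;

-- Option 2: pin a fixed number of reducers for this job,
-- overriding the automatic estimate.
SET mapred.reduce.tasks=16;
```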

You can find more details in the following link:

How hadoop decides how many nodes will do map and reduce tasks

Aditya Agarwal