How to adjust the number of mappers and reducers in Hive on Tez to speed up a job

Question

I ran a job (word labeling of sentences) over a large dataset (about 150 GB) using Tez, but it took far too long (a week or more), so I tried to specify the number of mappers.

Although I set mapred.map.tasks=2000, the number of mappers still ends up at about 150, so I can't get the parallelism I want.

I specify the map value in the Oozie workflow file, and the job runs on Tez.

How can I specify the number of mappers?

Ultimately I just want to speed up the process, so it is fine not to use Tez.

In addition, I want to count the labeled sentences in the reducer, and that takes a long time too.

I would also like to know how to adjust the amount of memory each mapper and reducer process can use.

`mapred.map.tasks` doesn't do anything for Tez because Hive isn't running on the MR engine. Plus, that property is deprecated – OneCricketeer, Aug 25 '18 at 03:55
Please see this answer: https://stackoverflow.com/a/42842117/2700344 – leftjoin, Aug 25 '18 at 06:00

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

In order to manually set the number of mappers in a Hive query when Tez is the execution engine, the configuration `tez.grouping.split-count` can be used...

... `set tez.grouping.split-count=4` will create 4 mappers

https://community.pivotal.io/s/article/How-to-manually-set-the-number-of-mappers-in-a-TEZ-Hive-job
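
As a rough sketch of how this could look at the top of a HiveQL script: only `tez.grouping.split-count` comes from the article quoted above. The value 2000 is taken from the question, the `sentences` table and the query are hypothetical, and `tez.grouping.min-size` / `tez.grouping.max-size` are the related Tez split-grouping properties, shown with purely illustrative values. The resulting mapper count may not match the requested number exactly.

```sql
-- Make sure the query runs on Tez rather than MapReduce.
SET hive.execution.engine=tez;

-- Ask Tez to group the input splits into roughly 2000 map tasks.
SET tez.grouping.split-count=2000;

-- Alternative (illustrative values): bound the grouped split size in bytes
-- and let Tez derive the mapper count from the input size instead.
-- SET tez.grouping.min-size=16777216;    -- 16 MB
-- SET tez.grouping.max-size=134217728;   -- 128 MB

-- Hypothetical query; "sentences" is a placeholder table name.
SELECT COUNT(*) FROM sentences;
```

If the script is launched from Oozie, the same `SET` statements can sit at the top of the script that the workflow runs.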

However, overall, you should optimize the storage format and the Hive partitions before you even begin tuning the Tez settings. Do not try to process data stored as plain text (`STORED AS TEXTFILE`) in Hive. Convert it to ORC or Parquet first.
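
A minimal sketch of such a conversion, assuming the raw data lives in a text-backed table; both table names here are hypothetical placeholders:

```sql
-- "sentences_text" is a hypothetical table holding the raw text data.
-- CREATE TABLE ... AS SELECT copies its rows into a new ORC-backed table.
CREATE TABLE sentences_orc
STORED AS ORC
AS
SELECT * FROM sentences_text;
```

Later queries would then read from `sentences_orc` instead of the text table.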

If Tez isn't working out for you, you can always try Spark. Plus, labelling sentences is probably a Spark MLlib workflow you can find somewhere.

answered Aug 25 '18 at 03:59

OneCricketeer

Do you know how to adjust the number of reducers and the memory allocated to them? – Keito Tanki Aug 25 '18 at 11:01

`mapreduce.job.reduces`, and the same memory settings that control the mapper container should control the reducer – OneCricketeer Aug 25 '18 at 16:11
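
For illustration, a sketch of what those settings might look like in a Hive session. Only `mapreduce.job.reduces` is named in the comment above; the other properties are standard Hive settings added here as an assumption about what "memory settings" refers to on Tez, and every value is a placeholder to be sized for the actual cluster.

```sql
-- Fix the number of reducers explicitly (property named in the comment above).
SET mapreduce.job.reduces=200;

-- Alternative: leave the reducer count unset and steer Hive's own estimate.
-- SET hive.exec.reducers.bytes.per.reducer=268435456;  -- about 256 MB of input per reducer
-- SET hive.exec.reducers.max=1000;                     -- upper bound on reducers

-- Container memory for Hive on Tez tasks, in MB (assumed to cover both
-- map-side and reduce-side tasks).
SET hive.tez.container.size=4096;

-- JVM heap for those tasks, kept somewhat below the container size.
SET hive.tez.java.opts=-Xmx3276m;
```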
