I tried the process(word labeling of sentence) of large data(about 150GB) using tez , but the problem is that it took so much time(1week or more),then
I tried to specify number of mapper. Though I set mapred.map.tasks =2000, but I can't stop mapper being set to about 150, so I can't do what I want to do.
I specify the map value in oozie workflow file and use the tez.
How can I specify the number of mapper?
Finally I want to speed up the process, it is ok not to use tez.
In addition, I would like to count labeled sentence by reducer, it takes so much time,too.
And , I also want to know how I adjust memory size to use each mapper and reducer process.