
I think my question confused everyone, so let me make it a little clearer. I am trying to order my data. Say my data (a few records) looks like this:

0 1 2 3 4
1 3 8 9 2
2 8 7 9 7

My block size is 128 MB and the file size is 380 MB (3 blocks). I am trying to give an order number to each of my records:

1,0 1 2 3 4
2,1 3 8 9 2
3,2 8 7 9 7

To assign the numbers correctly, I need all the data to go through a single map task; if I get 3 map tasks, my numbering won't be correct.

If I do that, I will get the whole data as it is, right? No changes will happen to the data entering my mapper class; it will be my original data, won't it?
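For context, this is roughly the mapper I have in mind (just a sketch of my idea, assuming a map-only job reading plain text; the class name is mine):

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Numbers each record with a counter kept in the mapper instance.
    // This is only correct if the WHOLE file goes through ONE map task.
    public class NumberingMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        private long recordNumber = 0;
        private final Text out = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            recordNumber++;
            out.set(recordNumber + "," + value.toString());
            context.write(out, NullWritable.get());
        }
    }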

But even after I set the number of mappers to 1 using

    -D mapreduce.job.maps=1

or

conf.setInt("mapreduce.job.running.map.limit", 1);

my job still produces 3 part-m-000* output files.

I am using the Cloudera distribution, Hadoop 2.6.0-cdh5.4.7.

Am I doing anything wrong? Please advise.

USB

4 Answers

  • Number of mappers

    -Dmapreduce.job.maps=1
    

    This can be used for specifying the default number of mapper tasks per job.

    But, when you submit the job, the JobSubmitter overrides this parameter, based on the number of splits:

    LOG.debug("Creating splits at " + jtFs.makeQualified(submitJobDir));
    int maps = writeSplits(job, submitJobDir);
    conf.setInt(MRJobConfig.NUM_MAPS, maps);
    

    In the code above, MRJobConfig.NUM_MAPS is:

    public static final String NUM_MAPS = "mapreduce.job.maps";
    

    and it gets set to the number of splits computed by the writeSplits() method.

    Hence, your setting does not take effect: the split count always wins. (See the sketch after this list for one way to force a single split.)

  • Mapper limit

    conf.setInt("mapreduce.job.running.map.limit", 1);
    

    This setting only controls the maximum number of map tasks running simultaneously; it does not change how many map tasks the job has in total.
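
If you truly need a single map task, a common workaround is to make the input format refuse to split files, so the whole file becomes one split and hence one mapper. A minimal sketch (the class name is illustrative, not a built-in):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    // A TextInputFormat that never splits its input, so each input file
    // is processed by exactly one map task.
    public class NonSplittableTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false; // one split per file, regardless of block count
        }
    }

You would register it in the driver with job.setInputFormatClass(NonSplittableTextInputFormat.class). Note that with a 380 MB file this sacrifices all parallelism, as the other answers point out.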

Manjunath Ballur

If you want to sort your data, it is important that a reduce phase is part of your job. If you want n sorted files, a plain reduce will do; if you want a single output file, you need to set the number of reducers to 1 (similar to what you did for the map side).

Setting the number of mappers to 1 has no impact on what you're trying to achieve other than slowing the job down!
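
For illustration, a sketch of the relevant driver call (standard org.apache.hadoop.mapreduce.Job API; the job name is made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SortDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "sort-records");
            // 1 reducer => a single output file, sorted by key (part-r-00000).
            job.setNumReduceTasks(1);
            // For a map-only job instead (no shuffle/sort, part-m-* outputs):
            // job.setNumReduceTasks(0);
            // ... set mapper, reducer, input/output paths, then:
            // System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }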

oae
  • My intention is not sorting the data. I need to give an order number to my data – USB Jan 06 '16 at 04:44
  • Ok, I see. Did you also turn off reduce? I think you need to set the number of reducers to 0: conf.setNumReduceTasks(0) – oae Jan 07 '16 at 09:30
  • Ok, you cannot really set the number of maps this way, because it depends on how many splits your InputFormat creates for the job. If it creates 3 splits, then there are 3 tasks; InputFormats usually take the configured number of mappers only as a hint, with no guarantee. So if you really want to force a map-task count of one, have a look at the InputFormats and their options; there is also something like CombineFileInputFormat. The question, however, is whether using Hadoop for that task is still beneficial, because you are removing all parallelism! – oae Jan 11 '16 at 07:47
  • Yes, you are right. But I wanted to try matrix multiplication. With two plain large matrices we cannot do matrix multiplication, as we cannot guarantee the computation: the data will not be in the same order (as it comes from different splits). So for that I was trying to add row/column dimensions to my data [hint](http://magpiehall.com/two-step-matrix-multiplication-with-hadoop/) – USB Jan 12 '16 at 05:10

Instead of setting the number of mappers to 1, solve the problem in a different way by using secondary sorting.

With a slight manipulation to the format of the key object, secondary sorting gives us the ability to take the value into account during the sort phase.

Have a look at this article for a working code example in Java.

Have a look at this question too: hadoop map reduce secondary sorting.

If you still need only one map task and your parameters are being ignored by the framework, go for a non-splittable Hadoop compression format like gzip (for uncompressed data sizes of less than about 1 GB).

Have a look at this presentation for more details.
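
To give a feel for the composite-key idea (a rough sketch, not the linked article's code; all names here are invented):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.WritableComparable;

    // Composite key for secondary sort: partition and group by naturalKey,
    // but order records within a group by secondaryValue.
    public class CompositeKey implements WritableComparable<CompositeKey> {
        private String naturalKey;
        private long secondaryValue;

        public CompositeKey() { }

        public CompositeKey(String naturalKey, long secondaryValue) {
            this.naturalKey = naturalKey;
            this.secondaryValue = secondaryValue;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(naturalKey);
            out.writeLong(secondaryValue);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            naturalKey = in.readUTF();
            secondaryValue = in.readLong();
        }

        // Sort by the natural key first, then by the secondary value.
        @Override
        public int compareTo(CompositeKey other) {
            int cmp = naturalKey.compareTo(other.naturalKey);
            return cmp != 0 ? cmp : Long.compare(secondaryValue, other.secondaryValue);
        }
    }

The driver would then install a partitioner and a grouping comparator that look only at naturalKey (via job.setPartitionerClass(...) and job.setGroupingComparatorClass(...)), so each reduce call sees its values already ordered by secondaryValue.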

Ravindra babu

The description of mapreduce.job.maps here states:

Ignored when mapreduce.jobtracker.address is "local"

So, if you are running on your local machine, that may explain why you get 3 mappers.
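
You can check which mode you are running in by reading that property from your driver (a quick sketch; "local" is the documented default value):

    import org.apache.hadoop.conf.Configuration;

    public class ModeCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // "local" means the local job runner is used
            // and mapreduce.job.maps is ignored.
            System.out.println(conf.get("mapreduce.jobtracker.address", "local"));
        }
    }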

Coming to sorting: the map method, where the application code is written, works on a single input record at a time, so making the sort happen in the map phase gets complicated. On the other hand, it is almost straightforward if you do the sort on the reduce side.

PonMaran