
I am trying to run the wordcount example under Amazon EMR.

-1- First, I create a cluster with the following command:

./elastic-mapreduce --create --name "MyTest" --alive

This creates a cluster with a single instance and returns a job flow ID, let's say j-12NWUOKABCDEF.

-2- Second, I start a Job using the following command:

./elastic-mapreduce --jobflow j-12NWUOKABCDEF \
  --jar s3n://mybucket/jar-files/wordcount.jar \
  --main-class abc.WordCount \
  --arg s3n://mybucket/input-data/ \
  --arg s3n://mybucket/output-data/ \
  --arg -Dmapred.reduce.tasks=3

My WordCount class belongs to the package abc.

This executes without any problem, but I am getting only one reducer, which means that the parameter "mapred.reduce.tasks=3" is ignored.

Is there any way to specify the number of reducers that I want my application to use?

Thank you, Neeraj.

3 Answers


The "-D" and the "mapred.reduce.tasks=3" should be separate arguments.

Judge Mental

Try launching the EMR cluster with the default number of mappers and reducers set via the --bootstrap-action option:

--bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --args "-m,mapred.map.tasks=6,-m,mapred.reduce.tasks=3"
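In context, combined with the --create command from the question (a sketch; bootstrap actions can only be set at cluster creation):

./elastic-mapreduce --create --name "MyTest" --alive \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-m,mapred.map.tasks=6,-m,mapred.reduce.tasks=3"

These become cluster-wide defaults written into mapred-site.xml at startup, so a jar step like the one in the question should then get three reducers without any extra -D argument. Note that mapred.map.tasks is only a hint; the actual number of map tasks is driven by the input splits, whereas mapred.reduce.tasks is honored directly.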
mat_vee

You can use the streaming JAR's built-in -numReduceTasks option. For example, with the Ruby EMR CLI tool:

elastic-mapreduce --create --enable-debugging \
  --ami-version "3.3.1" \
  --log-uri s3n://someBucket/logs \
  --name "someJob" \
  --num-instances 6 \
  --master-instance-type "m3.xlarge"  --slave-instance-type "c3.8xlarge" \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/install-ganglia \
  --stream \
    --arg "-files" \
    --arg "s3://someBucket/some_job.py,s3://someBucket/some_file.txt" \
    --mapper "python27 some_job.py some_file.txt" \
    --reducer cat \
    --args "-numReduceTasks,8" \
    --input s3://someBucket/myInput \
    --output s3://someBucket/myOutput \
    --step-name "main processing"
Dolan Antenucci
  • This is a feature built into Hadoop (see https://wiki.apache.org/hadoop/HadoopStreaming), so double check that your command matches what I have (e.g., usage of `--args` instead of `--arg` is important). – Dolan Antenucci Mar 23 '15 at 20:09