11

For a Python Hadoop streaming job, how do I pass a parameter to, for example, the reducer script so that it behaves differently depending on the value passed in?

I understand that streaming jobs are called in the format of:

hadoop jar hadoop-streaming.jar -input -output -mapper mapper.py -reducer reducer.py ...

I want to affect reducer.py.

zzztimbo

4 Answers

18

The argument to the command line option -reducer can be any command, so you can try:

$HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
    -input inputDirs \
    -output outputDir \
    -mapper myMapper.py \
    -reducer 'myReducer.py 1 2 3' \
    -file myMapper.py \
    -file myReducer.py

assuming myReducer.py is made executable. Disclaimer: I have not tried it, but I have passed similar complex strings to -mapper and -reducer before.
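To illustrate the reducer side of this, here is a minimal sketch of what myReducer.py might look like when it reads its parameter from sys.argv. The tab-separated "key, count" input format, the min_count threshold, and the helper name reduce_stream are all assumptions made for the example; only the idea of passing arguments inside the -reducer string comes from the answer above.

```python
#!/usr/bin/env python
"""Hypothetical myReducer.py: sums counts per key and drops small totals.

Assumes input lines of the form "key<TAB>count", already sorted by key,
and that the threshold is passed as the first command-line argument,
e.g. -reducer 'myReducer.py 3'.
"""
import sys


def reduce_stream(lines, min_count):
    """Yield (key, total) for each run of equal keys whose total >= min_count."""
    current_key, total = None, 0
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            # A new key starts: emit the finished one if it passes the threshold.
            if current_key is not None and total >= min_count:
                yield current_key, total
            current_key, total = key, 0
        total += int(value)
    # Emit the last key, if any.
    if current_key is not None and total >= min_count:
        yield current_key, total


def main(argv):
    # argv[1] is the first argument after the script name; default to 1 if absent.
    min_count = int(argv[1]) if len(argv) > 1 else 1
    for key, total in reduce_stream(sys.stdin, min_count):
        sys.stdout.write("%s\t%d\n" % (key, total))


if __name__ == "__main__":
    main(sys.argv)
```

With this sketch, -reducer 'myReducer.py 3' would keep only keys whose summed count is at least 3. The script can be tried locally by piping sorted text into it before submitting the job.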

That said, have you tried the

-cmdenv name=value

option, and just having your Python reducer read its value from the environment? It's just another way to do things.
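As a rough sketch of the environment-variable route, the reducer below switches between summing and taking the maximum depending on a variable set with -cmdenv. The variable name PARAM_OPT, its values "sum" and "max", the fallback default, and the tab-separated integer input are assumptions for illustration, not something prescribed by Hadoop.

```python
#!/usr/bin/env python
"""Hypothetical reducer whose behaviour is chosen by an environment variable.

Assumes the job was started with something like -cmdenv PARAM_OPT=max and
that input lines are "key<TAB>integer", sorted by key.
"""
import os
import sys
from itertools import groupby


def parse(lines):
    """Turn "key<TAB>value" lines into (key, int(value)) pairs."""
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        yield key, int(value)


def reduce_stream(lines, mode):
    """Yield (key, aggregate) per key; mode "max" takes the maximum, anything else sums."""
    aggregate = max if mode == "max" else sum
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, aggregate(value for _, value in group)


if __name__ == "__main__":
    # Fall back to "sum" when the variable is not set (e.g. in a local test run).
    mode = os.environ.get("PARAM_OPT", "sum")
    for key, result in reduce_stream(sys.stdin, mode):
        sys.stdout.write("%s\t%d\n" % (key, result))
```

Using os.environ.get with a default, rather than indexing os.environ directly, keeps the script from raising KeyError when the variable is missing.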

Ray Toal
2

In your Python code,

import os
(...)
os.environ["PARAM_OPT"]

In your Hadoop command, include:

hadoop jar \
(...)
-cmdenv PARAM_OPT=value \
(...)
luismartingil
2

You can pass arguments inside the -reducer option, as in the command below:

hadoop jar hadoop-streaming.jar \
-mapper 'count_mapper.py arg1 arg2' -file count_mapper.py \
-reducer 'count_reducer.py arg3' -file count_reducer.py \

You can also refer to this link.

Mohamed El-Touny
1

If you are using Python, you may want to check out dumbo, which provides a nice wrapper around Hadoop streaming. In dumbo you pass parameters with -param, as in:

dumbo start yourpython.py -hadoop <hadoop-path> -input <input> -output <output>  -param <parameter>=<value>

Then read it in the reducer:

class Reducer:
    def __init__(self):
        # Values passed with -param are read from self.params
        self.parameter = int(self.params["<parameter>"])

    def __call__(self, key, values):
        # do something interesting ...
        pass

You can read more in the dumbo tutorial

Arnon Rotem-Gal-Oz