Map Reduce: Why Do I Need to Specify "python" Before Piping to .py File?

Question

I am running map reduce locally.

My command looks like this:

cat testfile | python ./mapper.py | python ./reducer.py

and this works fine. However, when my command looks like this:

cat testfile | ./mapper.py | ./reducer.py

I get the following error:

./mapper.py: line 1: import: command not found
./mapper.py: line 3: syntax error near unexpected token `('
./mapper.py: line 3: `def mapper():

This makes sense, since the shell is interpreting my Python file as a bash script and choking on the Python syntax.

But none of the online examples I have looked at (e.g. http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/) put python before the .py files. How can I configure my machine to run the pipe without specifying python before mapper.py and reducer.py?

Just in case it helps, here's my mapper code:

import sys

def mapper():
    for line in sys.stdin:
        data = line.strip().split('\t')
        if len(data) == 6:
            category = data[3]
            sales = data[4]
            print '{0}\t{1}'.format(category, sales)

if __name__ == "__main__":
    mapper()

Here's my reducer code:

import sys

def reducer():
    current_total = 0
    old_key = None

    for line in sys.stdin:
        data = line.strip().split('\t')
        if len(data) == 2:
            current_key, sales = data
            sales = float(sales)

            if old_key and current_key != old_key:
                print "{0}\t{1}".format(old_key, current_total)
                current_total = 0
            old_key = current_key
            current_total += sales

    # Emit the final key; use old_key so empty input does not raise a NameError.
    if old_key is not None:
        print "{0}\t{1}".format(old_key, current_total)

if __name__ == "__main__":
    reducer()

And my data looks like this:

2012-01-01      09:01   Anchorage       DVDs    6.38    Amex
2012-01-01      09:01   Aurora    Electronics    117.81  MasterCard
2012-01-01      09:01   Philadelphia    DVDs    351.31  Cash

add a hashbang line at the beginning of your python script `#!/usr/bin/env python` — 0.sh, Nov 16 '16 at 00:11

score 2 · Accepted Answer · edited May 23 '17 at 12:32

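Because the file does not say which interpreter should run it. When you type python ./myfile you name the interpreter explicitly. When you run ./myfile directly and the file has no interpreter line, the shell falls back to treating it as a shell script, which is why you get errors such as "import: command not found". If you do not want to name the interpreter every time, put a shebang on the first line of the file; it is basically the path to the interpreter. For Python, the shebang looks like: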

#!/usr/bin/env python

or

#!/usr/local/bin/python
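
As a minimal sketch of applying this fix, assuming a Unix-like system: the helper below prepends a shebang to a script that lacks one and sets the owner's execute bit (the same effect as chmod u+x). The name ensure_shebang is hypothetical and not part of any library.

```python
import os
import stat

SHEBANG = "#!/usr/bin/env python"


def ensure_shebang(path, shebang=SHEBANG):
    """Prepend a shebang line to the script at `path` if it lacks one,
    then set the owner's execute bit. Returns True if the file was changed.

    Hypothetical helper for illustration only.
    """
    with open(path) as handle:
        source = handle.read()

    changed = not source.startswith("#!")
    if changed:
        with open(path, "w") as handle:
            handle.write(shebang + "\n" + source)

    # Equivalent to `chmod u+x path`; harmless if the bit is already set.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)
    return changed
```

After calling ensure_shebang("mapper.py") and ensure_shebang("reducer.py"), the pipe from the question, cat testfile | ./mapper.py | ./reducer.py, should start both scripts with whichever python comes first on PATH. The scripts in the question use Python 2 print statements, so that python needs to be a Python 2 interpreter.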

For more information, see the shebang wiki, which says:

Under Unix-like operating systems, when a script with a shebang is run as a program, the program loader parses the rest of the script's initial line as an interpreter directive; the specified interpreter program is run instead, passing to it as an argument the path that was initially used when attempting to run the script
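
To make the quoted description a bit more concrete, here is a rough sketch of how an initial line can be split into an interpreter and its argument. The name parse_shebang is hypothetical; real program loaders do this in the kernel and differ in details such as length limits. The sketch assumes the Linux convention that everything after the interpreter path is passed as a single argument.

```python
def parse_shebang(first_line):
    """Return (interpreter, argument) for a shebang line, or None if absent.

    Illustrative only: this mimics, but is not, what the program loader does.
    """
    if not first_line.startswith("#!"):
        # No directive: a shell such as bash typically falls back to
        # running the file as shell code, as in the question's error.
        return None
    directive = first_line[2:].strip()
    if not directive:
        return None
    parts = directive.split(None, 1)  # split once on whitespace
    interpreter = parts[0]
    argument = parts[1].strip() if len(parts) > 1 else None
    return interpreter, argument


print(parse_shebang("#!/usr/bin/env python"))    # ('/usr/bin/env', 'python')
print(parse_shebang("#!/usr/local/bin/python"))  # ('/usr/local/bin/python', None)
print(parse_shebang("import sys"))               # None
```

With #!/usr/bin/env python the loader runs env, which then looks up python on PATH; with #!/usr/local/bin/python the interpreter at that fixed path is run directly.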
