
I have a mapper.py script and a reducer.py script that I chain together with sort in a shell pipeline.

mapper.py

#!/usr/bin/env python

import fileinput

# Read lines from the files named on the command line, or from STDIN if none are given
for line in fileinput.input():
    # Name of the file the current line came from ("<stdin>" when input is piped in)
    filename = fileinput.filename()
    filename = filename.replace("source_text/", "")

    print(filename)

reducer.py

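#!/usr/bin/env python
import sys

# Echo every line received from the sort step
for line in sys.stdin:
    # Strip the trailing newline so print does not emit a blank line after each record
    print(line.rstrip("\n"))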

I then run this in my console:

cat source_text/* | ./mapper.py | sort | ./reducer.py

The problem is that, while the mapper shows the correct filename when I run it on its own, once the output is passed through to the reducer script the filename is replaced with <stdin>.

My question is: how can I pass the real filename to the second script?

Petros Kyriakou
  • Why can't you just write a single Python script to do all the operations? – Radan Mar 24 '16 at 16:41
  • Because I want to use it with Amazon Elastic MapReduce (Hadoop), which needs separate mapper and reducer scripts. I already have the whole thing working in a single file. – Petros Kyriakou Mar 24 '16 at 16:42
  • The script isn't reading from whatever source file you think it is; it's reading from standard input. If you want to see file names, you'll have to change how your script takes input. – user2357112 Mar 24 '16 at 16:42
  • @user2357112 What you say is partly true, but I do not know of any other way to call both scripts like this. If it were only one script it would go like this, `source_text/* ./mapper.py`, but for both scripts it does not seem to work. – Petros Kyriakou Mar 24 '16 at 16:43
  • (cat source_text/* | ./mapper.py | sort > tempfile.txt | ./reducer tempfile.txt) Can you write the output to a temp file after sort and then read it in the reducer script? – Radan Mar 24 '16 at 16:47
  • @Radan The thing is that I can't run `cat source_text/* | ./mapper.py` because it won't read the filename, so I am forced to do it this way: `source_text/* ./mapper.py`. You can read my other topic here: `http://stackoverflow.com/questions/36182156/how-to-get-filename-from-stdin`. That's why I have trouble figuring out how to write a second script that runs after the first one. – Petros Kyriakou Mar 24 '16 at 16:51
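
One way to act on the comment about changing how the script takes input is to have the mapper write the file name into every record, so it travels through sort as ordinary data. The sketch below is only an illustration under stated assumptions: the helper names (`resolve_filename`, `format_record`, `parse_record`) are hypothetical, records are assumed to be tab-separated, and the environment variable names (`mapreduce_map_input_file`, with `map_input_file` on older releases) are the ones Hadoop Streaming is generally documented to export, which is worth verifying on the EMR release in use.

```python
#!/usr/bin/env python
"""Sketch: a mapper that tags each record with the name of its source file."""
import fileinput
import os

# Assumption: Hadoop Streaming exports job settings as environment variables
# with dots replaced by underscores (newer name first, older name second).
HADOOP_INPUT_FILE_VARS = ("mapreduce_map_input_file", "map_input_file")


def resolve_filename(local_name, env=None):
    """Prefer the input file reported by Hadoop, else the local file name."""
    env = os.environ if env is None else env
    for var in HADOOP_INPUT_FILE_VARS:
        value = env.get(var)
        if value:
            return os.path.basename(value)
    # Local run: fileinput only knows real names when files are passed as
    # arguments; with piped input this is the literal string "<stdin>".
    return os.path.basename(local_name)


def format_record(filename, line):
    """Mapper side: emit 'filename<TAB>line' so the name survives sort."""
    return "{0}\t{1}".format(filename, line.rstrip("\n"))


def parse_record(record):
    """Reducer side: split a record back into (filename, line)."""
    filename, _, text = record.rstrip("\n").partition("\t")
    return filename, text


def main():
    """Mapper loop: reads files from argv if given, otherwise STDIN."""
    for line in fileinput.input():
        name = resolve_filename(fileinput.filename())
        print(format_record(name, line))
```

To use this as `mapper.py`, call `main()` at the bottom of the file. For a local test, pass the files as arguments instead of piping them through `cat`, for example `./mapper.py source_text/* | sort | ./reducer.py`, so that `fileinput` can see the real names; the reducer can then call `parse_record` on each line it reads from `sys.stdin`.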

0 Answers