I have a mapper.py script and a reducer.py script that I chain together through a sort pipeline.
mapper.py
#!/usr/bin/env python
import sys
import re
import fileinput
# Read pairs as lines of input from STDIN
for line in fileinput.input():
    filename = fileinput.filename()
    filename = filename.replace("source_text/", "")
    print filename
sorter/reducer
#!/usr/bin/env python
import sys
for line in sys.stdin:
    print line
I then run this in my console:
cat source_text/* | ./mapper.py | sort | ./reducer.py
The problem is that the mapper correctly shows the filename, but by the time the output reaches the reducer script, the filename has been replaced with <stdin>.
My question: how can I pass the real filename to the second script?
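For reference, a minimal sketch of one possible direction, not a confirmed fix. `cat source_text/*` concatenates the files before the mapper reads them, so the mapper only ever sees standard input, and fileinput reports that as <stdin>. The sketch assumes the mapper is given the file paths as arguments instead, and that each output line carries the filename as a tab-separated prefix so it survives `sort`. The names `tag_lines` and `split_record` are hypothetical helpers, not part of the original scripts.

```python
import os


def tag_lines(paths):
    """Mapper side (hypothetical helper): yield 'filename<TAB>line' records.

    Assumes the caller passes real file paths, e.g. sys.argv[1:],
    rather than piping the concatenated contents through cat.
    """
    for path in paths:
        name = os.path.basename(path)  # drops the "source_text/" prefix
        with open(path) as handle:
            for line in handle:
                yield name + "\t" + line.rstrip("\n")


def split_record(record):
    """Reducer side (hypothetical helper): recover (filename, text)."""
    name, _, text = record.rstrip("\n").partition("\t")
    return name, text
```

Under these assumptions the mapper would print each record from `tag_lines(sys.argv[1:])`, the reducer would call `split_record` on each line of `sys.stdin`, and the pipeline would become `./mapper.py source_text/* | sort | ./reducer.py`.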