
I am learning and testing mrjob on my laptop, using the word count example.

I can pass a local file as input on the command line, but I don't know how to do the same thing from within the Python script.

I would greatly appreciate a simple example.

Thanks, Ananth

akrishnamo
  • Which Python script do you mean? You pretty much always start an mrjob job, and specify its input, from the command line. – jkgeyti Jun 23 '13 at 19:30
  • I think you want this: http://stackoverflow.com/questions/12569261/how-does-one-specify-the-input-file-for-a-runner-from-python – Frank Aug 27 '13 at 00:39
  • @jkgeyti But what if we want to give that input from within the program and not from the command line? – Saurabh Verma Feb 09 '15 at 12:53
  • It's been a while since I've worked with mrjob. I'd take a closer look at the mrjob source to see how it submits jobs. Alternatively, you can just start a subprocess from within Python and submit the job as you would from the command line. – jkgeyti Feb 09 '15 at 13:46
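
The subprocess approach suggested in the last comment could look roughly like the sketch below. It is only an illustration, not mrjob's own API: build_command and run_job are hypothetical helper names, and the code assumes the job script accepts input paths as positional arguments, as in the command-line usage described in the question.

```python
import subprocess
import sys


def build_command(script_path, input_paths):
    """Build the same argument list you would type on the command line.

    build_command is a hypothetical helper, not part of mrjob.
    """
    # sys.executable is the interpreter running this script, so the child
    # process uses the same Python environment (and the same mrjob install).
    return [sys.executable, script_path, *input_paths]


def run_job(script_path, input_paths):
    """Run the job script in a child process and return its stdout as text.

    run_job is a hypothetical helper, not part of mrjob.
    """
    result = subprocess.run(
        build_command(script_path, input_paths),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

With a job script and input file in the working directory, run_job("mr_example.py", ["test_file"]) should return the text the job would otherwise print to the terminal. This keeps the input path inside the Python program at the cost of starting a second interpreter.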

1 Answer


I didn't quite understand what you're asking, but I guess you're looking for something like this:

[root@localhost code]# cat mr_example.py 

from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)

if __name__ == '__main__':
    MRWordFrequencyCount.run()

[root@localhost code]# cat test_file 
aaaa
dd dx csadsad
2321 dasdtokcmk
mii xsa
xaaaa
casd

[root@localhost code]# python mr_example.py test_file
...
"chars" 50
"lines" 6
"words" 10
zhutoulala