I'm trying to learn a little bit of MapReduce in combination with Python.
I have the following code running, taken from a tutorial I'm following:

    from mrjob.job import MRJob

    class SpendByCustomer(MRJob):

        def mapper(self, _, line):
            (customerID, itemID, orderAmount) = line.split(',')
            yield customerID, float(orderAmount)

        def reducer(self, customerID, orders):
            yield customerID, sum(orders)

    if __name__ == '__main__':
        SpendByCustomer.run()

It should do the following. When I run

    !python SpendByCustomers.py customer-orders.xls > test.txt

it should read in the .xls file, map and reduce it, and write the output to test.txt.
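To show how I currently picture the map and reduce steps, here is a rough plain-Python sketch with no mrjob involved. The sample lines, the `run_locally` helper and the `totals` name are all made up for illustration, and the grouping step is only my assumption about what happens between the mapper and the reducer, not how mrjob actually implements it.

```python
from collections import defaultdict

# Made-up sample lines in the customerID,itemID,orderAmount layout
SAMPLE_LINES = [
    "44,8602,37.19",
    "35,5368,65.89",
    "44,3391,40.64",
]


def mapper(line):
    # Same logic as the mrjob mapper: emit (customer, amount) per line
    customer_id, item_id, order_amount = line.split(',')
    yield customer_id, float(order_amount)


def reducer(customer_id, orders):
    # Same logic as the mrjob reducer: total the amounts for one customer
    yield customer_id, sum(orders)


def run_locally(lines):
    # Hypothetical stand-in for the framework: group mapper output by key,
    # then hand each key and its values to the reducer
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)

    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


totals = run_locally(SAMPLE_LINES)
```

Here `totals` maps each customer ID to the sum of that customer's order amounts, which is what I understand the real job writes out, one customer per line.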
It all works fine and I mostly understand it. However, I would really like some more insight into the following:
In

    def mapper(self, _, line):

what is the _ doing here?

In

    if __name__ == '__main__':
        SpendByCustomer.run()

what exactly is this function doing?