I would like to try out the Mincemeat map/reduce Python library for matrix multiplication. I am using Python 2.7. I found several web pages that describe how to do matrix multiplication with Hadoop in Java, and I have been referring to this one, http://importantfish.com/one-step-matrix-multiplication-with-hadoop/, both because it is simple and because the pseudocode it shows is already very close to Python.
I noticed that in the accompanying Java code, the matrix dimensions are supplied to the map and reduce functions through an additional argument of type Context. Mincemeat doesn't provide anything like that, but someone suggested that I could supply these values to my map and reduce functions using closures. The map and reduce functions I wrote look like this:
def make_map_fn(num_rows_result, num_cols_result):
    m = num_rows_result
    p = num_cols_result
    def map_fn(key, value):
        # value is ('A', i, j, a_ij) or ('B', j, k, b_jk), with 1-based indices
        if value[0] == 'A':
            i = value[1]
            j = value[2]
            a_ij = value[3]
            # emit a_ij once for every column k = 1..p of the result
            for k in xrange(1, p + 1):
                yield ((i, k), ('A', j, a_ij))
        else:
            j = value[1]
            k = value[2]
            b_jk = value[3]
            # emit b_jk once for every row i = 1..m of the result
            for i in xrange(1, m + 1):
                yield ((i, k), ('B', j, b_jk))
    return map_fn

def make_reduce_fn(inner_dim):
    n = inner_dim
    def reduce_fn(key, values):
        # key is (i, k)
        # values is a list of ('A', j, a_ij) and ('B', j, b_jk)
        hash_A = {j: a_ij for (x, j, a_ij) in values if x == 'A'}
        hash_B = {j: b_jk for (x, j, b_jk) in values if x == 'B'}
        result = 0
        # sum over the inner dimension j = 1..n; .get(j, 0) treats a missing
        # (zero) entry of a sparse matrix as 0 instead of raising KeyError
        for j in xrange(1, n + 1):
            result += hash_A.get(j, 0) * hash_B.get(j, 0)
        return (key, result)
    return reduce_fn
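Independently of Mincemeat, the two closures can be sanity-checked in a single process. The sketch below is only an illustration: `run_local` is a hypothetical helper that imitates the map, group-by-key, and reduce steps, and the 2x2 matrices are made-up test data. The factories use inclusive upper bounds (`p + 1`, `m + 1`, `n + 1`, assuming 1-based indices as in the pseudocode), `.get(j, 0)` for missing entries, and `range` so the snippet runs on both Python 2.7 and 3. Here `reduce_fn` returns only the sum, assuming the driver stores each reduce result under its key, as `run_local` does.

```python
from collections import defaultdict


def make_map_fn(num_rows_result, num_cols_result):
    m = num_rows_result
    p = num_cols_result

    def map_fn(key, value):
        # value is ('A', i, j, a_ij) or ('B', j, k, b_jk), 1-based indices
        if value[0] == 'A':
            _, i, j, a_ij = value
            for k in range(1, p + 1):
                yield ((i, k), ('A', j, a_ij))
        else:
            _, j, k, b_jk = value
            for i in range(1, m + 1):
                yield ((i, k), ('B', j, b_jk))
    return map_fn


def make_reduce_fn(inner_dim):
    n = inner_dim

    def reduce_fn(key, values):
        hash_A = {j: a_ij for (x, j, a_ij) in values if x == 'A'}
        hash_B = {j: b_jk for (x, j, b_jk) in values if x == 'B'}
        # dot product of row i of A with column k of B
        return sum(hash_A.get(j, 0) * hash_B.get(j, 0) for j in range(1, n + 1))
    return reduce_fn


def run_local(datasource, mapfn, reducefn):
    """Hypothetical in-process stand-in for a MapReduce run:
    map every record, group the emitted values by key, reduce each group."""
    groups = defaultdict(list)
    for key, value in datasource.items():
        for out_key, out_value in mapfn(key, value):
            groups[out_key].append(out_value)
    return {k: reducefn(k, vs) for k, vs in groups.items()}


# Hypothetical test data: A and B are 2x2, stored as 1-based records.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
records = []
for i, row in enumerate(A, 1):
    for j, a in enumerate(row, 1):
        records.append(('A', i, j, a))
for j, row in enumerate(B, 1):
    for k, b in enumerate(row, 1):
        records.append(('B', j, k, b))
datasource = dict(enumerate(records))

result = run_local(datasource, make_map_fn(2, 2), make_reduce_fn(2))
# {(1, 1): 19, (1, 2): 22, (2, 1): 43, (2, 2): 50}
```

If this local run gives the expected product, the map and reduce logic itself is fine, and the problem lies in how the functions are transferred to the workers.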
Then I assign them to Mincemeat like this:
s = mincemeat.Server()
s.mapfn = make_map_fn(num_rows_A, num_cols_B)
s.reducefn = make_reduce_fn(num_cols_A)
When I run this in Mincemeat, I get this error message:
error: uncaptured python exception, closing channel <__main__.Client connected at 0x2ada4d0>
(<type 'exceptions.TypeError'>:arg 5 (closure) must be tuple
[/usr/lib/python2.7/asyncore.py|read|83]
[/usr/lib/python2.7/asyncore.py|handle_read_event|444]
[/usr/lib/python2.7/asynchat.py|handle_read|140]
[/usr/local/lib/python2.7/dist-packages/mincemeat.py|found_terminator|96]
[/usr/local/lib/python2.7/dist-packages/mincemeat.py|process_command|194]
[/usr/local/lib/python2.7/dist-packages/mincemeat.py|set_mapfn|159])
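The traceback ends in mincemeat's `set_mapfn` on a `Client`, which suggests the error is raised while the worker rebuilds the map function it was sent. The sketch below reproduces the same `TypeError` without Mincemeat, under the assumption (not confirmed by the traceback alone) that the worker receives only the marshalled code object and calls `types.FunctionType` without a closure argument. `make_scaler` and `scale_by_3` are hypothetical names used only for illustration.

```python
import marshal
import types


def make_scaler(factor):
    # 'factor' is a free variable of scale(), captured in a closure cell
    def scale(x):
        return x * factor
    return scale


scale_by_3 = make_scaler(3)

# Assumed transfer: only the code object is marshalled and sent.
code = marshal.loads(marshal.dumps(scale_by_3.__code__))

try:
    # Rebuilding without a closure fails because the code has free variables.
    types.FunctionType(code, globals(), 'mapfn')
except TypeError as exc:
    error = str(exc)  # Python 2.7: "arg 5 (closure) must be tuple"
else:
    error = None

# Passing the original cells as the closure argument makes the rebuild work.
rebuilt = types.FunctionType(code, globals(), 'mapfn', None, scale_by_3.__closure__)
```

The rebuild succeeds only when the closure is supplied separately, so whatever carries just the code object leaves the captured values behind.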
I searched the web with terms like |python closure must be tuple|, and the results I found seemed to deal with cases where someone constructs a function with lambda or function() and has to make sure not to omit anything when defining it as a closure. In my case, the map_fn and reduce_fn values returned by make_map_fn and make_reduce_fn look like valid function objects, and their func_closure values are tuples of cells containing the matrix dimensions I want to supply, but something is still missing. In what form do I need to pass these functions so that Mincemeat can use them?
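The func_closure observation can be checked directly. The snippet below uses a trimmed, hypothetical stand-in for make_map_fn whose body does nothing except reference m and p, which is enough for both to be captured. It shows that the code object records only the names of the captured variables (`co_freevars`), while the values live in the cells of `__closure__` (Python 2.7 accepts both `__closure__` and `func_closure`).

```python
def make_map_fn(num_rows_result, num_cols_result):
    m = num_rows_result
    p = num_cols_result

    def map_fn(key, value):
        # Hypothetical trimmed body: referencing m and p captures both.
        return m, p
    return map_fn


map_fn = make_map_fn(4, 3)

# The code object lists only the names of the free variables ...
free_names = map_fn.__code__.co_freevars
# ... while the values are stored in cells on the function object,
# in the same order as co_freevars.
captured = dict(zip(free_names, (cell.cell_contents for cell in map_fn.__closure__)))
```

Here `captured` comes out as `{'m': 4, 'p': 3}`, while `map_fn.__code__` on its own carries only the names.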