
I would like to try out the Mincemeat map/reduce Python application for matrix multiplication. I am using Python 2.7. I found several web pages that describe how to do matrix multiplication with Hadoop in Java, and I have been referring to this one, http://importantfish.com/one-step-matrix-multiplication-with-hadoop/, both because it is simple and because its pseudocode is already very close to Python.

I noticed that in the Java code included on that page, the matrix dimensions are supplied to the map and reduce functions via an additional argument of type Context. Mincemeat doesn't provide such a mechanism, but I got a suggestion that I could supply these values to my map and reduce functions using closures. The map and reduce functions I wrote look like this:

def make_map_fn(num_rows_result, num_cols_result):
    m = num_rows_result
    p = num_cols_result

    def map_fn(key, value):
        # value is ('A', i, j, a_ij) or ('B', j, k, b_jk)
        if value[0] == 'A':
            i = value[1]
            j = value[2]
            a_ij = value[3]

            for k in xrange(1, p):
                yield ((i, k), ('A', j, a_ij))
        else:
            j = value[1]
            k = value[2]
            b_jk = value[3]

            for i in xrange(1, m):
                yield ((i, k), ('B', j, b_jk))
    return map_fn


def make_reduce_fn(inner_dim):
    n = inner_dim

    def reduce_fn(key, values):
        # key is (i, k)
        # values is a list of ('A', j, a_ij) and ('B', j, b_jk)
        hash_A = {j: a_ij for (x, j, a_ij) in values if x == 'A'}
        hash_B = {j: b_jk for (x, j, b_jk) in values if x == 'B'}
        result = 0

        for j in xrange(1, n):
            result += hash_A[j] * hash_B[j]

        return (key, result)
    return reduce_fn

Then I assign them to Mincemeat like this:

s = mincemeat.Server()
s.mapfn = make_map_fn(num_rows_A, num_cols_B)
s.reducefn = make_reduce_fn(num_cols_A)
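
For completeness, the rest of the driver follows the standard mincemeat word-count example; roughly this, with a placeholder datasource and password standing in for my real input:

# Placeholder entries purely for this sketch; the real datasource holds the full
# set of ('A', i, j, a_ij) and ('B', j, k, b_jk) tuples read from the data file.
matrix_entries = [('A', 0, 0, 1.0), ('B', 0, 0, 2.0)]

# mincemeat hands each (key, value) pair of the datasource dict to mapfn.
s.datasource = dict(enumerate(matrix_entries))
results = s.run_server(password="changeme")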

When I run this in Mincemeat, I get this error message:

error: uncaptured python exception, closing channel <__main__.Client connected at 0x2ada4d0>
(<type 'exceptions.TypeError'>:arg 5 (closure) must be tuple
 [/usr/lib/python2.7/asyncore.py|read|83]
 [/usr/lib/python2.7/asyncore.py|handle_read_event|444]
 [/usr/lib/python2.7/asynchat.py|handle_read|140]
 [/usr/local/lib/python2.7/dist-packages/mincemeat.py|found_terminator|96]
 [/usr/local/lib/python2.7/dist-packages/mincemeat.py|process_command|194]
 [/usr/local/lib/python2.7/dist-packages/mincemeat.py|set_mapfn|159])

I searched the net with terms like "python closure must be tuple", and the results mostly dealt with cases where someone constructs a function by hand with lambda or function() and needs to make sure they haven't omitted anything when defining the closure. In my case, the map_fn and reduce_fn values returned by make_map_fn and make_reduce_fn look like valid function objects, and their func_closure values are tuples of cells containing the array dimensions I want to supply, but something is still missing. What form do I need to pass these functions in so that Mincemeat can use them?
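
For reference, this is roughly how I checked those returned functions (Python 2 attribute names; the dimensions here are just example numbers):

map_fn = make_map_fn(2, 3)
print type(map_fn)                                    # <type 'function'>
print map_fn.func_closure                             # a tuple of cell objects
print [c.cell_contents for c in map_fn.func_closure]  # the captured dimensions, e.g. [2, 3]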


1 Answer


I hate to be the bearer of bad news, but this is just the result of a few off-by-one errors in your code, plus two errors in the input file provided by the site you linked. It is unrelated to your usage of a closure, misleading error messages notwithstanding.

Off-by-one errors

Notice that the innermost loops in the pseudocode look like this:

for k = 1 to p:
for i = 1 to m:
for j = 1 to n:

In pseudocode, this typically indicates that the endpoint is included, i.e. for k = 1 to p means k = 1, 2, ..., p-1, p. On the other hand, the corresponding loops in your code look like this:

for k in xrange(1, p):
for i in xrange(1, m):
for j in xrange(1, n):

And of course, xrange(1, p) yields 1, 2, ..., p-2, p-1. Assuming you indexed the matrices from 0 (as they did on the site you linked), all your xranges should start at 0 (e.g. xrange(0, p)), as their equivalents in the Java code do (for (int k = 0; k < p; k++)). This fixes one of your problems.
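
Put together, corrected versions of your two factories would look roughly like this (only the xrange bounds change from your version):

def make_map_fn(num_rows_result, num_cols_result):
    m = num_rows_result
    p = num_cols_result

    def map_fn(key, value):
        # value is ('A', i, j, a_ij) or ('B', j, k, b_jk)
        if value[0] == 'A':
            i = value[1]
            j = value[2]
            a_ij = value[3]
            for k in xrange(p):                      # k = 0, 1, ..., p-1
                yield ((i, k), ('A', j, a_ij))
        else:
            j = value[1]
            k = value[2]
            b_jk = value[3]
            for i in xrange(m):                      # i = 0, 1, ..., m-1
                yield ((i, k), ('B', j, b_jk))
    return map_fn


def make_reduce_fn(inner_dim):
    n = inner_dim

    def reduce_fn(key, values):
        # key is (i, k)
        # values is a list of ('A', j, a_ij) and ('B', j, b_jk)
        hash_A = {j: a_ij for (x, j, a_ij) in values if x == 'A'}
        hash_B = {j: b_jk for (x, j, b_jk) in values if x == 'B'}
        result = 0
        for j in xrange(n):                          # j = 0, 1, ..., n-1
            result += hash_A[j] * hash_B[j]
        return (key, result)
    return reduce_fn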

Input file errors

In case you didn't catch this, the input file for A and B that the site provides is incorrect - they forgot the (0,0) entries of both matrices. In particular, you should add a line of the form A,0,0,0.0 at the beginning, and a line of the form B,0,0,0.0 between lines 9 and 10. (I guess where exactly you put them doesn't matter, but for consistency, you may as well put them where they naturally fit.)
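
If it helps, here is a minimal sketch of how those CSV lines can be loaded into the tuples your map function expects and handed to mincemeat; the file name and datasource layout here are just placeholders, not something mincemeat dictates:

def load_matrix_entries(path):
    # Each line looks like "A,0,0,0.0": matrix name, row index, column index, value.
    entries = []
    with open(path) as f:
        for line in f:
            name, row, col, val = line.strip().split(',')
            entries.append((name, int(row), int(col), float(val)))
    return entries

# mincemeat's datasource is a dict; each value is passed to map_fn as (key, value).
datasource = dict(enumerate(load_matrix_entries('matrix_input.txt')))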


Once I correct these errors (the loop bounds and the missing input lines), mincemeat gives me the result we expect (formatted for readability):

{(0, 1): ((0, 1), 100.0), 
 (1, 2): ((1, 2), 310.0), 
 (0, 0): ((0, 0),  90.0), 
 (0, 2): ((0, 2), 110.0), 
 (1, 0): ((1, 0), 240.0), 
 (1, 1): ((1, 1), 275.0)}

I haven't figured out exactly what's going on with the error message, but I think it boils down to the fact that the incorrect loop indices in the map function are resulting in garbage data being passed to the reduce nodes, which is why the error mentions the reduce function.

Basically, what happens is that hash_A and hash_B in the reduce function sometimes don't end up with the same keys. When you then try to multiply hash_A[j] * hash_B[j], you get a KeyError because j is missing from one or the other, and that exception gets caught somewhere upstream and rethrown as a TypeError instead.
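
You can reproduce that failure mode in isolation with made-up values (these dicts are just an illustration, not your actual data):

hash_A = {0: 1.0, 1: 2.0}     # entries the reducer received for matrix A
hash_B = {1: 3.0}             # the j = 0 entry never arrived because of the bad bounds
for j in xrange(2):
    print hash_A[j] * hash_B[j]   # raises KeyError: 0 on the first iteration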
