I have created the following code:

#!/usr/bin/env python
import mincemeat
import glob

all_files = glob.glob('textfiles/*.txt')

def file_contents(file_name):
    # The with-statement closes the file even if read() raises
    with open(file_name) as f:
        return f.read()

# The data source can be any dictionary-like object
datasource = dict((file_name, file_contents(file_name))
                  for file_name in all_files)

def mapfn(k, v):
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    result = sum(vs)
    return result

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print results

I am running the server on my personal Mac, with the client on the same machine. My question is: if I run multiple clients on multiple machines, will the files be divided among them automatically? That is, will the mincemeat server assign the files to clients for processing? Also, in the example above I am not specifying a key in the mapper function. How can I specify a key, e.g. a file name?

senshin

1 Answer

Yes, mincemeat will automatically spread the work evenly across clients (this is one of the central aims of MapReduce).
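
To illustrate what "spreading the work" can look like, here is a minimal sketch of pull-based dispatch: each datasource key (file name) is one map task, and whichever client is idle gets the next one. This is a local simulation, not mincemeat's own code; simulate_dispatch and the client_speeds dictionary are hypothetical names, and the assumption that tasks are handed out one key at a time on request is my understanding of the library rather than something stated above.

```python
import heapq
from collections import deque

def simulate_dispatch(task_keys, client_speeds):
    """Hand out one task at a time to whichever client becomes idle first.

    task_keys: datasource keys (e.g. file names), one map task per key.
    client_speeds: hypothetical dict of client name -> time units per task.
    Returns a dict of client name -> list of keys that client processed.
    """
    pending = deque(task_keys)
    # Heap of (time at which the client is idle, client name); all idle at t=0
    idle_at = [(0.0, name) for name in sorted(client_speeds)]
    heapq.heapify(idle_at)
    assigned = dict((name, []) for name in client_speeds)
    while pending:
        now, name = heapq.heappop(idle_at)   # next client to become idle
        key = pending.popleft()              # next unprocessed file
        assigned[name].append(key)
        heapq.heappush(idle_at, (now + client_speeds[name], name))
    return assigned

files = ['textfiles/%d.txt' % i for i in range(6)]
assignment = simulate_dispatch(files, {'fast': 1.0, 'slow': 2.0})
for name in sorted(assignment):
    print("%s: %s" % (name, assignment[name]))
```

In this simulation every file is assigned exactly once, and the faster client ends up with more files, so the split follows client availability rather than a fixed equal share.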

In your map function, each call to yield yields a key and a value. In this example, the key is the word you're currently iterating over.
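
As for where k and v come from: the sketch below imitates the map and reduce phases locally, without mincemeat, to show the flow. It assumes that mapfn is called once per datasource entry, with k set to the entry's key (the file name) and v set to its value (the whole file's text, not a single line). run_local is a hypothetical stand-in for the server, and the two-entry datasource is made up for illustration. To carry the file name into the output, yield it as part of the key.

```python
from collections import defaultdict

def mapfn(k, v):
    # k is the datasource key (a file name); v is its value (the file's text)
    for w in v.split():
        yield (k, w), 1          # emit a (file name, word) key

def reducefn(k, vs):
    return sum(vs)

def run_local(datasource, mapfn, reducefn):
    """Hypothetical stand-in for the server: map each entry, group, reduce."""
    grouped = defaultdict(list)
    for key, value in datasource.items():
        for out_key, out_value in mapfn(key, value):
            grouped[out_key].append(out_value)
    return dict((k, reducefn(k, vs)) for k, vs in grouped.items())

# Made-up data; in the question this dict is built from textfiles/*.txt
datasource = {
    'a.txt': 'spam eggs spam',
    'b.txt': 'eggs',
}
results = run_local(datasource, mapfn, reducefn)
print(results)
```

With this mapfn the counts are per file and per word; yielding w, 1 instead, as in the question, gives totals across all files.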

Michael Fairley
  • Thanks, Michael, for your answer. Regarding the key question, what I actually wanted to know is how the file name gets assigned to the k parameter in mapfn. In short, how does mincemeat know that k is a file name and v is a line in the file? We have not specified it anywhere. Thanks. – Anupam Bansal Sep 29 '13 at 18:55