Mincemeat map function returning dictionary

Question

I am using a MapReduce implementation called mincemeat.py, which takes a map function and a reduce function. First, here is what I am trying to accomplish. I am taking a Coursera course on big data that has a programming assignment: there are hundreds of files containing records of the form paperid:::author1::author2::author3:::papertitle

We have to go through all the files and find, for each author, the word that author uses most often in paper titles. I wrote the following code for it.
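To make the record format concrete, here is a minimal sketch of splitting one line into its parts. The helper name parse_record and the sample line are made up for illustration and are not part of the assignment data.

```python
def parse_record(line):
    """Split 'paperid:::a1::a2:::title' into (paper_id, [authors], title)."""
    # ':::' separates the three fields; '::' separates authors within a field.
    paper_id, authors, title = line.strip().split(":::")
    return paper_id, authors.split("::"), title


# Hypothetical sample line in the format described above.
sample = "paper001:::Alice Smith::Bob Jones:::A Study of Map-Reduce"
print(parse_record(sample))
```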

import glob
import re
from collections import Counter

import mincemeat

text_files = glob.glob('test/*')

def file_contents(file_name):
    f = open(file_name)
    try:
        return f.read()
    finally:
        f.close()

datasource = dict((file_name, file_contents(file_name)) for file_name in text_files)

def mapfn(key, value):
    for line in value.splitlines():
        wordsinsentence = line.split(":::")
        authors = wordsinsentence[1].split("::")
        # print authors
        words = str(wordsinsentence[2])
        words = re.sub(r'([^\s\w-])+', '', words)
        # re.sub(r'[^a-zA-Z0-9: ]', '', words)
        words = words.split(" ")
        for author in authors:
            for word in words:
                word = word.replace("-"," ")
                word = word.lower()
                yield author, word

def reducefn(key, value):
    return Counter(value)

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn
results = s.run_server(password="changeme")
# print results

i = open('outfile','w')
i.write(str(results))
i.close()

My problem is this: the reduce function should receive an author name together with all the words that author has used in titles, and it should do so for every author. So I expected output like

{authorname: Counter({'word1': countofword1, 'word2': countofword2, 'word3': countofword3, ...})}

But what I get is

authorname: (authorname, Counter({'word1': countofword1,'word2':countofword2}))

Can someone tell me why this happens? I don't need help solving the assignment; I want to understand why the output has this shape.
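For anyone who sees the same shape and only needs to keep working, a small sketch of post-processing the results follows. It does not explain the cause. It assumes the values really are (authorname, Counter) tuples as shown above, and the helper name unwrap_results is hypothetical.

```python
from collections import Counter


def unwrap_results(results):
    """Return a dict whose values are bare, even if some arrived as (key, value) tuples."""
    unwrapped = {}
    for key, value in results.items():
        # Assumption: a wrapped value is a 2-tuple whose first item repeats the key.
        if isinstance(value, tuple) and len(value) == 2 and value[0] == key:
            value = value[1]
        unwrapped[key] = value
    return unwrapped
```

Values that are already plain Counter objects pass through unchanged, so the helper is safe to apply in either case.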

Kindly remove the code, this violates the coursera code of honor. — vamosrafa, Sep 23 '13 at 07:14

score 1 · Answer 1 · answered May 22 '13 at 13:58

I ran your code and it works as expected. The output looks like {authorname: Counter({'word1': countofword1, 'word2': countofword2, 'word3': countofword3, ...})}.

That said, please remove the code from here, as it violates the Coursera Code of Honor.

Sundeep

score 0 · Answer 2 · edited Oct 05 '12 at 01:23

Check what data structure value holds inside reducefn before it is passed to Counter.

def reducefn(key, value):
    print(value)
    return Counter(value)
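The same check can be done without a server or clients. The following is a minimal sketch that imitates the grouping step in plain Python; it is not mincemeat itself, and the simulate helper, the simplified mapfn, and the sample data are assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def simulate(datasource, mapfn, reducefn):
    """Run mapfn over every input, group values by key, then reduce each group."""
    grouped = defaultdict(list)
    for key, value in datasource.items():
        for out_key, out_value in mapfn(key, value):
            grouped[out_key].append(out_value)
    return {k: reducefn(k, v) for k, v in grouped.items()}


def mapfn(key, value):
    # Simplified version of the question's mapper: emit (author, word) pairs.
    for line in value.splitlines():
        _, authors, title = line.split(":::")
        for author in authors.split("::"):
            for word in title.lower().split():
                yield author, word


def reducefn(key, value):
    # Here value is a plain list of words emitted for one author.
    return Counter(value)


# Hypothetical two-record input standing in for one file.
sample = {"f1": "p1:::alice::bob:::big data big ideas\np2:::alice:::data mining"}
results = simulate(sample, mapfn, reducefn)
print(results)
```

In this simulation each value in results is a bare Counter, so if a real run shows tuples instead, the difference arises somewhere outside mapfn and reducefn.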

andrewsi

answered Oct 05 '12 at 00:01

Kudo
