I wrote a mapper that prints out word pairs and a count of 1 for each of them.
import sys
from itertools import tee
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    def pairs(lst):
        return zip(lst,lst[1:]+[lst[0]])
    for i in pairs(words):
        print i,1
I tried writing a reducer that builds a dictionary, but I am a bit stuck on how to sum the counts for each pair.
import sys
mydict = dict()
for line in sys.stdin:
    (word,cnt) = line.strip().split('\t') #\t
    mydict[word] = mydict.get(word,0)+1
for word,cnt in mydict.items():
    print word,cnt
But it says there are not enough arguments on the .split line. Any thoughts? Thank you.
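One likely cause, as far as I can tell: in Python 2, `print i,1` writes the tuple and the count separated by a space (for example `('a', 'b') 1`), so the line never contains a tab and `split('\t')` returns a single item. The reducer also adds 1 instead of the parsed count. Below is a minimal sketch of how both pieces might look with a tab-separated key and value. The function names `map_lines` and `reduce_lines` are hypothetical, and the pairing keeps the wrap-around from the mapper above (last word paired with the first) while avoiding the index error on empty lines.

```python
from collections import defaultdict


def map_lines(lines):
    """Yield 'first second<TAB>1' for each word pair in each input line."""
    for line in lines:
        words = line.split()
        # words[:1] is empty for a blank line, so no pairs are produced
        # and no IndexError is raised; otherwise the last word wraps
        # around to pair with the first, as in the original mapper.
        for first, second in zip(words, words[1:] + words[:1]):
            yield "%s %s\t1" % (first, second)


def reduce_lines(lines):
    """Sum the counts per key from 'key<TAB>count' lines."""
    totals = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last tab only, so the key may contain spaces.
        key, count = line.rsplit("\t", 1)
        # Add the parsed count rather than a constant 1.
        totals[key] += int(count)
    return dict(totals)
```

In the actual scripts, the mapper would presumably iterate over `map_lines(sys.stdin)` and write each result on its own line, and the reducer would call `reduce_lines(sys.stdin)` and write each key and total separated by a tab.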