I wrote a mapper that prints out word pairs and a count of 1 for each of them.

import sys
from itertools import tee


for line in sys.stdin:
    line = line.strip()
    words = line.split()

def pairs(lst):
    return zip(lst,lst[1:]+[lst[0]])

for i in pairs(words):
    print i,1

I tried writing a reducer that collects the pairs in a dictionary, but I am a bit stuck on how to sum up the counts.

import sys

mydict = dict()
for line in sys.stdin:
    (word,cnt) = line.strip().split('\t') #\t
    mydict[word] = mydict.get(word,0)+1

for word,cnt in mydict.items():
    print word,cnt

But it complains that there are not enough arguments on the .split line. Any thoughts? Thank you.

user3295674

1 Answer

I think the problem is the line (word,cnt) = line.strip().split('\t')
The split() method returns a list, and Python then tries to unpack that list into (word, cnt). That fails whenever the list does not contain exactly two items (maybe a line sometimes contains no tab, so the list has only one element).
Maybe you want to use something like

for word in line.strip().split('\t'):
    mydict[word] = mydict.get(word, 0) + 1
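
Note that the loop above counts every tab-separated field, including the count field itself, and never adds up cnt. If what you actually want is the sum of the counts per key, a minimal sketch could look like the following. It assumes every valid input line has the form key<TAB>count and that any other line can simply be skipped; sum_counts and main are names made up for this illustration, and the code is written so that it also runs under Python 3.

```python
import sys


def sum_counts(lines):
    """Sum the counts per key from lines of the form 'key<TAB>count'."""
    totals = {}
    for line in lines:
        parts = line.strip().split('\t')
        if len(parts) != 2:
            # Assumption: lines without exactly two fields are skipped.
            continue
        key, cnt = parts
        try:
            value = int(cnt)
        except ValueError:
            # Assumption: a count that is not an integer is skipped too.
            continue
        totals[key] = totals.get(key, 0) + value
    return totals


def main():
    # Read mapper output from stdin and write 'key<TAB>total' lines.
    for key, total in sum_counts(sys.stdin).items():
        sys.stdout.write('%s\t%d\n' % (key, total))
```

Call main() at the bottom of the script when using it as the reducer. The length check is what avoids the unpacking error: a line with no tab is dropped instead of being forced into (word, cnt).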

If you have problems with empty list elements, use list(filter(None, list_name)) to remove them.
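
To make the filter tip concrete, here is a small sketch of how it behaves on a few inputs. split_fields is a hypothetical helper name, not something from your code.

```python
def split_fields(line):
    """Split a line on tabs and drop any empty fields."""
    return list(filter(None, line.strip().split('\t')))


# Two tabs in a row produce an empty field, which filter(None, ...) removes.
print(split_fields("a\t\tb\n"))   # ['a', 'b']
# A line without a tab yields a single field, so unpacking into two names fails.
print(split_fields("word\n"))     # ['word']
# An empty line yields no fields at all.
print(split_fields("\n"))         # []
```

filter(None, ...) drops every falsy element, which for strings means the empty string only. It does not change how many real fields a line has, so a length check is still needed before unpacking.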

Disclaimer: I didn't test the code. Also, this only addresses your second example (the reducer).
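
As for the first example, one thing that would produce lines without a tab is that print i,1 separates the tuple and the count with a space, not a tab. The pairing loop also sits outside the loop over sys.stdin, so only the last line is paired. A rough sketch of a mapper that emits tab-separated lines for every input line is below; map_line and main are made-up names, and joining the two words with a single space to form the key is an assumption on my part.

```python
import sys


def pairs(words):
    """Pair each word with its successor; the last word wraps to the first."""
    # words[:1] instead of [words[0]] so an empty list does not raise.
    return zip(words, words[1:] + words[:1])


def map_line(line):
    """Return one 'first second<TAB>1' string per word pair in the line."""
    return ['%s %s\t1' % (first, second)
            for first, second in pairs(line.split())]


def main():
    # Pair the words of every input line, not just the last one.
    for line in sys.stdin:
        for record in map_line(line):
            sys.stdout.write(record + '\n')
```

For the input line "a b c" this yields the records "a b<TAB>1", "b c<TAB>1" and "c a<TAB>1", which a reducer splitting on '\t' can unpack into a key and a count.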

greschd