Combine count of word pairs: python

Question

I wrote a mapper that prints out word pairs and a count of 1 for each of them.

import sys
from itertools import tee


for line in sys.stdin:
    line = line.strip()
    words = line.split()

def pairs(lst):
    return zip(lst,lst[1:]+[lst[0]])

for i in pairs(words):
    print i,1

I tried writing a reducer that creates a dictionary, but I am a bit stuck on how to sum them up.

import sys

mydict = dict()
for line in sys.stdin:
    (word,cnt) = line.strip().split('\t') #\t
    mydict[word] = mydict.get(word,0)+1

for word,cnt in mydict.items():
    print word,cnt

But I get an error on the .split line saying there are not enough values to unpack. Any thoughts? Thank you.

Each time you finish iterating through `for line in sys.stdin:`, `words` ends up equal to the very last line, and the very last line alone. So what exactly does your `sys.stdin` look like? (TehTris, Oct 16 '14 at 22:05)
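The comment's point is that `for i in pairs(words):` sits outside `for line in sys.stdin:`, so only the last line's pairs are ever printed. Below is a minimal Python 3 sketch of the mapper with the pair loop nested inside the line loop. It assumes the key should be the two words joined by a space, separated from the count by a tab (the format the reducer's `split('\t')` expects). `adjacent_pairs`, `map_lines` and `main` are hypothetical names. Unlike the question's `pairs`, it does not wrap the last word around to the first, and so it also avoids the `IndexError` that `lst[0]` raises on an empty line.

```python
import sys


def adjacent_pairs(words):
    """Pair each word with the next one: ['a', 'b', 'c'] -> (a, b), (b, c).

    No wrap-around from the last word back to the first (an assumption;
    the question's pairs() does wrap around).
    """
    return zip(words, words[1:])


def map_lines(lines):
    """Yield one 'w1 w2<TAB>1' record per adjacent word pair.

    The pair loop sits *inside* the line loop, so every line is
    processed, not just the last one.
    """
    for line in lines:
        words = line.split()  # split() with no argument also drops surrounding whitespace
        for first, second in adjacent_pairs(words):
            # A tab separates the key from the count so the reducer can split on '\t'.
            yield f"{first} {second}\t1"


def main(stream=sys.stdin):
    """Print mapper output for every line of the input stream."""
    for record in map_lines(stream):
        print(record)
```

When saved as a script, call `main()` at the bottom (typically under an `if __name__ == "__main__":` guard) so it reads standard input.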

score 1 · Accepted Answer · answered Oct 16 '14 at 22:07

I think the problem is `(word,cnt) = line.strip().split('\t')`.
`split()` returns a list, and the assignment tries to unpack that list into `(word, cnt)`. That fails whenever the list does not contain exactly two items (maybe there's sometimes only one word).
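A quick way to see the mismatch, shown as a hedged Python 3 illustration: the question's mapper uses Python 2's `print i,1`, which writes the tuple and the count separated by a space, not a tab. A line that reaches the reducer therefore contains no tab at all, and `split('\t')` returns a single item. The sample line below is an assumption about what that mapper prints for the pair `('a', 'b')`.

```python
# Assumed mapper output: Python 2's `print i,1` separates the tuple
# and the count with a single space, not a tab.
line = "('a', 'b') 1\n"

parts = line.strip().split('\t')
print(parts)  # the line has no tab, so split('\t') returns one item

unpack_error = None
try:
    word, cnt = parts  # unpacking one item into two names fails
except ValueError as err:
    unpack_error = err
    print("ValueError:", err)
```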
Maybe you want something like:

for word in line.strip().split('\t'):
    mydict[word] = mydict.get(word, 0) + 1

If you have problems with empty list elements, use `list(filter(None, list_name))` to remove them.
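`filter(None, iterable)` keeps only truthy items, so the empty strings that `split('\t')` produces around consecutive or trailing tabs are dropped. A small illustration with a made-up input string:

```python
# Consecutive and trailing tabs leave empty strings in the split result.
items = "a\t\tb\t".split('\t')
print(items)  # ['a', '', 'b', '']

# filter(None, ...) drops falsy items, i.e. the empty strings here.
cleaned = list(filter(None, items))
print(cleaned)  # ['a', 'b']
```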

Disclaimer: I didn't test the code. Also, this only applies to your second snippet (the reducer).
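On the question's other sticking point, summing the counts: the reducer adds 1 per line and never uses `cnt`. Below is a minimal Python 3 sketch that adds `int(cnt)` instead. It assumes every non-blank input line has the form `key<TAB>count`. `reduce_lines` and `main` are hypothetical names, and `collections.Counter` is swapped in for the question's plain dictionary with `get`. Like the original dictionary approach, it keeps all totals in memory.

```python
import sys
from collections import Counter


def reduce_lines(lines):
    """Sum the counts for each key in 'key<TAB>count' records.

    Adds int(cnt) rather than 1, so partially aggregated counts are
    summed correctly too.
    """
    totals = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last tab only; a line with no tab still raises ValueError.
        key, cnt = line.rsplit("\t", 1)
        totals[key] += int(cnt)
    return totals


def main(stream=sys.stdin):
    """Print 'key<TAB>total' for every key, sorted by key."""
    for key, total in sorted(reduce_lines(stream).items()):
        print(f"{key}\t{total}")
```

Using `rsplit("\t", 1)` means a key that happened to contain a tab would still split correctly on the final separator. Call `main()` when running it as a script.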
