
I am pretty new to Python and, to help me learn, I am building a program that I want to break down into 2 steps:

Step 1) Count the number of times each word appears in a text file and store the result in a dictionary where the key, value pairs are {word, count}

Step 2) Order the dictionary from (1) in descending order, to show the top 100 words

Step 1 works fine, but in attempting step 2 I am struggling to get the dictionary out of the first function. I create a new variable 'tallies', but this is a tuple and shows only the first entry in the dictionary.

How do I call the full dictionary to my 2nd function?

Thanks.

filename = 'nameoffile.txt'

def tally():
  file = open(filename,'r')
  wordcount={}
  for word in file.read().split():
    if word not in wordcount:
      wordcount[word] = 1
    else:
      wordcount[word] += 1
  for k,v in wordcount.items():
    return k,v

def Count():
  tallies = tally()
  print tallies

Count()
  • Since you are new to Python, Google before you write code; there may be a solution out there already. Python is an idiomatic language, so people tend to use the same piece of code for the same task. Look at the answer that uses 'Counter'. – Merlin Aug 27 '16 at 21:23

3 Answers


These tasks are exactly what collections.Counter is for. Calling it on the split text creates a dictionary-like counter object that maps each word to its frequency. You can then use Counter.most_common(N) to get the N most common items.

Regarding the following part of your code:

for k,v in wordcount.items():
    return k,v

The return statement ends the function during the first iteration of the loop, so only the first item is ever returned.
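
To see the effect in isolation, here is a minimal sketch. The function names first_item and whole_dict and the sample counts are made up for illustration; they are not part of your program.

```python
def first_item(counts):
    # return inside the loop exits the function on the first iteration,
    # so the remaining items are never visited
    for k, v in counts.items():
        return k, v

def whole_dict(counts):
    # returning the dict itself hands every entry back to the caller
    return counts

counts = {'the': 3, 'cat': 1}        # made-up sample data
one_pair = first_item(counts)        # a single (word, count) tuple
everything = whole_dict(counts)      # the full dictionary
print(one_pair)
print(everything)
```

The first call gives back one tuple, which matches what you saw in 'tallies'; the second gives back the whole dictionary.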

You can simply return the dictionary:

def tally():
    file = open(filename,'r')
    wordcount={}
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    return wordcount

You could also use collections.defaultdict to build the counter manually. A defaultdict supplies a default value for missing keys (0 when it is created with int), so the 'if word not in wordcount' check is no longer needed.

from collections import defaultdict

wordcount = defaultdict(int) # default is 0

def tally():
    with open(filename) as f:
        # a missing word starts at 0, so += 1 works on first sight
        for word in f.read().split():
            wordcount[word] += 1
    return wordcount
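
If you want to try the defaultdict behaviour without a file, here is a small sketch. The helper name count_words and the sample sentence are my own assumptions for illustration. It also creates the defaultdict inside the function, so calling it twice does not add to the counts from the first call.

```python
from collections import defaultdict

def count_words(text):
    # hypothetical helper: counts words in a string instead of a file
    wordcount = defaultdict(int)  # int() returns 0, so new words start at 0
    for word in text.split():
        wordcount[word] += 1
    return wordcount

counts = count_words("the cat saw the other cat and the dog")
print(counts["the"])   # 3
print(counts["cat"])   # 2
```

One thing to keep in mind: merely looking up a missing key with counts[key] on a defaultdict inserts that key with the default value, so use 'key in counts' if you only want to test for membership.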

To return the sorted items, you can call sorted() on the dictionary's items and pass a key function that tells it to sort by the second element of each pair (the count). Adding reverse=True gives the descending order you asked for. For example:

sorted(wordcount.items(), key=lambda x: x[1], reverse=True)
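
As a small worked example with made-up counts, this sketch sorts the (word, count) pairs with the largest count first and then slices off the top entries. It uses 2 instead of 100 only to keep the output short.

```python
wordcount = {'the': 3, 'cat': 2, 'dog': 1, 'saw': 1}  # made-up sample data

# sort (word, count) pairs by count, largest first
by_count = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)

# a slice of the sorted list gives the top N entries
top2 = by_count[:2]
print(top2)  # [('the', 3), ('cat', 2)]
```

Note that the result is a list of tuples, not a dictionary.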

But as I said at the start, the Pythonic and optimized approach is to use collections.Counter.

from collections import Counter

with open(filename) as f:
    wordcount = Counter(f.read().split())

top100 = wordcount.most_common(100)
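
To experiment without a file, here is the same idea on an in-memory string. The sample text is an assumption standing in for the result of f.read().

```python
from collections import Counter

text = "the cat saw the other cat and the dog"  # stand-in for f.read()
wordcount = Counter(text.split())

# most_common(n) returns a list of (word, count) pairs, highest count first
top2 = wordcount.most_common(2)
print(top2)               # [('the', 3), ('cat', 2)]

# a word that never appeared has a count of 0 instead of raising KeyError
print(wordcount['bird'])  # 0
```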
Mazdak

Your tally function is returning the first item it sees: a function can only return once, but you are calling return inside a loop. Try returning the whole wordcount dict:

import operator

filename = 'nameoffile.txt'

def tally():
  file = open(filename,'r')
  wordcount={}
  for word in file.read().split():
    if word not in wordcount:
      wordcount[word] = 1
    else:
      wordcount[word] += 1
  return wordcount

def Count():
  tallies = tally()
  # sort by count, highest first, so the slice below is the top 100
  sorted_tallies = sorted(tallies.items(), key=operator.itemgetter(1), reverse=True)
  print(sorted_tallies[:100])

Count()

In Python a dict has no sorted order of its own, so to order it you need to sort its (key, value) tuples into a list. The sorted() call above does this.
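
In case operator.itemgetter is new to you, here is a minimal sketch of what it does when used as the sort key. The counts are made up for illustration.

```python
import operator

tallies = {'the': 3, 'cat': 2, 'dog': 1}  # made-up sample data

get_count = operator.itemgetter(1)   # same effect as lambda pair: pair[1]
example = get_count(('the', 3))      # picks out the count, 3

# sorting the items turns the dict into a list of (word, count) tuples
sorted_tallies = sorted(tallies.items(), key=get_count, reverse=True)
print(sorted_tallies)  # [('the', 3), ('cat', 2), ('dog', 1)]
```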

good luck!

derelict

Your issue is that you returned k, v during the first iteration, meaning you only ever grabbed the first item. The following code fixes this. I also sort the items by count and reverse the result, so the most frequent words come first.

def tally():
    file = open(filename,'r')
    wordcount={}
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

    return tuple(reversed(sorted(((k, v) for k, v in wordcount.items()),key=lambda x: x[1])))

def Count():
    tallies = tally()
    print tallies
TheLazyScripter