
I'm new to Python and programming, and it's not easy for me to get this stuff into my head. Because the books I started reading are completely boring, I started to play around with some ideas.

Here is what I want to do: open the text file, count the frequency of every single value (it's just a list of system names), sort the list by frequency, and output the result. After searching the web for some code to do it, I got this:

file = open('C:\\Temp\\Test2.txt', 'r')
text = file.read()
file.close()

# split() with no separator splits on any run of whitespace
word_list = text.lower().split(None)

word_freq = {}

# count how often each system name occurs
for word in word_list:
    word_freq[word] = word_freq.get(word, 0) + 1

list = sorted(word_freq.keys())
for word in list:
    print ("%-10s %d" % (word, word_freq[word]))

It works, but it sorts by the words (the system names) in the list:

pc05010    3
pc05012    1
pc05013    8
pc05014    2

I want it like this:

pc05013    8
pc05010    3
pc05014    2
pc05012    1

I've been searching for a sort-by-value function for hours now. I bet it's easy, but I found nothing.

From my beginner's point of view, it has something to do with this line:

list = sorted(word_freq.keys())

I thought maybe it's:

list = sorted(word_freq.values())

But no... It's very frustrating to see tons of information about this language and still not be able to get such simple things to work.

Please help :)

Thanks a lot!

Fabster
    You shouldn't use `list` as a variable name because it is the name of the built-in `list()` function. Doing so is called [shadowing builtins](http://stackoverflow.com/questions/11263502/consequences-of-shadowing-built-in-types-functions). – Burhan Khalid May 25 '13 at 12:50
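As a small sketch of what the comment means, assuming the code runs at the top level of a script like the question's: once `list` is rebound to a sorted list, calling `list(...)` fails, and deleting the module-level name makes the built-in visible again. The sample `word_freq` values are illustrative only.

```python
# Illustrative counts, standing in for the question's word_freq.
word_freq = {'pc05010': 3, 'pc05013': 8}

list = sorted(word_freq.keys())  # "list" now names a list object, not the built-in type

try:
    list('abc')  # a list object is not callable
except TypeError as exc:
    shadow_error = str(exc)
    print('TypeError:', shadow_error)  # TypeError: 'list' object is not callable

del list  # remove the module-level name; lookups fall back to the built-in again
print(list('abc'))  # ['a', 'b', 'c']
```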

3 Answers


You have to use word_freq.items() here:

lis = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)
for word,freq in lis:
    print ("%-10s %d" % (word, freq))

Don't use list as a variable name.
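`sorted(word_freq.values())` on its own only gives back the counts, so the names are lost. A minimal sketch of two related options, using made-up counts: sort the keys while comparing them by their values via `word_freq.get`, or break ties between equal counts by name with a tuple key. The `tied` dict is hypothetical sample data.

```python
# Sample counts matching the question's example output.
word_freq = {'pc05010': 3, 'pc05012': 1, 'pc05013': 8, 'pc05014': 2}

# Sort the keys, but compare them by their counts (highest first).
names = sorted(word_freq, key=word_freq.get, reverse=True)
for word in names:
    print("%-10s %d" % (word, word_freq[word]))

# Hypothetical data with a tie: order by descending count, then by name.
tied = {'pc2': 1, 'pc1': 1, 'pc3': 5}
by_count_then_name = sorted(tied.items(), key=lambda x: (-x[1], x[0]))
print(by_count_then_name)  # [('pc3', 5), ('pc1', 1), ('pc2', 1)]
```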

Ashwini Chaudhary

Take a look at collections.Counter. Its most_common() method returns (element, count) pairs sorted by count, highest first:

>>> wordlist = ['foo', 'bar', 'foo', 'baz']
>>> import collections
>>> counter = collections.Counter(wordlist)
>>> counter.most_common()
[('foo', 2), ('bar', 1), ('baz', 1)]
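A short sketch of two more Counter features that fit this task, reusing the sample list above: `most_common(n)` keeps only the `n` highest counts, and looking up a missing key returns 0 instead of raising `KeyError`.

```python
import collections

wordlist = ['foo', 'bar', 'foo', 'baz']
counter = collections.Counter(wordlist)

# Only the single most frequent element.
top_one = counter.most_common(1)
print(top_one)  # [('foo', 2)]

# Missing keys count as 0 rather than raising KeyError.
print(counter['foo'], counter['missing'])  # 2 0
```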
Blubber

Use a collections.Counter to help with counting things, and the with statement to help with opening (and closing) files.

import collections

with open('C:\\Temp\\Test2.txt', 'r') as f:
    text = f.read()

word_freq = collections.Counter(text.lower().split())
for word, freq in word_freq.most_common():
    print ("%-10s %d" % (word, freq))
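If you want to try the counting part without C:\Temp\Test2.txt, one option is to move it into a function that takes the text as a string. This is only a sketch; `frequency_report` and the sample text are hypothetical, not part of the answer above.

```python
import collections

def frequency_report(text):
    """Hypothetical helper: formatted lines, most frequent name first."""
    word_freq = collections.Counter(text.lower().split())
    return ["%-10s %d" % (word, freq) for word, freq in word_freq.most_common()]

# Made-up sample input; in the real script this would come from f.read().
sample = "pc05013 pc05010 PC05013\npc05013 pc05010 pc05012"
for line in frequency_report(sample):
    print(line)
```

Keeping file reading separate from counting means the same function works whether the text comes from a file, a string, or another source.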
Burhan Khalid
unutbu