This is the code I am working on. I want the output sorted by count in descending order and, when counts are equal, by name.

from collections import Counter
import re
import operator

text = "The quick brown fox jumped over the lazy dogs bowl. The dog was angry with the fox considering him lazy."

def tokenize(text):
    # Split the lowercased text into word tokens and single punctuation characters.
    return re.findall(r"\w+|\S", text.lower())

tok = tokenize(text)

punctuations = ['(',')',';',':','[',']',',', '...', '.', '&']

# Drop punctuation tokens, keeping only the words.
keywords = [word for word in tok if word not in punctuations]

cnt = Counter(keywords)
print(cnt)
freq = operator.itemgetter(1)

# Sorts by count only, so words with equal counts are not ordered by name.
for k, v in sorted(cnt.items(), reverse=True, key=freq):
    print("%3d  %s" % (v, k))

Current output:

  4  the
  2  fox
  2  lazy
  1  quick
  1  brown
  1  jumped
  1  over
  1  dogs
  1  bowl
  1  dog
  1  was
  1  angry
  1  with
  1  considering
  1  him

Required output:

  4  the
  2  fox
  2  lazy
  1  angry
  1  bowl
  1  brown
  1  considering
  1  dog
  1  dogs

etc.

jonrsharpe
monty
    Like the underlying dictionary, `Counter` is **not** an ordered data structure. If order matters, see e.g. https://stackoverflow.com/questions/35446015/creating-an-ordered-counter. Or consider sorting by `text.index` of the key along with the value. – jonrsharpe Feb 21 '18 at 21:51
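The second idea in the comment, breaking ties by where a word first appears, could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name `by_count_then_position` is made up, the tokenizer is simplified to `\w+`, and it uses each word's position in the token list rather than `text.index`, since a plain substring search would find "dog" inside "dogs".

```python
import re
from collections import Counter

text = ("The quick brown fox jumped over the lazy dogs bowl. "
        "The dog was angry with the fox considering him lazy.")

def by_count_then_position(text):
    """Hypothetical helper: order by descending count, ties by first appearance."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)

    # Record the token position at which each word first occurs.
    first_seen = {}
    for position, word in enumerate(words):
        first_seen.setdefault(word, position)

    # Negate the count so larger counts sort first; position breaks ties.
    return sorted(counts.items(),
                  key=lambda item: (-item[1], first_seen[item[0]]))

for word, count in by_count_then_position(text):
    print("%3d  %s" % (count, word))
```

Note that this gives first-appearance order for ties, not the alphabetical order asked for in the question.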

1 Answer

Use a sort key that returns a tuple. The first item in the tuple is the negated count (the value in your Counter) and the second is the word (the key). To do this, remove the variable freq, drop the reverse keyword from the call to sorted, and supply a small lambda function that returns (-value, key) for each item. The last few lines of the program become:

print(cnt)
for k, v in sorted(cnt.items(), key=lambda item: (-item[1], item[0])):
    print("%3d  %s" % (v, k))

The - sign is needed because sorted orders from lowest to highest by default. Negating the count puts the largest counts first, while the words, which are not negated, still break ties in ascending alphabetical order.
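To see the tuple key in isolation, here is a small self-contained sketch on a short word list. The function names `sorted_counts` and `sorted_counts_two_pass` and the sample list are made up for illustration. The second function shows an alternative that relies on Python's sort being stable (sort by name first, then by count with `reverse=True`), which may be handy when the primary key cannot simply be negated.

```python
from collections import Counter

def sorted_counts(words):
    """Count descending, then word ascending, using one tuple key."""
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def sorted_counts_two_pass(words):
    """Same ordering via two stable sorts instead of a negated key."""
    pairs = sorted(Counter(words).items())             # secondary key: word
    pairs.sort(key=lambda item: item[1], reverse=True)  # primary key: count
    return pairs

# Sample data (an assumption, not the text from the question).
words = ["the", "fox", "lazy", "the", "dog", "fox", "angry", "the", "lazy", "the"]

for word, count in sorted_counts(words):
    print("%3d  %s" % (count, word))
# prints:
#   4  the
#   2  fox
#   2  lazy
#   1  angry
#   1  dog
```

Both functions should give the same result here; the single tuple key is shorter, while the two-pass version also works for keys that have no natural negation, such as strings sorted in reverse.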

Paul Cornelius