
Currently I am trying to read words such as "The quick brown fox jumped over the lazy brown dog’s back" from a text file and organize them by word length and by word count.

So the output should be:

1 The
1 fox
1 the
1 back
1 lazy
1 over
2 brown
1 dog’s
1 quick
1 jumped

I checked quite a few Stack Overflow questions, such as "how to sort by length of string followed by alphabetical order?", and I'm guessing I either missed the answer or don't understand how to use it. I'm a beginner with Python.

This is what I have so far:

 from collections import Counter
 file = open("text.txt","r")

 # read the file & split words
 wordcount = Counter(file.read().split())

 # printing word count
 for item in wordcount.items():
     print("{}\t{}".format(*item))

Could someone help me figure out what I'm doing wrong?


4 Answers


Try something like this:

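from collections import Counter
import re

# Identify each word using a regex (\w+ treats the ’ in "dog’s" as a separator)
with open(r"D:\test.txt") as f:
    words = re.findall(r'\w+', f.read())
# Find counts
data = Counter(words).most_common()
# Sort alphabetically by word (x[0]); uppercase sorts before lowercase
data = sorted(data, key=lambda x: x[0])
print(data)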

Prints:

[('The', 1), ('back', 1), ('brown', 2), ('dog', 1), ('fox', 1), ('jumped', 1), ('lazy', 1), ('over', 1), ('quick', 1), ('s', 1), ('the', 1)]
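Because `\w+` does not match the curly apostrophe, "dog’s" comes out as the two words `dog` and `s`. A minimal sketch, assuming the only in-word punctuation you want to keep is the straight (') and curly (’) apostrophe, and using a string instead of reading `D:\test.txt`:

```python
import re
from collections import Counter

text = "The quick brown fox jumped over the lazy brown dog’s back"

# Word characters plus straight and curly apostrophes count as part of a word
words = re.findall(r"[\w'’]+", text)
data = sorted(Counter(words).most_common(), key=lambda x: x[0])
print(data)
```

Extend the character class if your text contains other in-word punctuation such as hyphens.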

Or split into words with split() instead:

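from collections import Counter

# split() with no argument splits on any whitespace, including a trailing newline
with open(r"D:\test.txt") as f:
    words = f.read().split()
data = Counter(words).most_common()
# Sort by count (x[1]), smallest first
data = sorted(data, key=lambda x: x[1])
print(data)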

Prints (words with equal counts may appear in a different order):

[('lazy', 1), ('jumped', 1), ('over', 1), ('fox', 1), ('back', 1), ('quick', 1), ('The', 1), ('the', 1), ('dog's', 1), ('brown', 2)]
Learner
  • @Slslam how would I order the words from shortest to longest word though? The code I have so far counts how many words I have. – techiegeek Nov 18 '15 at 17:57
  • In `data = sorted(data, key=lambda x: x[1])`, using `x[0]` sorts by the word and using `x[1]` sorts by the count. – Learner Nov 18 '15 at 18:00
  • I checked the output and it is `[('lazy', 1), ('jumped', 1), ('over', 1), ('fox', 1), ('dog', 1), ('back', 1), ('s', 1), ('quick', 1), ('The', 1), ('the', 1), ('brown', 2)]`. I'm a little confused: how does this sort by word length? – techiegeek Nov 18 '15 at 18:05
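Neither `x[0]` nor `x[1]` sorts by length; for that the key has to be the length of the word. A minimal sketch, using a string in place of the file:

```python
from collections import Counter

text = "The quick brown fox jumped over the lazy brown dog’s back"

# (word, count) pairs, as in the answer above
data = Counter(text.split()).most_common()
# x[0] is the word, so len(x[0]) orders the pairs from shortest to longest word
data = sorted(data, key=lambda x: len(x[0]))
print(data)
```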

As my comment says, you can't sort a dict in place. Dicts are hash tables (which is what allows O(1) value lookups), so their entries are never kept in sorted order; before Python 3.7 they did not even keep insertion order.

You can instead iterate over a sorted `dict.items()`: `.items()` gives the (key, value) tuples, `sorted()` returns them as a list, and lists ARE ordered.
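In Python 3, `.items()` returns a view rather than a list, but `sorted()` accepts any iterable and always returns a new list, so the approach works the same. A small sketch of what is and is not reordered:

```python
from collections import Counter

s = "The quick brown fox jumped over the lazy brown dog’s back"
wordcount = Counter(s.split())

# A view over the Counter's (key, value) pairs, not a list
items = wordcount.items()
# sorted() builds a new list; the Counter itself is left unchanged
pairs = sorted(items, key=lambda pair: len(pair[0]))

# Since Python 3.7, a dict built from that list keeps the list's order
ordered = dict(pairs)
print(pairs)
```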

>>> s = "The quick brown fox jumped over the lazy brown dog’s back"

>>> from collections import Counter
>>> wordcount = Counter(s.split())
>>> wordcount
Counter({'brown': 2, 'back': 1, 'quick': 1, 'The': 1, 'over': 1, 'dog’s': 1, 'jumped': 1, 'fox': 1, 'the': 1, 'lazy': 1})
>>> for key, val in sorted(wordcount.items(), key=lambda pair: len(pair[0])):
...     print(val, key)
...

1 The
1 fox
1 the
1 back
1 over
1 lazy
1 quick
2 brown
1 dog’s
1 jumped

Using the built-in `sorted(iterable, key=somefunction)` function, you can sort the pairs returned by `wordcount.items()` by the length of the key (accessed as `pair[0]`, since `pair == (key, value)`).
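Since the question asks for words organized by both length and count, note that the key can also return a tuple, which `sorted` compares element by element. A minimal sketch, assuming you want shortest words first, then the most frequent words first among equal lengths (hence the negated count), then alphabetical order as a final tiebreaker:

```python
from collections import Counter

s = "The quick brown fox jumped over the lazy brown dog’s back"
wordcount = Counter(s.split())

# Sort by length, then by count descending (-count), then alphabetically
ordered = sorted(wordcount.items(),
                 key=lambda pair: (len(pair[0]), -pair[1], pair[0]))
for key, val in ordered:
    print(val, key)
```

With the extra tiebreakers the result no longer depends on the order in which the words were encountered.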

R Nar
  • Thank you for your help! How did "The" become first, and how were you able to print out "dog's" without an ASCII code error? – techiegeek Nov 18 '15 at 18:09
  • The key here is only the length, so words of the same length keep the order in which they were encountered (`sorted` is stable). The ASCII error is probably a problem with how you read the file; mine is just initialized from a string. – R Nar Nov 18 '15 at 18:12
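An ASCII error like this is likely caused by reading the file with a default codec that cannot decode the curly apostrophe in "dog’s". A minimal sketch, assuming the file is saved as UTF-8; it writes the sentence to a temporary file only so the example is self-contained (in the question the file is `text.txt`):

```python
import os
import tempfile
from collections import Counter

# Demo only: write the sentence to a temporary UTF-8 file
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("The quick brown fox jumped over the lazy brown dog’s back\n")
    path = tmp.name

# Name the encoding explicitly instead of relying on the platform default
with open(path, encoding="utf-8") as f:
    wordcount = Counter(f.read().split())
os.remove(path)

for key, val in sorted(wordcount.items(), key=lambda pair: len(pair[0])):
    print(val, key)
```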

First, convert the dictionary to a list of tuples, then sort the list and print or return it:

# shuffled words dict
words = {"The": 1,
         "fox": 1,
         "dog's": 1,
         "quick": 1,
         "jumped": 1,
         "over": 1,
         "the": 1,
         "brown": 2,
         "back": 1,
         "lazy": 1}

# convert dict to list of (word, count) tuples
def toList(d1):
    l1 = []
    for k in d1:
        l1.append((k, d1[k]))
    return l1

# sort the list by length, then alphabetically
output = sorted(toList(words), key=lambda w: (len(w[0]), w[0]))

for o in output:
    print(str(o[1]) + " " + str(o[0]))
"""
expected output is:
1 The
1 fox
1 the
1 back
1 lazy
1 over
2 brown
1 dog's
1 quick
1 jumped
"""
Vadim

[EDIT] I reread the post and realized this is not exactly what you wanted. See the other answers.

A dictionary is similar to a list, but instead of integer indices you use keys such as strings. Dictionaries are useful if you want to store key-value data like "Mom":39, "Kevin":12, "Sally":14. A dictionary cannot be sorted in place.

For what you need, a simple list of strings will do. You can sort it afterwards by just calling sort() on the list:

words = file.read().split()  # that is a list
words.sort()  # sorts alphabetically, in place
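`sort()` also accepts a `key`; with `key=len` a list of words is ordered from shortest to longest, and because the sort is stable, words of equal length keep their original order. A minimal sketch using a string in place of `file`; note that a plain list keeps duplicates, so the counts from the question still need something like `collections.Counter`:

```python
text = "The quick brown fox jumped over the lazy brown dog’s back"
words = text.split()  # a list of strings

# key=len sorts by word length; ties keep their original order
words.sort(key=len)
print(words)
```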
xXliolauXx