I am trying to count how often each word occurs in a .txt file and then sort the words by their number of occurrences.
So far, I have completed 90% of the task. What is left is to sort the words by number of occurrences in descending order.
Here is my code:
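import re

def frequency_check(lines):
    print("Frequency of words in file")
    # Split the text into words (runs of letters, digits and underscores)
    words = re.findall(r"\w+", lines)
    item_list = []  # words that have already been reported
    for item in words:
        if item not in item_list:
            item_count = words.count(item)
            print("{} : {} times".format(item, item_count))
            item_list.append(item)

with open("original-3.txt", 'r') as file1:
    # Lowercase everything so that "The" and "the" count as the same word
    lines = file1.read().lower()
    frequency_check(lines)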
This is the .txt file on which I am finding the word frequency,
Here's the output I get:
Frequency of words in file
return : 2 times
all : 1 times
non : 1 times
overlapping : 1 times
matches : 3 times
of : 5 times
pattern : 3 times
in : 4 times
string : 2 times
as : 1 times
a : 3 times
list : 3 times
strings : 1 times
the : 6 times
is : 1 times
scanned : 1 times
left : 1 times
to : 1 times
right : 1 times
and : 1 times
are : 3 times
returned : 1 times
order : 1 times
found : 1 times
if : 2 times
one : 2 times
or : 1 times
more : 2 times
groups : 2 times
present : 1 times
this : 1 times
will : 1 times
be : 1 times
tuples : 1 times
has : 1 times
than : 1 times
group : 1 times
empty : 1 times
included : 1 times
result : 1 times
unless : 1 times
they : 1 times
touch : 1 times
beginning : 1 times
another : 1 times
match : 1 times
Process finished with exit code 0
I would like to sort these and print them from the highest number of occurrences to the lowest.
PS: I thought about using a dictionary; however, dictionaries have no sort method, so I can't sort one in place.
Any ideas?
Thank you very much
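
For reference, here is a minimal sketch of the dictionary idea mentioned above. It assumes that a new sorted list of (word, count) pairs is acceptable, since the built-in sorted() works on a dictionary's items even though the dictionary itself has no sort method. The names sorted_frequencies and sample are hypothetical and used only for illustration; collections.Counter is a dictionary subclass from the standard library.

```python
import re
from collections import Counter


def sorted_frequencies(text):
    """Return (word, count) pairs, most frequent first."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(words)  # dict subclass mapping word -> count
    # sorted() builds a new list; the dictionary itself is not modified.
    # Sort by count descending; words with equal counts go alphabetically.
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample text standing in for the contents of the .txt file
sample = "the cat and the hat and the bat"
for word, count in sorted_frequencies(sample):
    print("{} : {} times".format(word, count))
```

If the order among words with equal counts does not matter, counts.most_common() returns the same kind of list with less code.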