finding the 10 most frequent letters in a file

Question

So I'm pretty much a beginner and I have to go through this exercise whose task is to print the top 10 letters in a given file. Now, this is how my code looks so far:

file = input("Enter a file ")
sfile = open(file)
alphabet = ["a","b","c","d","e","f","g","h","i","j","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
w_ords= dict()
for line in sfile :
   line=line.strip()
   words = line.split()
   for letter in words:
       letter = [words]
       for letter in alphabet :
           w_ords[letter] = w_ords.get(letter, 0)+ 1
print(sorted([(b,a[:10]) for a,b in w_ords.items()], reverse = True))

the problem is that my output looks like this

[(4910, 'z'), (4910, 'y'), (4910, 'x'), (4910, 'w'), (4910, 'v'), (4910, 'u'), (4910, 't'), (4910, 's'), (4910, 'r'), (4910, 'q'), (4910, 'p'), (4910, 'o'), (4910, 'n'), (4910, 'm'), (4910, 'l'), (4910, 'j'), (4910, 'i'), (4910, 'h'), (4910, 'g'), (4910, 'f'), (4910, 'e'), (4910, 'd'), (4910, 'c'), (4910, 'b'), (4910, 'a')]

What am I doing wrong? The dividing into letters kind of works; the problem is that it doesn't count them as unique letters, I guess.

I can't understand what `letter = [words]` is intended to accomplish, and `for letter in alphabet :` will use the **same `letter` variable** as the outer loop. — Karl Knechtel, Sep 16 '22 at 17:23

JNevill · Answer 1 · 2022-09-16T16:56:56.737

You don't need `alphabet`, so just drop it from the logic completely. If you want to retain it, then just set up `w_ords` from the start with each key: `w_ords = {"a":0,"b":0,"c":0, etc...}`
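If you do want every letter present from the start, here is a minimal sketch of one way to build that dictionary without typing each key by hand. It assumes plain ASCII letters, and the `sample` string is a hypothetical stand-in for the file's contents.

```python
import string

# Start every lowercase ASCII letter at zero.
w_ords = dict.fromkeys(string.ascii_lowercase, 0)

# Hypothetical stand-in for the text read from the file.
sample = "Hello, World"

for letter in sample.lower():
    if letter in w_ords:       # skips spaces, punctuation and digits
        w_ords[letter] += 1

print(w_ords["l"], w_ords["o"], w_ords["z"])
```

Letters that never occur stay in the dictionary with a count of 0, which may or may not be what the exercise wants.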

From there you start out well:

sfile = open(file)
w_ords= dict()
for line in sfile :
   line=line.strip()
   words = line.split()

But then you go off the rails with `for letter in words:`. `words` is a list of words, so an individual item would be a word. Furthermore, in the very next line after that `for` loop you set `letter` to `[words]`, completely overwriting the value of `letter` that is being set by the loop you just created. Then you overwrite it once again by implementing another loop, `for letter in alphabet`. It's important to understand that `for var in <iterable>` sets the value of `var` inside that loop. On each iteration it is updated to the next element of the iterable you are iterating over.
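To make the rebinding visible, here is a small sketch that mirrors the shape of the question's loops. The `words` and `alphabet` values are made up for illustration, and `seen` is a hypothetical list used only to record what `letter` holds.

```python
# Hypothetical sample data, not taken from the question's file.
words = ["ab", "cd"]
alphabet = ["x", "y", "z"]

seen = []
for letter in words:           # letter is "ab", then "cd"
    letter = [words]           # rebinds letter; the current word is lost
    for letter in alphabet:    # rebinds letter again: "x", "y", "z"
        seen.append(letter)

print(seen)
```

Nothing from `words` ever reaches `seen`; only the items of `alphabet` do, once per word.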

Instead:

sfile = open(file)
w_ords= dict()
for line in sfile :
   line=line.strip()
   words = line.split()
   for word in words:

Now you still need to loop through each letter, so another for loop:

sfile = open(file)
w_ords= dict()
for line in sfile :
   line=line.strip()
   words = line.split()
   for word in words:
       for letter in word:

That's it for the loops. You are now getting every single letter from the file, one by one, by looping over each line, each word in the line, and each letter in the word. Now it's time to construct your dictionary:

sfile = open(file)
w_ords= dict()
for line in sfile :
   line=line.strip()
   words = line.split()
   for word in words:
       for letter in word:
           w_ords[letter] = w_ords.get(letter, 0)+ 1

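By looping through the `alphabet` list inside the word loop, you were adding 1 to every letter in the list once per word, regardless of which letters the word contained. That is why every letter ended up with the same count (4910, the number of words in the file).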
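A quick way to check what the alphabet loop does is to run it on a tiny input. This is only a sketch: `lines` is a hypothetical stand-in for the file, `alphabet` is shortened to three letters, and `total_words` is a helper added for comparison.

```python
# Hypothetical stand-in for the lines of the file.
lines = ["the quick brown fox", "jumps over"]
alphabet = ["a", "b", "c"]     # shortened for the example

counts = {}
total_words = 0
for line in lines:
    words = line.split()
    total_words += len(words)
    for word in words:
        # The word itself is never inspected here.
        for letter in alphabet:
            counts[letter] = counts.get(letter, 0) + 1

print(counts, total_words)
```

Every entry in `counts` equals `total_words`, which matches the identical counts in the question's output.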

That gets you to a close-to-correct state. Because this is case sensitive you will likely want to do:

  w_ords[letter.lower()] = w_ords.get(letter.lower(), 0)+ 1
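Putting the pieces together, a complete version might look like the sketch below. It makes two assumptions that are not in the steps above: an `io.StringIO` object stands in for the opened file so the example runs on its own, and an `isalpha()` check is added so punctuation and digits are not counted. The `top10` name is also just illustrative.

```python
import io

# Hypothetical stand-in for sfile = open(file).
sfile = io.StringIO("Hello World\nhello there\n")

w_ords = dict()
for line in sfile:
    for word in line.split():
        for letter in word:
            if letter.isalpha():            # assumption: count letters only
                key = letter.lower()        # treat "H" and "h" as the same letter
                w_ords[key] = w_ords.get(key, 0) + 1

# Sort (letter, count) pairs by count, highest first, and keep ten.
top10 = sorted(w_ords.items(), key=lambda item: item[1], reverse=True)[:10]
print(top10)
```

Sorting on `item[1]` orders by count directly, so there is no need to swap each pair into `(count, letter)` form as the question's `print` line does.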

thank you! in the end i figured it out using .findall expression but this was nonetheless very useful to understand what I was doing wrong:) — mvrmoris, Sep 16 '22 at 21:23
Glad it was helpful. Definitely a lot of ways to skin this cat. — JNevill, Sep 19 '22 at 13:41

score 0 · Answer 2 · answered Sep 16 '22 at 17:12

I can suggest a slightly simpler solution:

from collections import defaultdict

letter_counts = defaultdict(int)
with open("my-file.txt", "r") as _file:
    text = _file.read()

for char in text:
    if char.isalpha():
        letter_counts[char.lower()] += 1

# this will give you {"a": 4, "b": 1, ...} so you need to sort it:

letters = sorted(letter_counts.items(), key=lambda x: x[1], reverse=True)

print(letters[:10])

# [("a", 4), ("b", 1), ...]

score 0 · Answer 3 · answered Sep 16 '22 at 17:20

Adding to @JNevill's great answer, sometimes it's much simpler to use Python's built-in functionality. For this problem I'd recommend using `Counter` from Python's `collections` module.

You'd use it like this:

from collections import Counter

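with open("your/file/path") as f:
    file_contents = f.read()

# Count only letters, ignoring case. Counter(file_contents) on its own
# would also count spaces, newlines and punctuation.
most_common = Counter(ch.lower() for ch in file_contents if ch.isalpha()).most_common(10)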

It returns a list of tuples, such as:

[('a', 12), ('d', 12), ('s', 10), ('h', 10), ('g', 8), ('j', 6), ('k', 3), ('f', 3)]

got it thank you I'll try this out but for the task I needed to use a dictionary so that's why — mvrmoris, Sep 16 '22 at 21:25
