I need to read a text file and count how many times each word in a given line occurs. I also need to sort the words from most to least frequent. My code so far is below, and I keep getting this error:

Traceback (most recent call last):
  File "/Users/lritter/Documents/wordcount.py", line 9, in <module>
    lines = file.readlines()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 7927: invalid start byte

Code:

import os
count = {}
os.chdir('/Users/lritter/Desktop/Python')

item = int(input('Which line would you like to evaluate? '))
print('You entered: ', item)

with open('Obama_speech.txt') as file:
    lines = file.readlines()
    message = lines[item]  # list indices start at 0, so entering 1 selects the second line
    words = message.split()
for word in words:
    if len(word) >= 5:
        count[word] = count.get(word,0)+1

print(count)
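
For the "most to least occurring" part, one option is `collections.Counter` from the standard library: its `most_common()` method returns (word, count) pairs ordered from highest count to lowest. Below is a minimal sketch, assuming the same five-character minimum as the code above. The function name `count_words` and the sample sentence are made up for illustration and are not taken from the speech file.

```python
from collections import Counter


def count_words(line, min_length=5):
    """Count the words in one line that have at least min_length characters."""
    words = line.split()
    return Counter(word for word in words if len(word) >= min_length)


# Hypothetical sample line, standing in for lines[item] from the file.
sample = "change takes courage because change takes people"
counts = count_words(sample)

# most_common() yields (word, count) pairs, highest count first.
for word, n in counts.most_common():
    print(word, n)
```

Note that `split()` leaves punctuation attached and treats "Change" and "change" as different words, so the line may need lowercasing or stripping first, depending on what counts as a word for the assignment.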
  • You have an encoding problem. Python assumes by default your file must be encoded in utf-8, but it's not. – Ignatius Reilly Sep 26 '22 at 14:51
  • Maybe [this](https://stackoverflow.com/questions/2144815/how-to-know-the-encoding-of-a-file-in-python) can help, and its [linked duplicate](https://stackoverflow.com/questions/436220/how-to-determine-the-encoding-of-text) – Ignatius Reilly Sep 26 '22 at 14:51
  • Try using "r" in your open statement: `with open('Obama_speech.txt', "r") as file:` – Juan Federico Sep 26 '22 at 15:04
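
To illustrate the encoding point raised in the comments: byte 0x89 cannot start a UTF-8 sequence, so the file was probably saved in some other encoding. Which one is not known from the question, so the candidate list below (`utf-8`, `cp1252`, `mac_roman`) is only a guess, and the helper name `decode_text` is hypothetical. The sketch reads raw bytes and tries each candidate in turn:

```python
def decode_text(raw, encodings=("utf-8", "cp1252", "mac_roman")):
    """Decode bytes with the first candidate encoding that succeeds.

    Returns (text, encoding_used). The candidate list is an assumption;
    a permissive codec can "succeed" and still produce wrong characters.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (with replacement)"


# Demo bytes containing 0x89, which is the per-mille sign in cp1252.
raw = "10\u2030 of voters".encode("cp1252")
text, used = decode_text(raw)
print(used, text)
```

With the real file, the bytes would come from `open('Obama_speech.txt', 'rb').read()`, and `text.splitlines()` would give the list of lines. Once the actual encoding is known, passing it directly, as in `open('Obama_speech.txt', encoding='cp1252')`, is simpler than guessing.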
