
I'm getting this error in my Python code:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 5884: invalid continuation byte

The script is for a dictionary attack using the Crackstation dictionary. I'm trying to make this for fun, but there's a problem when I try to iterate through the items in the dictionary.

pass_file = open(pass_doc, 'r')

for word in pass_file:

pass_doc is a .txt file, NOT .csv. Does it have to be .csv?

I've tried using load_text() instead of open(), but all I want is a simple list of items. The code should run through every item in the dictionary, stored in a list, but I don't really know what's wrong.

Remy Lebeau
jedd
  • You need to pass an `encoding=` parameter to the `open()` call that matches the actual encoding of the file. (You haven't supplied enough information for us to tell what that encoding might be.) – jasonharper Apr 17 '23 at 16:58
  • `utf-8` is the default encoding method assumed by `open`; your file is *not* UTF-8-encoded. – chepner Apr 17 '23 at 17:09
  • Are there any characters other than normal U.S. Keyboard chars? – Garlic Bread Express Jun 27 '23 at 20:07
  • Thanks, everyone. I just came back to the project and simply saved a UTF-8 version. When I first tried this, I had no idea that it wasn't already UTF-8! – jedd Aug 29 '23 at 18:52

1 Answer


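Save your text file as UTF-8. If you want to leave the original file untouched, the following reads it in its current encoding and writes a UTF-8 copy (replace "your-source-encoding" with the file's actual encoding):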

import codecs
BLOCKSIZE = 1048576 # or some other, desired size in bytes
with codecs.open(sourceFileName, "r", "your-source-encoding") as sourceFile:
    with codecs.open(targetFileName, "w", "utf-8") as targetFile:
        while True:
            contents = sourceFile.read(BLOCKSIZE)
            if not contents:
                break
            targetFile.write(contents)

This question might also help: UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c
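
If converting the file is not an option, the suggestion from the comments (passing `encoding=` to `open()`) can be sketched roughly like this. The choice of `latin-1` is only an assumption based on byte 0xe4, which is "ä" in that encoding; the real encoding of the word list is not known from the question. The helper name `iter_words` and the sample data are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile


def iter_words(path, encoding="latin-1"):
    """Yield each non-empty line of a word list, decoded with `encoding`."""
    # encoding="latin-1" is an assumption; substitute the file's real encoding.
    with open(path, "r", encoding=encoding) as pass_file:
        for line in pass_file:
            word = line.rstrip("\n")
            if word:
                yield word


# Hypothetical sample data: 0xe4 is "ä" in Latin-1, but here it is not valid UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"password\nm\xe4dchen\n")
    sample_path = tmp.name

try:
    # Decoding as Latin-1 succeeds: words is ['password', 'mädchen'].
    words = list(iter_words(sample_path))

    # Decoding the same bytes as UTF-8 reproduces the error from the question.
    try:
        list(iter_words(sample_path, encoding="utf-8"))
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True
finally:
    os.remove(sample_path)
```

If the true encoding cannot be determined, passing `errors="replace"` to `open()` substitutes a replacement character for undecodable bytes instead of raising, at the cost of altering those words.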