I'm writing a script in Python 3.5.3 that takes username/password combos from a file and writes them to another file. The script was written on a machine with Windows 10 and worked. However, when I tried to run the script on a MacBook running Yosemite, I got an error that has something to do with ASCII encoding.

The relevant function is this:

import hashlib

def buildDatabase():
    print("Building database, this may take some time...")
    passwords = open("10-million-combos.txt", "r")  # File with user/password combos.
    hashWords = open("Hashed Combos.txt", "a")  # File where user:SHA-256-hashed passwords will be stored.
    j = 0
    # 60,001 slots for about 30,000 combos: with quadratic probing, the size must be 2 x the final size + 1.
    hashTable = [[None] for _ in range(60001)]
    for line in passwords:
        # The username/password combos are formatted: username\tpassword\n
        q = line.find("\t")  # q is the end of the username
        password = line[q + 1:].rstrip("\n")  # everything after the tab, without the newline
        username = line[:q] + ":"
        byteWord = password.encode("UTF-8")
        # A fresh SHA-256 object per password, so earlier passwords don't leak into this digest.
        toWrite = hashlib.sha256(byteWord).hexdigest()  # UTF-8 bytes hashed (not encrypted) with SHA-256
        skip = len(password) == 0  # if len(password) is 0, just skip it
        if skip:
            doModulo = 0
        elif len(password) == 1:
            doModulo = ord(password[0]) ** 4
        elif len(password) == 2:
            doModulo = ord(password[0]) * ord(password[0]) * ord(password[1]) * ord(password[1])
        elif len(password) == 3:
            doModulo = ord(password[0]) * ord(password[0]) * ord(password[1]) * ord(password[2])
        else:
            doModulo = ord(password[0]) * ord(password[1]) * ord(password[2]) * ord(password[3])
        # The if block above gives each combo an assignment number for the hash table,
        # indexed by password because passwords are more unique than usernames.
        assignment = doModulo % 60001
        successful = False
        collision = 0
        # ... (rest of the loop not shown)
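One more detail in the loop, separate from the decoding error: reusing a single hashlib.sha256() object for every password, instead of creating a new one per password, changes what each digest means, because update() appends to everything hashed so far and hexdigest() does not reset the object. A small comparison, with made-up passwords and hypothetical helper names:

```python
import hashlib

def hash_each_shared(passwords):
    """Reuse one sha256 object for all passwords (hypothetical helper)."""
    sha = hashlib.sha256()
    digests = []
    for pw in passwords:
        sha.update(pw.encode("utf-8"))  # appends to all previously hashed input
        digests.append(sha.hexdigest())  # does not reset the object
    return digests

def hash_each_fresh(passwords):
    """Create a new sha256 object per password (hypothetical helper)."""
    return [hashlib.sha256(pw.encode("utf-8")).hexdigest() for pw in passwords]

shared = hash_each_shared(["abc", "def"])
fresh = hash_each_fresh(["abc", "def"])

# The second shared digest is the hash of "abcdef", not of "def".
print(shared[1] == hashlib.sha256(b"abcdef").hexdigest())  # True
print(fresh[1] == hashlib.sha256(b"def").hexdigest())      # True
```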

The error is as follows:

Traceback (most recent call last):
  File "/Users/connerboehm/Documents/Conner B/PythonFinalProject.py", line 104, in <module>
    buildDatabase()
  File "/Users/connerboehm/Documents/Conner B/PythonFinalProject.py", line 12, in buildDatabase
    for line in passwords:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 2370: ordinal not in range(128)
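The same failure can be reproduced without the 10-million-line file: write a few bytes containing 0xaa to a temporary file and read it back in text mode with the ASCII codec, which is what open() ended up using here. The sample line below is made up for illustration.

```python
import os
import tempfile

# Hypothetical sample: a tab-separated combo whose password contains byte 0xaa.
data = b"user1\tpass\xaaword\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

error = None
try:
    with open(path, "r", encoding="ascii") as f:
        for line in f:  # decoding happens here, while iterating over the file
            pass
except UnicodeDecodeError as exc:
    error = exc
finally:
    os.remove(path)

print(error.encoding, hex(error.object[error.start]))  # ascii 0xaa
```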

What's happening here? I haven't gotten this error before on Windows, and I can't see any problem with my attempt to encode into UTF-8.

Edit: Notepad had saved the file in ANSI. Changing its encoding to UTF-8 (by copying and pasting the data into a new .txt file) solved the problem.
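Instead of copying and pasting, a file saved as "ANSI" can be re-encoded with a short script. This sketch assumes "ANSI" means the Windows-1252 code page (cp1252), the usual default on Western-language Windows installs; other locales use a different code page, and cp1252 leaves a few byte values undefined, which would raise UnicodeDecodeError. The function name and output path are hypothetical.

```python
def convert_to_utf8(src_path, dst_path, src_encoding="cp1252"):
    """Re-encode a text file from src_encoding to UTF-8, line by line."""
    # newline="" on both sides keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        with open(dst_path, "w", encoding="utf-8", newline="") as dst:
            for line in src:
                dst.write(line)

# Hypothetical usage:
# convert_to_utf8("10-million-combos.txt", "10-million-combos-utf8.txt")
```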

  • Time to study character encodings such as ASCII and Unicode; UTF-8 is a good place to start. – zaph Jul 29 '17 at 20:48
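As a small illustration of why the encoding matters: the byte 0xaa from the traceback is the character "ª" (U+00AA) in Windows-1252, the same character is a two-byte sequence in UTF-8, and it has no ASCII representation at all. Reading 0xaa as Windows-1252 is an assumption about this particular file; the helper name below is hypothetical.

```python
ch = "\u00aa"  # "ª", FEMININE ORDINAL INDICATOR

print(ch.encode("cp1252"))  # b'\xaa'     (one byte)
print(ch.encode("utf-8"))   # b'\xc2\xaa' (two bytes)

def failure_reason(action):
    """Run action() and return the Unicode error reason, or None if it succeeds."""
    try:
        action()
    except UnicodeError as exc:
        return exc.reason
    return None

print(failure_reason(lambda: ch.encode("ascii")))        # ordinal not in range(128)
print(failure_reason(lambda: b"\xaa".decode("utf-8")))   # invalid start byte
print(failure_reason(lambda: b"\xaa".decode("cp1252")))  # None
```

Note that a lone 0xaa is not valid UTF-8 either, so a Windows-1252 file fails under both ASCII and UTF-8 decoding.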

1 Answer

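Your program doesn't say which codec the file "10-million-combos.txt" uses, so open() falls back to the platform default (whatever locale.getpreferredencoding() returns). On your Mac that is ASCII, as the ascii.py frame in the traceback shows; on Windows it is the ANSI code page, the same encoding Notepad used to save the file, which is why the script worked there. 0xaa isn't an ASCII ordinal (ASCII only covers 0 to 127), so decoding failed. The error is raised while reading the file in `for line in passwords`, before `password.encode('UTF-8')` ever runs. Identify which codec your file uses and pass it as the encoding parameter of open().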
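Applied to the question's loop, that means passing encoding= explicitly. This sketch assumes the file is Windows-1252 (what Notepad calls "ANSI" on Western-language systems); substitute whatever codec your file actually uses. The errors= argument is optional: "strict" (the default) raises on bad bytes, while "replace" substitutes U+FFFD so a handful of undecodable bytes don't abort a long run. The function name read_combos is hypothetical.

```python
def read_combos(path, encoding="cp1252", errors="strict"):
    """Yield (username, password) pairs from a tab-separated combo file."""
    # An explicit encoding makes decoding independent of the platform's
    # locale default (ASCII on the Mac in the question).
    with open(path, "r", encoding=encoding, errors=errors) as f:
        for line in f:
            username, sep, password = line.rstrip("\n").partition("\t")
            if sep:  # skip lines that contain no tab
                yield username, password

# Hypothetical usage:
# for username, password in read_combos("10-million-combos.txt"):
#     ...
```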

Yann Vernier
  • "Identify what codec is used in your file" is easier said than done. Maybe you could suggest a way to do it, like the [chardet](https://pypi.python.org/pypi/chardet) module? – Jean-François Fabre Jul 29 '17 at 21:10
  • The chardet module suggests UTF-16 in some cases: https://stackoverflow.com/questions/55563399/how-to-solve-unicodedecodeerror-utf-8-codec-cant-decode-byte-0xff-in-positio – TRicks43 Aug 28 '22 at 20:55
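A rough, standard-library-only alternative to chardet, sketched under the assumption that the candidates are UTF-8 (with or without a BOM), BOM-marked UTF-16, or a single fallback code page (cp1252 here, a guess for a Notepad "ANSI" file). It checks for a byte-order mark first, which is how a file starting with 0xff 0xfe is recognized as UTF-16, then tries a strict UTF-8 decode of a sample. Unlike chardet.detect(), which returns a dict with an 'encoding' guess and a 'confidence' score, this only picks from a fixed list, and legacy code-page text can occasionally also be valid UTF-8. The function name guess_encoding is hypothetical.

```python
import codecs

# Byte-order marks to check first (UTF-32 omitted for brevity).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding(path, sample_size=1 << 20, fallback="cp1252"):
    """Crude heuristic: BOM check, then strict UTF-8, then a fallback code page."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    for bom, name in _BOMS:
        if sample.startswith(bom):
            return name
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off by the sample limit.
        decoder.decode(sample, final=len(sample) < sample_size)
        return "utf-8"
    except UnicodeDecodeError:
        return fallback

# Hypothetical usage:
# enc = guess_encoding("10-million-combos.txt")
# passwords = open("10-million-combos.txt", "r", encoding=enc)
```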