I'm writing a script in Python 3.5.3 that takes username/password combos from a file and writes them to another file. The script was written on a machine running Windows 10, where it worked. However, when I ran the same script on a MacBook running Yosemite, it raised a UnicodeDecodeError from the 'ascii' codec.
The relevant function is this:
def buildDatabase():
    print("Building database, this may take some time...")
    passwords = open("10-million-combos.txt", "r") #File with user/pword combos.
    hashWords = open("Hashed Combos.txt", "a") #File where user/SHA-256 hashed pwords will be stored.
    j = 0
    hashTable = [[ None ] for x in range(60001)] #A hashtable with 30,000 elements, quadratic probing means size must = 2 x the final size + 1
    for line in passwords:
        toSearch = line
        i = q = toSearch.find("\t") #The username/pword combos are formatted: username\tpassword\n.
        n = toSearch.find("\n")
        password = line[i:n-1] #i is the start of the password, n is the end of it
        username = toSearch[ :q] + ":" #q is the end of the username
        byteWord = password.encode('UTF-8')
        sha.update(byteWord)
        toWrite = sha.hexdigest() #password is encoded as UTF-8, run thru SHA-256, and stored in toWrite
        skip = False
        if len(password) == 0: #if len(password) is 0, just skip it
            skip = True
        if len(password) == 1:
            doModulo = ord(password[0]) ** 4
        if len(password) == 2:
            doModulo = ord(password[0]) * ord(password[0]) * ord(password[1]) * ord(password[1])
        if len(password) == 3:
            doModulo = ord(password[0]) * ord(password[0]) * ord(password[1]) * ord(password[2])
        if len(password) > 3:
            doModulo = ord(password[0]) * ord(password[1]) * ord(password[2]) * ord(password[3])
        assignment = doModulo % 60001
        #The if block above gives each combo an assignment number for a hash table, indexed by password because they're more unique than usernames
        successful = False
        collision = 0
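As an aside on the parsing and hashing steps above, here is a minimal sketch of how one line could be split and hashed. It assumes `sha` is a single `hashlib.sha256()` object created elsewhere in the script (it is not shown in the snippet); if so, repeated `update()` calls would hash the concatenation of every password fed in so far rather than each password on its own. Note also that `line[i:n-1]` starts at the tab itself and stops one character short of the newline. The helper name `parse_and_hash` is hypothetical and not part of the original script.

```python
import hashlib

def parse_and_hash(line):
    """Hypothetical helper: split one 'username<TAB>password' line and hash the password."""
    # partition() splits on the first tab only; rstrip() drops the line ending.
    username, _, password = line.rstrip("\r\n").partition("\t")
    # A fresh sha256 object per password, so earlier passwords do not
    # leak into later digests.
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return username, password, digest

print(parse_and_hash("alice\thunter2\n"))
```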
The error is as follows:
Traceback (most recent call last):
  File "/Users/connerboehm/Documents/Conner B/PythonFinalProject.py", line 104, in <module>
    buildDatabase()
  File "/Users/connerboehm/Documents/Conner B/PythonFinalProject.py", line 12, in buildDatabase
    for line in passwords:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 2370: ordinal not in range(128)
What's happening here? I haven't gotten this error before on Windows, and I can't see any problem with my attempt to encode into UTF-8.
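A small diagnostic sketch that may help narrow this down: when `open()` is called in text mode without an `encoding=` argument, Python uses the platform's preferred encoding, which can differ between the Windows machine and the Mac. The traceback passing through `ascii.py` suggests ASCII was chosen on the Mac. The snippet below prints that default and reproduces the failure on the single byte from the error message; treating the file as Windows-1252 is an assumption, not something confirmed here.

```python
import locale

# The encoding open() falls back to in text mode when encoding= is omitted.
default_encoding = locale.getpreferredencoding(False)
print("open() default:", default_encoding)

# Byte 0xaa is outside the ASCII range, so decoding it as ASCII fails
# in the same way as the traceback.
try:
    b"\xaa".decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)

# Assumption: the file was written in Windows-1252, where 0xaa is a valid character.
ch = b"\xaa".decode("cp1252")
print(ascii(ch))
```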
Edit: Notepad had saved the file as ANSI. Re-saving the data as UTF-8 (just copying and pasting it into a new .txt file) solved the problem.
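An alternative to re-saving the file would be to name the encoding when opening it, so the read no longer depends on each machine's default. This is only a sketch: the temporary file stands in for the real combos file, and `cp1252` is an assumption about what Notepad's "ANSI" meant on the Windows machine.

```python
import os
import tempfile

# Stand-in for the combos file: one line containing byte 0xaa, written as
# raw bytes the way an "ANSI" (assumed Windows-1252) save would produce it.
path = os.path.join(tempfile.mkdtemp(), "combos.txt")
with open(path, "wb") as f:
    f.write(b"user\tpass\xaaword\n")

# Passing encoding= explicitly makes the read behave the same on
# Windows and macOS, regardless of the locale default.
with open(path, "r", encoding="cp1252") as passwords:
    lines = [line for line in passwords]

print(ascii(lines[0]))
```

If the file has already been re-saved as UTF-8, `encoding="utf-8"` would be the value to pass instead.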