I need to write a script that parses a text file containing a list of MD5 hashes with their plaintexts. The script works as expected for small files, but on a list with millions of lines I get IndexError: list index out of range or MemoryError. I have tried experimenting with a dictionary, but with no luck. For reference, I used the information from this post: How do you read a file into a list in Python?
Sample file structure (the file contains 10 million lines):
00003b63ee5e47514964167709ba60df:ainazulaikha
00004ae02a3cf46250ef834f7b75bb91:78836896hxy7
000066b871abdafac2052532ab9da827:nihao1314521+
0000721897d675d6ac0198ad19d48f21:y138636812709
00008f46c906349f1df99ccdea4104a1:sikaozhanche123
000093856b4e947511870f3e10464129:646434
00009ad044e03d0359e8065a0334a046:LiuYi20011105
0000a4bed6b4a1a6fa96a54ca906e1bd:chiaochiao0520
My script (for testing purposes):
import re

with open('C:/Users/Admin/Downloads/106_17-media_found_hash_plain.txt', 'r') as f:
    string = '00008f46c906349f1df99ccdea4104a1'
    for line in f:
        # capture the 32-character hex hash and the plaintext after the colon
        reg = re.findall(r"^'?([0-9A-Fa-f]{32})'?:'?([^\s]+)'?", line)
        if string in reg[0][0]:
            print('ok')
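
For comparison, here is a minimal sketch of the same lookup that reads one line at a time and skips any line that does not have the hash:plaintext shape, rather than indexing into an empty match list. It assumes every valid line contains a colon after the hash. The helper name find_plaintext and the in-memory sample are hypothetical, used only for illustration; the real file object from open() could be passed in the same way.

```python
import io


def find_plaintext(lines, target):
    """Return the plaintext paired with target, or None if it is absent."""
    target = target.lower()
    for line in lines:
        # partition() never raises: a line without ':' gives an empty separator
        digest, sep, plain = line.rstrip("\r\n").partition(":")
        if not sep:
            continue  # blank or malformed line: skip it instead of indexing
        # strip optional surrounding quotes, mirroring the '? in the regex
        if digest.strip("'").lower() == target:
            return plain
    return None


# Small in-memory stand-in for the real file (note the blank line)
sample = io.StringIO(
    "00004ae02a3cf46250ef834f7b75bb91:78836896hxy7\n"
    "\n"
    "00008f46c906349f1df99ccdea4104a1:sikaozhanche123\n"
)
print(find_plaintext(sample, "00008f46c906349f1df99ccdea4104a1"))
```

Because only the current line is held in memory, this approach should not grow with file size, unlike reading the whole file into a list or dictionary first.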