I wrote a method that generates hashes and returns them as a list of dictionaries. It works well with a small number of records (for example, 100), but it takes around 17 minutes to generate hashes for 10,000 records.
How can the following code be improved so that it processes 10,000 records in a couple of minutes? Would multithreading help?
import hashlib

def generate_hashes(self, records):
    def get_year(date):
        return str(date.year)

    def create_hash(string):
        md5 = hashlib.md5()
        md5.update(string.encode('utf-8'))  # hashlib digests bytes, not str
        return md5.hexdigest()

    result = []
    for rec in records:
        if rec.dob is not None and rec.priv_number is not None:
            org_hash = "{0}_{1}".format(create_hash(rec.priv_number), get_year(rec.dob))
            rec_hash = create_hash("{0}_{1}".format(create_hash(org_hash), '144C5A0013EDE1B0ACF585'))
            print("Generated hash for rec %s." % rec.pub_number.pub_number)
        else:
            rec_hash = '0a' * 16
            print("Not enough data to create a hash for rec %s." % rec.pub_number.pub_number)
        result.append({'hash': rec_hash, 'pub_number': rec.pub_number.pub_number})
    return result
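To make the multithreading idea concrete, here is a minimal sketch that pulls the per-record hash chain out into a pure function and maps it over a worker pool with `concurrent.futures`. It is untested against the real records; `hash_row`, `generate_hashes_parallel`, and the plain `(priv_number, year, pub_number)` tuples are hypothetical names introduced for illustration, and only the salt constant comes from the method above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

SALT = '144C5A0013EDE1B0ACF585'  # constant suffix used in the method above

def md5_hex(s):
    # hashlib digests operate on bytes, so encode the text first
    return hashlib.md5(s.encode('utf-8')).hexdigest()

def hash_row(row):
    """Replicate the org_hash -> group_hash chain for one plain tuple."""
    priv_number, year, pub_number = row
    org_hash = "{0}_{1}".format(md5_hex(priv_number), year)
    group_hash = md5_hex("{0}_{1}".format(md5_hex(org_hash), SALT))
    return {'hash': group_hash, 'pub_number': pub_number}

def generate_hashes_parallel(rows, workers=4):
    # Threads mainly pay off when the loop is dominated by I/O (e.g. lazy
    # ORM attribute loads); for pure CPU-bound hashing in CPython the GIL
    # serializes the work, and ProcessPoolExecutor would be the drop-in
    # alternative to try instead.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(hash_row, rows))
```

One caveat before parallelizing: three MD5 digests per record are cheap, so if the loop really takes 17 minutes, the time is more likely going into per-record attribute access (for example, a database query hiding behind each `rec.pub_number`) or the `print` calls than into hashlib itself. Profiling the loop first would show whether a worker pool is even the right fix.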