I'm trying to calculate hash for files to check if any changes are made. i have Gui and some other observers running in the event loop. So, i decided to calculate hash of files [md5/Sha1 which ever is faster] asynchronously.
Synchronous code :
import hashlib
import time
chunk_size = 4 * 1024
def getHash(filename):
md5_hash = hashlib.md5()
with open(filename, "rb") as f:
for byte_block in iter(lambda: f.read(chunk_size), b""):
md5_hash.update(byte_block)
print("getHash : " + md5_hash.hexdigest())
start = time.time()
getHash("C:\\Users\\xxx\\video1.mkv")
getHash("C:\\Users\\xxx\\video2.mkv")
getHash("C:\\Users\\xxx\\video3.mkv")
end = time.time()
print(end - start)
Output of synchronous code : 2.4000535011291504
Asynchronous code :
import hashlib
import aiofiles
import asyncio
import time
chunk_size = 4 * 1024
async def get_hash_async(file_path: str):
async with aiofiles.open(file_path, "rb") as fd:
md5_hash = hashlib.md5()
while True:
chunk = await fd.read(chunk_size)
if not chunk:
break
md5_hash.update(chunk)
print("get_hash_async : " + md5_hash.hexdigest())
async def check():
start = time.time()
t1 = get_hash_async("C:\\Users\\xxx\\video1.mkv")
t2 = get_hash_async("C:\\Users\\xxx\\video2.mkv")
t3 = get_hash_async("C:\\Users\\xxx\\video3.mkv")
await asyncio.gather(t1,t2,t3)
end = time.time()
print(end - start)
loop = asyncio.get_event_loop()
loop.run_until_complete(check())
Output of asynchronous code : 27.957366943359375
am i doing it right? or, are there any changes to be made to improve the performance of the code?
Thanks in advance.