
I'm implementing caching for my program's raw outputs. To make it work with different files that share the same name, I decided to name each JSON cache file after the source file's MD5 hash. But every file I pass in produces the same hash: d41d8cd98f00b204e9800998ecf8427e (the same happens with other hashing algorithms).

Here's the code I have:

    def __init__(self, filepath: str):
        self.file = filepath
        self.file_format = self.get_file_format()
        with open(self.file, 'rb') as f:
            self.nbt = nbt.File.load(f, gzipped=True).unpack()

            md5 = hashlib.md5()

            chunk = 0
            while chunk != b'':
                chunk = f.read(1024)
                md5.update(chunk)

            print(md5.hexdigest())

            self.cache = f'../main/schematic_cache/{md5.hexdigest()}.json'
            if not exists(self.cache):
                with open(self.cache, 'w') as j:
                    json.dump({'file': self.file}, j, indent=2)

How can I fix this, or is there a better way to store this kind of cache?

Kikugie
  • ``d41d8cd98f00b204e9800998ecf8427e`` is the md5 of an empty byte string. You are not actually updating the hash, which indicates no data is read from the file. – MisterMiyagi Mar 06 '22 at 13:56
  • Does ``nbt.File.load`` consume the file content? Files are streams, reading them once "exhausts" them unless reset to their beginning. – MisterMiyagi Mar 06 '22 at 13:58
  • Does this answer your question? [Why can't I call read() twice on an open file?](https://stackoverflow.com/questions/3906137/why-cant-i-call-read-twice-on-an-open-file) – MisterMiyagi Mar 06 '22 at 14:19

1 Answer


As answered in the question comments by MisterMiyagi:

d41d8cd98f00b204e9800998ecf8427e is the md5 of an empty byte string. You are not actually updating the hash, which indicates no data is read from the file.
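The symptom can be reproduced without any NBT library. The sketch below uses an in-memory ``io.BytesIO`` stream, and a bare ``f.read()`` stands in for ``nbt.File.load`` consuming the file. It shows that a hash fed only the leftover ``b''`` is exactly the digest from the question:

```python
import hashlib
import io

# MD5 of zero bytes: the digest the question keeps getting
print(hashlib.md5(b"").hexdigest())  # d41d8cd98f00b204e9800998ecf8427e

f = io.BytesIO(b"pretend schematic data")
f.read()  # stand-in for nbt.File.load(f, ...) reading the stream to the end

# The stream is now at EOF, so the hashing loop only ever sees b''
leftover = f.read(1024)
md5 = hashlib.md5()
md5.update(leftover)
print(leftover)          # b''
print(md5.hexdigest())   # d41d8cd98f00b204e9800998ecf8427e
```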

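Files are streams: reading them once "exhausts" them unless they are reset to their beginning. Here ``nbt.File.load(f, gzipped=True)`` reads ``f`` first and leaves its position at the end, so every later ``f.read(1024)`` returns ``b''``. Rewinding with ``f.seek(0)`` before the hashing loop fixes this. See also [Why can't I call read() twice on an open file?](https://stackoverflow.com/questions/3906137/why-cant-i-call-read-twice-on-an-open-file)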
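Below is a minimal sketch of two ways around the problem. ``md5_from_start`` rewinds the stream before hashing, which is the same idea as adding ``f.seek(0)`` right after the ``nbt.File.load(...)`` call in the question's ``__init__``. ``load_and_hash`` reads the file only once, hashes those bytes, and hands the parser an in-memory copy, so the two readers never compete for the same stream position. Both function names are hypothetical. ``parse`` is a placeholder for something like ``nbt.File.load``, and the sketch assumes that parser accepts any binary file-like object, as it does the open file in the question.

```python
import hashlib
import io
import os


def md5_from_start(stream, chunk_size=1024):
    """Return the MD5 hex digest of the whole stream, whatever its current position."""
    stream.seek(0)  # rewind: an earlier reader may have left the stream at EOF
    md5 = hashlib.md5()
    # iter() with a b"" sentinel stops as soon as read() hits end of file
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        md5.update(chunk)
    return md5.hexdigest()


def load_and_hash(path, parse, cache_dir="../main/schematic_cache"):
    """Read `path` once, hash its bytes, and parse an in-memory copy.

    `parse` is a hypothetical callable standing in for something like
    nbt.File.load; it receives a binary file-like object positioned at 0.
    Returns the parsed result and the JSON cache path named after the hash.
    """
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.md5(data).hexdigest()
    cache_path = os.path.join(cache_dir, f"{digest}.json")
    return parse(io.BytesIO(data)), cache_path
```

Reading the whole file into memory is reasonable for schematic-sized files. For very large inputs, the seek-and-chunk version avoids holding everything in RAM at once.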

Kikugie