You didn't specify to open the file in binary mode, so f.read()
is trying to read the file as a UTF-8-encoded text file, which doesn't seem to be working. But since we take the hash of bytes, not of strings, it doesn't matter what the encoding is, or even whether the file is text at all: just open it, and then read it, as a binary file.
>>> with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
Traceback (most recent call last):
File "<ipython-input-3-fdba09d5390b>", line 1, in <module>
with open("test.h5.bz2","r") as f: print(hashlib.sha1(f.read()).hexdigest())
File "/home/dsm/sys/pys/Python-3.5.1-bin/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb8 in position 10: invalid start byte
but
>>> with open("test.h5.bz2","rb") as f: print(hashlib.sha1(f.read()).hexdigest())
21bd89480061c80f347e34594e71c6943ca11325