I commented at length about how unsure I am that this is really the thing to do, and how I suspect an XY problem here.
But, just to give a formal answer to the question as asked, I repeat here what I said in the comments:
np.frombuffer(contents, dtype=np.uint8)
is the way to turn a byte string into a NumPy array of bytes (that is, of uint8 integers).
The dtype part is important, since frombuffer does not just iterate over the bytes to create an array; it expects to find the data representation as is in the buffer. Without dtype, it will try to create an array of float64 from your buffer. That fails 7 times out of 8, because an array of float64 can only be backed by a buffer whose length is a multiple of 8 bytes. And if the length of contents happens to be a multiple of 8, it will succeed, giving you meaningless floats.
For example, on a .mp4 of mine:
import numpy as np
with open('out.mp4', 'rb') as f:
    content = f.read()
len(content)
# 63047 - it is a very small mp4
x=np.frombuffer(content)
# ValueError: buffer size must be a multiple of element size
x=np.frombuffer(content[:63040])
x.shape
# (7880,)
x.dtype
# np.float64
x[:10]
# array([ 6.32301702e+233, 2.78135139e-309, 9.33260821e-066,
# 1.15681581e-071, 2.78106620e+180, 3.98476928e+252,
# nan, 9.02529811e+042, -3.58729431e+222,
# 1.08615058e-153])
x=np.frombuffer(content, dtype=np.uint8)
x.shape
# (63047,)
x.dtype
# uint8
x[:10]
# array([ 0, 0, 0, 32, 102, 116, 121, 112, 105, 115], dtype=uint8)
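
To reproduce the same behaviour without a video file at hand, here is a minimal sketch. It assumes a made-up 10-byte string (`data` is purely illustrative, not anything from the question) and only shows that the default dtype reinterprets the raw bytes rather than converting them one by one:

```python
import numpy as np

# Hypothetical 10-byte payload standing in for the file contents.
data = bytes(range(10))

# Without dtype, frombuffer assumes float64 (8 bytes per element).
# 10 is not a multiple of 8, so this raises ValueError.
try:
    np.frombuffer(data)
    default_ok = True
except ValueError:
    default_ok = False

# With dtype=np.uint8, every byte becomes one array element.
as_bytes = np.frombuffer(data, dtype=np.uint8)

# Truncating to a multiple of 8 "works", but the 8 bytes are
# reinterpreted as one float64, not converted byte by byte.
as_float = np.frombuffer(data[:8])

print(default_ok)                      # False
print(as_bytes.tolist())               # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(as_float.shape, as_float.dtype)  # (1,) float64
```

The single float64 you get from the truncated buffer is just as meaningless as the floats in the .mp4 example; only the uint8 array matches the original bytes.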