I am reading many megabyte-sized binary files into lists of unsigned ints with Python 2.7. To do this I use struct.unpack:
import struct

# read file
f = open(filename, 'rb')
raw_data = f.read()
f.close()
# convert raw data to unsigned shorts (2 bytes each, native byte order)
data_us = list(struct.unpack('H' * (len(raw_data) / 2), raw_data))
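For reference, here is a sketch of two alternative conversions I could compare against (both assume native byte order, the same assumption the bare 'H' format makes, and numpy is an extra dependency my original code does not use). I have not verified whether either avoids the intermittent slowdown:

import array
import numpy as np

# same conversion via the array module
arr = array.array('H')
arr.fromstring(raw_data)    # Python 2.7 API; frombytes() in Python 3
data_us_arr = arr.tolist()

# same conversion via numpy (no copy until tolist())
data_us_np = np.frombuffer(raw_data, dtype=np.uint16).tolist()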
The problem is that the last line runs roughly 5x slower on about one out of every 6 or 7 iterations.
I am running all the code in a Jupyter notebook hosted by a remote machine on the same network. The two machines run different Linux distributions (Arch on the machine where I open the notebook, Fedora on the server hosting it), and I have not encountered this behavior with any other code.
I have tried changing the line slightly:
data_us = [ struct.unpack('H', raw_data[2*i:2*(i+1)])[0] for i in range(len(raw_data)/2) ]
but this made no difference to the intermittent slowdown.
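Another variant I could test is precompiling the format with struct.Struct and using unpack_from to avoid building a 2-byte slice per element. This is only a sketch; I have not confirmed it changes the timing behavior:

import struct

unpack_h = struct.Struct('H')
data_us = [unpack_h.unpack_from(raw_data, 2 * i)[0]
           for i in range(len(raw_data) / 2)]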
The files are not all exactly the same length, but I have seen no correlation between the slowdown and file size.
The one thing that fixed this problem was running %timeit on the whole function, but calling %timeit inside production code is not a good solution.
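To keep measuring without the notebook magic, I could log per-call durations with the standard library instead. This is only a sketch of how I might instrument the conversion (timeit.default_timer is a stand-in for %timeit, and filenames is a hypothetical list of input files, not part of my original code):

import struct
from timeit import default_timer

durations = []
for filename in filenames:    # hypothetical list of input files
    with open(filename, 'rb') as f:
        raw_data = f.read()
    start = default_timer()
    data_us = list(struct.unpack('H' * (len(raw_data) / 2), raw_data))
    durations.append(default_timer() - start)
# durations should show the occasional ~5x outlier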
I can live with this issue if there is no good solution, but more than anything I'm curious about the cause. Any insight would be greatly appreciated.
This issue does not show up when the code is run in a terminal from a source file.