
I am reading many binary files, each around a few MB, into lists of unsigned ints with Python 2.7. To achieve this I use struct.unpack:

import struct

# read file
f = open(filename, 'rb')
raw_data = f.read()
f.close()

# convert raw data to unsigned shorts (2 bytes each, native byte order)
data_us = list(struct.unpack('H' * (len(raw_data) / 2), raw_data))
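
For reference, the same bulk conversion can also be done with array.array from the standard library; this is only a rough sketch (assuming the platform's unsigned short is 2 bytes, which is what 'H' uses above, and I have not checked whether it changes the intermittent slowdown):

import array

# bulk-convert the raw bytes to native unsigned shorts without building a
# large format string (the constructor accepts a byte string in Python 2)
data_us = array.array('H', raw_data).tolist()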

The problem is that the last line runs roughly 5x slower than usual on about one out of every 6 or 7 iterations.
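
To quantify how often the slow iterations happen, per-iteration wall-clock timing along these lines should work (filenames here is a stand-in for my actual list of file paths):

import struct
import timeit

timings = []
for filename in filenames:  # hypothetical list of the binary files
    with open(filename, 'rb') as f:
        raw_data = f.read()
    start = timeit.default_timer()
    data_us = list(struct.unpack('H' * (len(raw_data) / 2), raw_data))
    timings.append(timeit.default_timer() - start)

print(timings)  # look for the occasional ~5x outliers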

I am running all the code in a Jupyter notebook hosted by a remote machine on the same network. The two machines run different Linux distributions (the notebook is open on Arch, and the hosting server runs Fedora), and I have not encountered this behavior with any other code.

I have tried changing the line slightly:

data_us = [ struct.unpack('H', raw_data[2*i:2*(i+1)])[0] for i in range(len(raw_data)/2) ]

but this did nothing.
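
For completeness, a variant of the per-element approach with a precompiled struct.Struct, which at least avoids re-parsing the format string on every call (a sketch only; it does not explain the intermittent slowdown):

import struct

unpack_short = struct.Struct('H')  # native byte order, 2-byte unsigned short
data_us = [unpack_short.unpack_from(raw_data, 2 * i)[0]
           for i in xrange(len(raw_data) / 2)]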

The files are not all exactly the same length, but I have seen no correlation with file size.

The one thing that made this problem go away was running the whole function under %timeit, but calling %timeit inside production code is not a good solution.
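
If it matters, what %timeit does can be reproduced without IPython magics via the timeit module; a minimal sketch, where convert_one_file is a hypothetical zero-argument wrapper around the read-and-unpack code for a single file:

import timeit

# run the wrapper 5 times, one call per run, and keep the best time,
# roughly what %timeit reports
best = min(timeit.repeat(convert_one_file, repeat=5, number=1))
print(best)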

I can live with this issue if there is no good solution, but more than anything I'm curious about the cause. Any insight would be greatly appreciated.

This issue does not show up when the code is run in a terminal from a source file.

  • Check out the solution in this question to see if it helps: https://stackoverflow.com/questions/36797088/speed-up-pythons-struct-unpack – jmdatasci May 16 '19 at 19:03
  • @JMeyer Data Science Converting the data with `numpy.ndarray` does not fix the strange slower-iteration issue, and it takes longer than `struct.unpack` in general anyway. But thanks for the response. – Jonah Hoffman May 16 '19 at 19:45
  • Have you tried running the same code outside the notebook? What makes you sure that's related? – Iguananaut May 16 '19 at 20:09
  • @Iguananaut When I run the same code in a source file from the terminal I do not see this issue. I tried this on the same machine that hosts the notebook server. – Jonah Hoffman May 16 '19 at 21:12
