
Say I have a Python script which reads some binary data, packed as int16. I want to convert this data to float32 as fast as possible.

Currently I am doing this for each file:

data = np.fromfile(fid, 'int16').astype('float32') 

The problem is that the fromfile call and the astype call take about equally long (several seconds in my case). Is there a faster way to do this?
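To confirm where the time goes before optimizing, here is a minimal sketch that times the read and the conversion separately. `time_read_and_convert` is a hypothetical helper name, and the file is assumed to contain native-byte-order int16 values, as in the one-liner above.

```python
import time

import numpy as np


def time_read_and_convert(path):
    """Time the int16 read and the float32 conversion as two separate stages."""
    t0 = time.perf_counter()
    raw = np.fromfile(path, dtype=np.int16)  # stage 1: disk -> int16 array
    t1 = time.perf_counter()
    data = raw.astype(np.float32)  # stage 2: int16 -> new float32 array (a full copy)
    t2 = time.perf_counter()
    return data, t1 - t0, t2 - t1
```

Usage: `data, t_read, t_convert = time_read_and_convert(path)`. Keep in mind that the float32 result is twice the size of the int16 data, so the conversion step allocates and writes 4 bytes per element.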

Maybe by initializing a zero array and using np.frombuffer to populate it two bytes at a time?
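A minimal sketch of the np.frombuffer idea, assuming native-byte-order int16 data (the same assumption np.fromfile makes) and a file length that is a multiple of 2, since np.frombuffer raises ValueError otherwise. `read_int16_as_float32` is a hypothetical name. np.frombuffer only produces a zero-copy int16 view of the bytes; a separate int16-to-float32 pass is still required, so this is not guaranteed to be faster than fromfile plus astype.

```python
import numpy as np


def read_int16_as_float32(path):
    """Read the raw bytes once, view them as int16 without copying, then convert once."""
    with open(path, "rb") as f:
        raw = f.read()  # one bytes object holding the whole file
    view = np.frombuffer(raw, dtype=np.int16)  # zero-copy (read-only) int16 view of the bytes
    out = np.empty(view.shape, dtype=np.float32)  # preallocated float32 output
    # The int16 -> float32 conversion pass still happens here; every int16 fits exactly in float32
    np.copyto(out, view, casting="safe")
    return out
```

Building the output with np.empty rather than a zero array skips one unnecessary fill pass, since every element is overwritten by np.copyto.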

Please advise, thanks.

  • That's not "unpacking", it's straight up conversion. There's probably no faster way than the way you're doing it now. How big a file are you reading? – Mark Ransom Jun 22 '23 at 12:50



You can try an alternative approach by reading and converting the data in smaller chunks.

Here's an example:

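import os
import numpy as np

chunk_size = 1000  # Number of int16 elements to read per iteration; larger chunks mean fewer loop iterations
file_size = os.path.getsize(file)
n_elements = file_size // 2  # int16 takes 2 bytes

float32_array = np.empty(n_elements, dtype=np.float32)
elements_read = 0

# Open the file once: passing the filename to np.fromfile on every iteration
# would restart at the beginning of the file each time.
with open(file, 'rb') as f:
    while elements_read < n_elements:
        chunk = np.fromfile(f, dtype=np.int16, count=chunk_size)
        if chunk.size == 0:  # safety stop in case the file is shorter than expected
            break

        float32_chunk = chunk.astype(np.float32)

        # The last chunk may be shorter than chunk_size, so use its actual length
        float32_array[elements_read:elements_read + chunk.size] = float32_chunk

        elements_read += chunk.size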
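A variation on the chunked approach, sketched under the assumption that allocating a fresh int16 array for every chunk is worth avoiding: it reuses one int16 buffer filled with the file object's readinto method and casts each chunk into the preallocated float32 output. `read_int16_chunked` and the default chunk size of 2**20 elements are illustrative choices, not measured optima; whether chunking beats a single fromfile plus astype depends on the file size and the machine.

```python
import os

import numpy as np


def read_int16_chunked(path, chunk_size=1 << 20):
    """Convert a native-endian int16 file to float32 in chunks, reusing one int16 buffer."""
    with open(path, "rb") as f:
        n_elements = os.fstat(f.fileno()).st_size // 2  # a trailing odd byte is ignored
        out = np.empty(n_elements, dtype=np.float32)
        buf = np.empty(chunk_size, dtype=np.int16)  # reused for every chunk
        done = 0
        while done < n_elements:
            want = min(chunk_size, n_elements - done)
            got = f.readinto(buf[:want]) // 2  # readinto returns a byte count
            if got == 0:  # unexpected end of file
                break
            out[done:done + got] = buf[:got]  # int16 -> float32 cast happens on assignment
            done += got
    return out[:done]
```

The result should match `np.fromfile(path, 'int16').astype('float32')` for any chunk size, which makes it easy to check the chunked version before timing it.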
Marc Agnetti