0

I have a file of about 537 bytes which holds an image in some proprietary format. All I know is that it is pure binary data consisting of 4-byte floats (each voxel is a 4-byte float with a density value).

If I simply open the file via open()

with open(filename, 'rb') as f:
    s = f.read()

how can I iterate over the file and print the voxel values? If I simply use

print(s) 

I get an error "IOPub data rate exceeded." Of course, I can fix this with, e.g.,

jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000

Then I get something like this

b'\x00\x00\x00\x00\x00\x00\x00\x00_2w:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xcd\xdb\xcd:\n0\x12:\x00\x00\x00\x00V\t<:\xf1\xb2\x06;\x8f\xeb\x9c:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x91\x14\x8c:\x00\x00\x00\x00\x00\x00\x00\x00sR\xbb9\x9e?\x8f:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xc6\xb1^9\x00\x00\x00\x00\x00\x00\x00\x000\xcd\xd49\xc5bO:\xe0\xa9\x849\xf7\x05\x0f::\xb6\x93:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00q\xdbb9\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 \xf2\x11:\xd3eN9\xa5OQ9\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00f\x85~7\x00\x00\x00\x00\x00\x00\x00\x00\x8fV\x959d\x98U:\x00\x00\x00\x00>\xa3\x8d8\x00\x00\x00\x00\xe4\x07~:\x00\x00\x00\x00\x00\x00\x00\x00\x13\xc0b9\x00\x00\x00\x00\xdb \r:,3\xf1:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00v\xfb\xc49\x9b\x10g:\xfa-\xc7:\xd8j\x86:G\x08\x19:\x00\x00\x00\x00\x83\xc88:\x86Xs9\x1a-\x8f9\xf3\xc1\x00;\xf4\x85I:\x8e\x0f\xeb9\xceP\xb0:x@\xb9:\xe0\x02Z;\xef\xc1,:\xdd\xb8\xa8:\xd5\x94\xc1:\x96EG:L\xf0_:\x00\x00\x00\x00\x00\x00\x00\x00\x153\x00;8\xf2\xce:H\x00\x82:\x8f\xae\xe4:V\xe6\xe6:\x00

What I really want is just a list of my 4 byte floats.

Any ideas?

All the best!

  • Does this help you? https://stackoverflow.com/questions/519633/lazy-method-for-reading-big-file-in-python – AcaNg Sep 13 '21 at 17:20
  • 1
    "All I know is, that it is pure binary data consists of 4 byte floats" That's *all* there is? And you want to just get a list of all the values, however many there are? – Karl Knechtel Sep 13 '21 at 17:25
  • Are you specifically asking about doing the data conversion? Or do you need to handle a very large file? You said "about 537 byte file" which is tiny, but maybe you meant 537 gigabyte or something? – Karl Knechtel Sep 13 '21 at 17:26

4 Answers

1

To process binary data into arbitrary formats you will need the struct module.

Specifically, you will want to use the unpack_from function, or iter_unpack if the file is really big and you want to go over the floats one by one.

Something like this:

import struct

with open(filename, 'rb') as f:
    s = f.read()

# Each item is a 1-tuple holding one 4-byte float
for (value,) in struct.iter_unpack('f', s):
    print(value)

Depending on the format of your file and the platform you are running on, you may need to explicitly specify endianness by prefixing the format with < (little-endian) or > (big-endian).
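As a minimal sketch of the explicit byte-order variant: the sample bytes below are made up for illustration (they are not taken from the question's file), and little-endian is only an assumption about the proprietary format.

```python
import struct

# Hypothetical sample data: three floats packed as little-endian 4-byte values.
raw = struct.pack('<3f', 0.0, 0.5, 2.0)

# With a '<' or '>' prefix, 'f' always uses the standard size of 4 bytes.
assert struct.calcsize('<f') == 4

# iter_unpack yields 1-tuples, so unpack each tuple into a plain float.
values = [v for (v,) in struct.iter_unpack('<f', raw)]
print(values)  # [0.0, 0.5, 2.0]
```

Note that iter_unpack requires the buffer length to be a multiple of the item size (4 bytes here); if the file carries a header or trailing bytes, they would have to be sliced off first.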

Lev M.
  • This seems to work; I get something like "0.0 0.0 0.0009429808123968542 0.0 0.0 0.0 0.0015705764526501298 0.0005576616385951638 0.0 0.0007173022022470832 0.00205534347333014 0.0011972057400271297 0.0 0.0 0.0". But how does it know that I have these 4-byte floats? – Sebastian Sep 13 '21 at 18:10
  • @Sebastian that is what the 'f' parameter to `iter_unpack` is for. It tells it to parse every 4 bytes as a float. Read the documentation page I linked, it specifies all the options and how it can parse arrays of bytes. – Lev M. Sep 13 '21 at 20:22
1

The array module can reinterpret raw data bytes. The 'f' type code converts them to 32-bit floats:

import array

with open(filename, 'rb') as f:
    s = f.read()

# Reinterpret the bytes as floats in the machine's native byte order
values = array.array('f', s)
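Since array always uses the machine's native byte order, a file written on a different platform may need a byte swap. A rough sketch, assuming (hypothetically) that the file is little-endian and using made-up sample bytes in place of the real file:

```python
import array
import struct
import sys

# Hypothetical sample data standing in for the file contents (little-endian).
raw = struct.pack('<3f', 0.0, 0.25, 1.5)

values = array.array('f', raw)  # interpreted in native byte order
assert values.itemsize == 4     # 'f' is a 4-byte float on common platforms

# Assumption: the data is little-endian, so swap only on big-endian hosts.
if sys.byteorder != 'little':
    values.byteswap()

floats = values.tolist()  # plain Python list of floats
print(floats)  # [0.0, 0.25, 1.5]
```

The resulting array already supports indexing and iteration; tolist() is only needed if a real list is required.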
Mark Tolonen
1

import numpy as np

with open(filename, "rb") as file:
    data = np.frombuffer(file.read(), np.float32)

This does the trick for me.

-1

You could read it into a NumPy array, which is an efficient way to store numeric data.

import numpy as np
arr = np.fromfile(filename, dtype="float32")
tdelaney