
I have 360 files with a .bin extension, which I know are raw image files (16-bit grayscale). I believe the image size is around 1518x999, but I am puzzled about how to get the image data out of them. On examining them, I found that the same 149 bytes are repeated at the beginning of every file and the same 15 bytes at the end (they are marked with white boxes in the pictures below). Are this header and footer something common to numpy arrays? (I see numpy multiarray ... among the header bytes; see the pictures below.) Can I extract information about the image specs, such as width and height, from the header and footer? Here are three examples of the files.

[Screenshots: hex views of the files, with the repeated header and footer bytes marked in white boxes]

IndustProg
  • If you are talking about the Numpy file format, then please read: https://numpy.org/devdocs/reference/generated/numpy.lib.format.html . [Here](https://stackoverflow.com/questions/71645404) is a related question. – Jérôme Richard Apr 12 '22 at 15:58

1 Answer


Yes. The header contains information about the type and size of the array.

Using numpy (and pillow), you can easily retrieve the image as follows.

# Using python 3.6 or higher.
# To install numpy and pillow, run: pip3 install numpy pillow

from pathlib import Path
import numpy as np
from PIL import Image

input_dir = Path("./binFiles")  # Directory where *.bin files are stored.
output_dir = Path("./_out")  # Directory where you want to output the image files.

output_dir.mkdir(parents=True, exist_ok=True)
for path in input_dir.rglob("*.bin"):
    buf = np.load(path, allow_pickle=True)
    image = Image.fromarray(buf)
    image.save(output_dir / (path.stem + ".png"))

Here is a sample. (I couldn't upload it in the original PNG format, so this is a converted one.)

[Image: sample converted image]
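To get the width and height themselves, you can read them off the loaded array. Below is a minimal sketch: since I can't include your files here, it first writes a hypothetical stand-in file (`example.bin`, a pickled all-zero array with the shape and dtype assumed from your question), so the file name and contents are assumptions, not your data.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for one of the .bin files: a pickled 16-bit array.
    path = Path(tmp) / "example.bin"
    with path.open("wb") as fo:
        pickle.dump(np.zeros((999, 1518), dtype=np.uint16), fo, protocol=4)

    # Same call as in the conversion loop above.
    buf = np.load(path, allow_pickle=True)

# For a 2-D grayscale image, shape is (rows, columns) = (height, width).
height, width = buf.shape
print("width x height:", width, "x", height)
print("dtype:", buf.dtype, "bytes per pixel:", buf.itemsize)
print("pixel data size in bytes:", buf.nbytes)
```

For your real files, replace the stand-in with the actual path. Only load pickled files from a source you trust, because unpickling can execute arbitrary code.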

EDIT:

Questions

  1. Is there any more information in header than what it was retrieved?
  2. Is there any information in that footer?

Answer

Theoretically, both answers are no.

Your files are not actually in the numpy file format (.npy); they are numpy objects serialized in the pickle format. I was able to rebuild a byte-for-byte identical file using only the dtype, shape, order, and an array of 3,032,964 (= 999 x 1518 x 2) bytes. So numpy or pickle may have added some metadata of their own, but those four items are the only essential information (at least for the three files you provided).

If you want to know more about that "additional metadata", I don't have an answer for you. Since that is really about the pickle file format, you might want to ask a new, more focused question.
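If you want to look at it yourself, the standard library's `pickletools` module can list the pickle opcodes that surround the pixel data. This is only a rough sketch on a hypothetical tiny array (`sample`), not on your files. The exact opcodes and byte counts depend on the numpy version and the pickle protocol, so I wouldn't expect them to match your 149 and 15 bytes exactly.

```python
import pickle
import pickletools

import numpy as np

# Hypothetical stand-in for one image: a tiny 16-bit array.
sample = np.arange(6, dtype=np.uint16).reshape(2, 3)
data = pickle.dumps(sample, protocol=4)

# Locate the raw pixel bytes inside the pickle stream.
pixel_bytes = sample.tobytes()
pixel_start = data.find(pixel_bytes)
pixel_end = pixel_start + len(pixel_bytes)

# Split the pickle opcodes into those before and after the pixel data.
ops = [(op.name, pos) for op, _arg, pos in pickletools.genops(data)]
header_ops = [name for name, pos in ops if pos < pixel_start]
footer_ops = [name for name, pos in ops if pos >= pixel_end]

print("header opcodes:", header_ops)
print("footer opcodes:", footer_ops)
print("footer bytes:", data[pixel_end:])
```

The opcodes before the pixel data carry the instructions for rebuilding the array (which numpy function to call, the shape, the dtype), and the ones after it finish the object and end the stream. For a more detailed listing, `pickletools.dis(data)` prints the whole stream in annotated form.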

Here is the code I used for checking, in case you want to check other files as well. It reuses the imports, input_dir, and output_dir from the snippet above.

import pickle

for input_path in input_dir.rglob("*.bin"):
    # Load the original file.
    numpy_array = np.load(input_path, allow_pickle=True)

    # Convert to a byte array. 'A' means keep the order.
    bytes_array = numpy_array.tobytes('A')

    # Make sure there are no additional bytes other than the image pixels.
    assert len(bytes_array) == numpy_array.size * numpy_array.itemsize

    # Rebuild from byte array.
    # Note that rebuilt_array is constructed using only dtype, shape, order,
    # and a byte array matching the image size.
    rebuilt_array = np.frombuffer(
        bytes_array, dtype=numpy_array.dtype
    ).reshape(
        numpy_array.shape, order='F' if np.isfortran(numpy_array) else 'C'
    )

    # Pickle the rebuilt array (mimicking the original file).
    rebuilt_path = output_dir / (input_path.stem + ".pickle")
    with rebuilt_path.open(mode='wb') as fo:
        pickle.dump(rebuilt_array, fo, protocol=4)

    # Make sure there are no additional bytes other than the rebuilt array.
    assert rebuilt_path.read_bytes() == input_path.read_bytes()

    print(f"{input_path.name} passed!")
ken
  • Well, thank you for your help. I retrieved info as `{'shape': (999, 1518), 'fortran_order': False, 'descr': ' – IndustProg Apr 14 '22 at 06:01
  • I have updated my answer. Hope this is the answer you are looking for. – ken Apr 14 '22 at 10:31
  • Thank you. So what is that footer? It is unlikely that every file has the same raw image data at its end! – IndustProg Apr 16 '22 at 04:05
  • I don't know, but since [pickle is a program](https://github.com/python/cpython/blob/f2bc12f0d5297899b57f3fa688b24f3c1d1bee7b/Lib/pickletools.py#L38), I guess it is finishing the construction of the numpy object. It would be the same footer if it were a numpy object. – ken Apr 16 '22 at 05:38