I'm currently doing some basic analysis and trying to build tools to automate some of the more quantitative parts of my job. One of these tasks is analyzing data from local instruments and using that data to draw quantitative conclusions. The end goal is to calculate percent data coverage over a given region (what percent of values in area 'x' exceed value 'y'?). However, there are problems.
First, the data we are looking at is in binary. While the programmer's guides for the data document some of the data structure, they are very sparse on how to actually use the data for analysis outside of their proprietary programs.
Second, I am new to Python. While I tried programming tasks in Python years ago, I did not end up making anything useful. I am more adept at shell scripting, can work with HTML/JavaScript/PHP, and can manage a program written in Fortran; I'm trying to learn Python to diversify.
What I know about the data in question: The binary file contains a 640-byte header made up of three parts. Each part is a mixture of: characters; unsigned and signed 8, 16, and 32 bit integers; and 16 and 32 bit binary angles. After the header, the file holds a Cartesian grid of data as 'pixels' in an 'image'. Each 'pixel' within the 'image' is a one-byte unsigned character with a value between 0 and 255. The 'image' is a 2-D grid of 'x by y', with the next 'image' starting after a fixed number of bytes (in this data set, the images are 720 by 720 'pixels', so consecutive 'images' start 720^2 = 518400 bytes apart).
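As a rough illustration of the layout described above, the byte offset of each 'image' and each 'pixel' follows from simple arithmetic. The sketch below assumes the 640-byte header is followed immediately by back-to-back images stored row by row with no padding. That ordering and the helper names (`frame_offset`, `pixel_at`) are assumptions for illustration, not something the programmer's guide confirms.

```python
HEADER_SIZE = 640            # bytes, from the description above
WIDTH = HEIGHT = 720         # pixels per side in this data set
FRAME_SIZE = WIDTH * HEIGHT  # one byte per pixel


def frame_offset(index, header_size=HEADER_SIZE, frame_size=FRAME_SIZE):
    """Byte offset of image number `index` (0-based) from the start of the file."""
    return header_size + index * frame_size


def pixel_at(raw, index, row, col,
             width=WIDTH, height=HEIGHT, header_size=HEADER_SIZE):
    """Value (0-255) of one pixel, assuming row-major storage (an assumption)."""
    start = frame_offset(index, header_size, width * height)
    # Indexing a bytes object yields an int in the range 0-255.
    return raw[start + row * width + col]


# Tiny made-up file: 4-byte header, then two 3x2 images.
demo = bytes(4) + bytes(range(6)) + bytes(range(10, 16))
print(pixel_at(demo, index=1, row=1, col=2, width=3, height=2, header_size=4))
```

If the real file stores columns first, or pads each image, the index arithmetic in `pixel_at` would need to change accordingly.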
Right now, my goal is just to read the file into a Python program and separate the various "images" for inspection. The initialized data/format are below:
import struct
testFile = 'C:/path/to/file/binaryFile'
headerFormat = '640c'
nBytesData = 720 * 720
# Below is commented out
# inputFile = open(testFile, 'rb')
I have been able to read the file in as a binary file, but I have no clue how to inspect it. My first instinct was to put it in a numpy array, but additional research suggested using the struct module and struct.unpack to break apart the data. From what I've read, the following block should unpack each 'image' correctly after the initial header, even if it's not the most efficient method:
header_size = struct.calcsize(headerFormat)
testUnpacked = []
with open(testFile, 'rb') as testData:
    headerOut = testData.read(header_size)
    print("header is: ", headerOut)
    while True:
        testContent = testData.read()
        if not testContent:
            break
        testArray = struct.unpack(testContent, nBytesData)
        testUnpacked.append(testArray)
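For comparison, here is a minimal sketch of the same idea with two changes: each `read` call asks for exactly one image's worth of bytes (a bare `read()` returns the rest of the file in one go), and `struct.unpack` is given the format first and the buffer second. The function name `read_frames` and the tiny in-memory stand-in file are made up for illustration, and the sketch assumes the images follow the header with no padding.

```python
import io
import struct


def read_frames(stream, header_size, frame_size):
    """Return (header_bytes, frames); each frame is a tuple of ints 0-255."""
    header = stream.read(header_size)
    if len(header) < header_size:
        raise ValueError("file is shorter than the header")
    frame_format = f"{frame_size}B"  # frame_size unsigned one-byte values
    frames = []
    while True:
        chunk = stream.read(frame_size)  # one image per iteration
        if not chunk:
            break                        # clean end of file
        if len(chunk) < frame_size:
            raise ValueError("trailing partial image")
        frames.append(struct.unpack(frame_format, chunk))
    return header, frames


# Stand-in for the real file: 4-byte header plus two 6-byte "images".
demo = io.BytesIO(bytes(4) + bytes(range(12)))
header, frames = read_frames(demo, header_size=4, frame_size=6)
print(len(header), frames)
```

With the real file, the call would presumably look like `read_frames(open_file, 640, 720 * 720)` inside a `with open(testFile, 'rb') as open_file:` block. Keeping each frame as the raw `bytes` chunk instead of unpacking it would also work, since indexing `bytes` already gives integers from 0 to 255.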
The problem is that I do not know how to set up the code to unpack or skip the header of the binary file. I do not think the headerFormat = '640c' line, plus the next couple of commands that try to format its output, are correct. I was able to output a line that the program, run in PyCharm, interpreted as the "header"; below is a sample of the output starting from the first 'print':
b'\x1b\x00\x08\x00\x80\xd4\x0f\x00\x00\x00\x00\x00\x1a\x00\x06\x00@\x01\x00\x00\x00\x00\x00\x00\x03\x00\x02\x00\x00\x00\x00\x00}\t\x0
After that, I got an error stating that there is an embedded null character, which prevented the data from being saved to the designated array.
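One likely source of that message is the argument order: `struct.unpack` expects the format string first and the buffer second, so in the `struct.unpack(testContent, nBytesData)` call the raw file bytes (which contain `\x00`) are being parsed as a format. A minimal sketch of both orders on a made-up six-byte buffer follows; the exact exception text may differ between Python versions.

```python
import struct

data = bytes([27, 0, 8, 0, 128, 212])  # made-up bytes, including nulls

# Swapped order: the data is treated as the format string and rejected.
try:
    struct.unpack(data, len(data))
    swapped_ok = True
except (struct.error, TypeError, ValueError) as exc:
    swapped_ok = False
    print("swapped call failed:", type(exc).__name__, exc)

# Format first, buffer second: six unsigned one-byte values.
values = struct.unpack(f"{len(data)}B", data)
print(values)
```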
Other questions I referenced to try and figure out how to read the data:
- Reading a binary file with python
- Reading a binary file into a struct
- Fastest way to read a binary file with a defined format?
Main questions are as follows:
- How do I tell the program to read the binary file header and then start reading the file according to the 720^2 arrays?
- How do I tell the program to save the header in a format I can understand?
- How do I figure out what is causing the struct.error message?
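On the second question, a format such as '640c' only yields 640 separate one-byte values, so it cannot describe a header that mixes integers of different widths. A sketch of the general approach is to decode a few fields at a time with `struct.unpack_from`, using the field order from the programmer's guide. The layout below (little-endian, two signed 16-bit integers, then one signed 32-bit integer) is a guess for illustration only, as is the binary-angle convention in `binary_angle_to_degrees`.

```python
import struct

# First eight bytes of the header output shown above.
sample = b'\x1b\x00\x08\x00\x80\xd4\x0f\x00'

# '<' = little-endian, 'h' = signed 16-bit, 'i' = signed 32-bit.
# Guessed layout; the real field order is in the programmer's guide.
first, second, third = struct.unpack_from('<hhi', sample, 0)
print(first, second, third)


def binary_angle_to_degrees(value, bits=16):
    """Convert an unsigned binary angle to degrees, assuming a full circle
    spans 2**bits counts (a common convention, assumed here)."""
    return value * 360.0 / (1 << bits)


print(binary_angle_to_degrees(16384))         # quarter of a 16-bit circle
print(binary_angle_to_degrees(1 << 31, 32))   # half of a 32-bit circle
```

Under this guess the third value decodes to 1037440, which equals 640 + 2 * 720 * 720, so it may be a byte count for the header plus two images. That interpretation is unverified and should be checked against the documentation.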