I have a binary file produced by an analytical device. I know it contains all the data I need, and I'm trying to extract it.
With the help of this question (How to view files in binary in the terminal?), I opened the file in Vim and switched to binary editing. I can now browse the binary file. Some parts seem pretty readable:
00000340: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000350: 0000 0000 0000 0000 0000 1a4c 0061 0062 ...........L.a.b
00000360: 0072 0061 0073 006f 006c 0020 0037 0030 .r.a.s.o.l. .7.0
00000370: 0067 004c 0020 0066 006c 0075 006f 0072 .g.L. .f.l.u.o.r
00000380: 0065 0073 0063 0065 0069 006e 0065 0000 .e.s.c.e.i.n.e..
00000390: 0000 0000 0000 0000 0000 0000 0000 0000 ................
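A minimal sketch of how this readable part might be decoded in Python. It assumes (this is only a guess from the dump) that the byte 0x1a at offset 0x35a is a character count and that the text after it is UTF-16LE, which would explain the NUL byte after every letter. The helper name read_prefixed_utf16 is hypothetical.

```python
# Bytes copied from the dump above, starting at offset 0x35a.
raw = bytes.fromhex(
    "1a "                                        # assumed length prefix: 26 characters
    "4c00 6100 6200 7200 6100 7300 6f00 6c00 "   # "Labrasol"
    "2000 3700 3000 6700 4c00 2000 "             # " 70gL "
    "6600 6c00 7500 6f00 7200 6500 "             # "fluore"
    "7300 6300 6500 6900 6e00 6500"              # "sceine"
)


def read_prefixed_utf16(buf, offset):
    """Read a 1-byte character count followed by UTF-16LE text (assumed layout)."""
    n_chars = buf[offset]
    start = offset + 1
    # Each character takes 2 bytes in UTF-16LE for this ASCII-range text.
    return buf[start:start + 2 * n_chars].decode("utf-16-le")


print(read_prefixed_utf16(raw, 0))
```

With the real file, the same function could be called on the whole file contents with offset 0x35a, provided the length-prefix assumption holds.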
But some don't:
00001000: 4300 ea00 4b00 0000 d80e 401f 2800 5100 C...K.....@.(.Q.
00001010: 0400 0000 0000 6e03 36fe eaff b000 9cff ......n.6.......
00001020: 71ff e500 0eff f9ff 4aff 1200 2cff c400 q.......J...,...
00001030: 6f00 6bff 0d00 c4ff f1ff fdff d9ff 6b00 o.k...........k.
00001040: f8ff 1c00 5400 34ff a600 deff feff beff ....T.4.........
00001050: 1600 acff f5ff ffff 7600 39ff 5e00 9700 ........v.9.^...
00001060: 2a00 92ff 3300 94ff 5200 a2ff 6100 afff *...3...R...a...
00001070: b9ff 3500 a1ff 2300 f6ff a000 f9fe ef00 ..5...#.........
00001080: c5ff 6000 2100 53ff 9200 8cff 9200 a0ff ..`.!.S.........
00001090: 5d00 b0ff 8eff 8b00 30ff 0d01 adff 0300 ].......0.......
000010a0: 26ff ae00 cfff c000 6900 a2fe cc00 dfff &.......i.......
000010b0: fdff 4fff b900 f0ff ba00 cdfe 2a00 3400 ..O.........*.4.
000010c0: 7cff f800 56ff c7ff 8100 3300 f7fe 6cff |...V.....3...l.
000010d0: c500 3a00 0600 0500 8600 3800 56ff 1bff ..:.......8.V...
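One possible reading of this block, purely as an assumption: the values might be little-endian signed 16-bit integers (for example detector samples), since most pairs end in 00 or ff, which is what small positive and negative numbers look like in that encoding. A minimal sketch using the row at offset 0x1020 from the dump above:

```python
import struct

# The 16 bytes shown at offset 0x1020 in the dump above.
row = bytes.fromhex("71ff e500 0eff f9ff 4aff 1200 2cff c400")

# "<" = little-endian, "8h" = eight signed 16-bit integers.
# Whether the device really stores int16 samples here is an assumption.
values = struct.unpack("<8h", row)
print(values)
```

If the numbers decoded this way look like a plausible signal (small values scattered around zero), that would support the guess; if not, other types such as unsigned 16-bit or 32-bit floats could be tried the same way.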
I would like to know if I can extract the data in a structured and clear way. I have several questions, because I don't really know where to start:
- If I can clearly read some of the text, will I be able to read the other data in the file?
- How do I parse the text into a usable form?
I know my question is a bit unclear. I mainly need a starting point.
I'm comfortable using Python and Bash for this task.
Here is the start of the file:
00000000: 0331 3331 0000 0000 0000 0000 0000 0000 .131............
00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000000f0: 0000 0000 0000 0005 0000 0083 0001 0005 ................
00000100: 0001 0001 0010 2232 0000 0009 0000 0000 ......"2........
00000110: 0000 0000 0000 0000 1195 0000 0000 0000 ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000130: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000140: 1000 0000 0001 0331 0033 0031 0000 0000 .......1.3.1....
00000150: 0000 0000 0000 0001 0000 000c 4c00 4300 ............L.C.
00000160: 2000 4400 4100 5400 4100 2000 4600 4900 .D.A.T.A. .F.I.
00000170: 4c00 4500 0000 0000 0000 0000 0000 0000 L.E.............
00000180: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
000001b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
EDIT:
OK, now I know more about binary files, so I'll try to improve my question.
I now understand that a binary file is encoded according to a structure (a specific format), so to decode it you must know how the data is laid out. What I don't understand is why the characters in some portions of the file seem erratic while the beginning of the file is completely readable. How can you discover the structure if you can't read the file correctly?
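As a possible starting point for mapping the structure, here is a rough sketch that lists every run of readable text together with its offset. It assumes the text is stored as UTF-16LE, as the dumps above suggest; the function name find_utf16_strings and the minimum run length of 4 are arbitrary choices, and the sample buffer is made up to mimic the "LC DATA FILE" label.

```python
import re

# Runs of 4 or more printable ASCII characters, each followed by a NUL byte
# (how ASCII-range text looks when stored as UTF-16LE).
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")


def find_utf16_strings(buf):
    """Return (offset, text) pairs for every UTF-16LE-looking run in buf."""
    return [
        (m.start(), m.group().decode("utf-16-le"))
        for m in UTF16_RUN.finditer(buf)
    ]


# Made-up sample: padding, a length byte, a label, more padding.
sample = bytes(4) + b"\x0c" + "LC DATA FILE".encode("utf-16-le") + bytes(8)
print(find_utf16_strings(sample))
```

On the real file, the buffer would come from reading the file in binary mode. The offsets of the labels, and the bytes just before them, might hint at where each record starts.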