I have a binary file produced by an analytical device. I know it contains all the data I need, and I'm trying to extract that data from the file.

With the help of this question: How to view files in binary in the terminal?

I opened the file with Vim, and switched to binary editing. I can now browse the binary file. Some parts seem pretty readable:

00000340: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000350: 0000 0000 0000 0000 0000 1a4c 0061 0062  ...........L.a.b
00000360: 0072 0061 0073 006f 006c 0020 0037 0030  .r.a.s.o.l. .7.0
00000370: 0067 004c 0020 0066 006c 0075 006f 0072  .g.L. .f.l.u.o.r
00000380: 0065 0073 0063 0065 0069 006e 0065 0000  .e.s.c.e.i.n.e..
00000390: 0000 0000 0000 0000 0000 0000 0000 0000  ................

But some don't:

00001000: 4300 ea00 4b00 0000 d80e 401f 2800 5100  C...K.....@.(.Q.
00001010: 0400 0000 0000 6e03 36fe eaff b000 9cff  ......n.6.......
00001020: 71ff e500 0eff f9ff 4aff 1200 2cff c400  q.......J...,...
00001030: 6f00 6bff 0d00 c4ff f1ff fdff d9ff 6b00  o.k...........k.
00001040: f8ff 1c00 5400 34ff a600 deff feff beff  ....T.4.........
00001050: 1600 acff f5ff ffff 7600 39ff 5e00 9700  ........v.9.^...
00001060: 2a00 92ff 3300 94ff 5200 a2ff 6100 afff  *...3...R...a...
00001070: b9ff 3500 a1ff 2300 f6ff a000 f9fe ef00  ..5...#.........
00001080: c5ff 6000 2100 53ff 9200 8cff 9200 a0ff  ..`.!.S.........
00001090: 5d00 b0ff 8eff 8b00 30ff 0d01 adff 0300  ].......0.......
000010a0: 26ff ae00 cfff c000 6900 a2fe cc00 dfff  &.......i.......
000010b0: fdff 4fff b900 f0ff ba00 cdfe 2a00 3400  ..O.........*.4.
000010c0: 7cff f800 56ff c7ff 8100 3300 f7fe 6cff  |...V.....3...l.
000010d0: c500 3a00 0600 0500 8600 3800 56ff 1bff  ..:.......8.V...

I would like to know whether I can extract the data in a structured and clear way. I don't really know where to start, so I have several questions:

  • If I can clearly read some of the text, will I be able to read the other data in the file?
  • How do I parse the text into a usable form?

I know my question is a bit unclear. I mainly need a starting point.

I'm comfortable using Python and Bash for this task.

Here is the start of the file:

00000000: 0331 3331 0000 0000 0000 0000 0000 0000  .131............
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000f0: 0000 0000 0000 0005 0000 0083 0001 0005  ................
00000100: 0001 0001 0010 2232 0000 0009 0000 0000  ......"2........
00000110: 0000 0000 0000 0000 1195 0000 0000 0000  ................
00000120: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000140: 1000 0000 0001 0331 0033 0031 0000 0000  .......1.3.1....
00000150: 0000 0000 0000 0001 0000 000c 4c00 4300  ............L.C.
00000160: 2000 4400 4100 5400 4100 2000 4600 4900   .D.A.T.A. .F.I.
00000170: 4c00 4500 0000 0000 0000 0000 0000 0000  L.E.............
00000180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

EDIT:

OK, now I know more about binary files. I'll try to improve my question.

I now know that binary files are encoded with a structure (a certain format), so to decode one you must know the structure of the data. What I don't understand is why the characters seem erratic for a certain portion of the binary file, while the beginning of the file is completely readable. How can you try to discover the structure if you can't read the file correctly?

JPFrancoia
  • Try the command `strings file` or even `strings -e encoding file`, where encoding could be something like utf16 or similar. – JJoao Sep 15 '15 at 12:09
  • There are clearly UTF-16 big endian strings in there. However, I would believe the data of interest *wouldn't* be in these text strings. It is rather impossible to decode if the format *or* the expected data is not known – Antti Haapala -- Слава Україні Sep 15 '15 at 12:14
  • @JJoao: the -e option doesn't support that encoding: -e --encoding={s,S,b,l,B,L} Select character size and endianness: s = 7-bit, S = 8-bit, {b,l} = 16-bit, {B,L} = 32-bit – JPFrancoia Sep 15 '15 at 12:18
  • @AnttiHaapala: I think they are. It is a reasonable guess given the size of the file: none in the corresponding directory is bigger. I don't know what the format is, but I do know what the data are. – JPFrancoia Sep 15 '15 at 12:20
  • @Rififi so this would be `--encoding=b` – Antti Haapala -- Слава Україні Sep 15 '15 at 12:25
  • Ok. In that case, the CLI returns nothing. – JPFrancoia Sep 15 '15 at 12:26
  • @Rififi, sorry I was not clear: as Antti Haapala said, `strings -e b` or `strings -e l` would be my guesses for UTF-16 encoded text. – JJoao Sep 15 '15 at 14:07
  • @Rififi, are you trying to extract the strings, or also the values of the binary data? Could you show us the start of the file? – JJoao Sep 15 '15 at 14:13
  • I'm not sure I really understand what you are asking. I think I want to extract the strings. I want the original data, encoded in binary, if that's clearer. I edited my question and added the start of the file. – JPFrancoia Sep 15 '15 at 14:55

1 Answer

This question has probably lost its relevance for its author by now. Still, it deserves a short answer to give an idea of how to find solutions in similar situations.

Any kind of binary file can be parsed into meaningful data, as long as one knows the structure that was used to create the file. All that is needed then is a parser, in any language that can read file content, to get the data out. If no suitable parser exists, you need to learn how to parse the data yourself with the language's file tools. If needed, look up terms like "offset" and "seek" in file operations.
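As a rough illustration of "seek" and "offset", here is a minimal Python sketch. It rests on guesses made only from the dumps in the question: the readable strings look like a one-byte character count followed by UTF-16 little-endian text (0x1a = 26 sits just before the 26-character name), and the "erratic" region looks like signed 16-bit little-endian integers. The helper names and the in-memory sample are hypothetical; a real file would be opened with `open(path, "rb")`.

```python
import io
import struct


def read_prefixed_utf16(f, offset):
    """Read a 1-byte character count, then that many UTF-16-LE code units.

    The layout is an assumption inferred from the question's hex dump.
    """
    f.seek(offset)
    (n_chars,) = struct.unpack("B", f.read(1))
    return f.read(2 * n_chars).decode("utf-16-le")


def read_int16_le(f, offset, count):
    """Read `count` signed 16-bit little-endian integers starting at `offset`."""
    f.seek(offset)
    return list(struct.unpack(f"<{count}h", f.read(2 * count)))


# In-memory stand-in for the device file, built from bytes seen in the dumps.
text = "Labrasol 70gL fluoresceine"
string_offset = 10
numbers_offset = string_offset + 1 + 2 * len(text) + 4
sample = (
    b"\x00" * string_offset
    + bytes([len(text)])
    + text.encode("utf-16-le")
    + b"\x00" * 4
    + bytes.fromhex("6e0336feeaffb0009cff")  # from the line at 00001010
)

f = io.BytesIO(sample)
name = read_prefixed_utf16(f, string_offset)
values = read_int16_le(f, numbers_offset, 5)
print(name)
print(values)
```

If the integer guess is right, the non-readable part of the dump is simply numbers stored as raw bytes, which is why it looks like noise in a text view. Whether those numbers are the measurements you expect can only be checked against values you already know.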

If the structure is not known but the program that writes the files is available, you can use it to create new data files with small changes in the data itself, such as changing one character in a name or increasing a value by 1. These new files can then be compared byte by byte to find which bytes have changed. In such cases you do not need to map every data block; it is enough to identify the essential ones. After that, the rest is the same as described in the paragraph above.
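The comparison step can be sketched in a few lines of Python. The function name and the two sample byte strings are made up for illustration; with real files you would load each one with `pathlib.Path(...).read_bytes()`.

```python
def changed_offsets(old: bytes, new: bytes):
    """Return (offset, old_byte, new_byte) for every position that differs.

    Only the common length is compared; the size difference is returned
    separately so that inserted or removed bytes are not silently missed.
    """
    diffs = [
        (offset, a, b)
        for offset, (a, b) in enumerate(zip(old, new))
        if a != b
    ]
    return diffs, len(new) - len(old)


# Two hypothetical saves of the same record, with one value increased by 1.
before = b"HDR\x00" + bytes([40, 0]) + b"END"
after = b"HDR\x00" + bytes([41, 0]) + b"END"

diffs, size_delta = changed_offsets(before, after)
for offset, a, b in diffs:
    print(f"offset 0x{offset:04x}: {a:#04x} -> {b:#04x}")
```

A single changed byte that moves by exactly the amount you changed in the program is a strong hint about where that value lives and how wide it is. If many bytes change at once, the file may contain checksums, timestamps, or compressed data, and this simple approach will be less conclusive.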

An example file structure: let's say you have written your name, your age, and your weight into a file. Written as text, the content will be "YILMAZ4078.5"; written in binary mode it will look like "YILMAZ( B". In text mode it is easy to work out the structure, yet a binary file needs a more delicate touch, which is a long topic on its own. It should suffice to say that you need to count the bytes to tell whether a block holds an integer, a real number, and so on.
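To make the name/age/weight example concrete, here is a small sketch using Python's `struct` module. The binary layout is an assumption chosen for illustration: a 6-byte name, a 1-byte unsigned age, and a 4-byte little-endian float for the weight. A real device format could use different widths or byte order.

```python
import struct

# Assumed layout: 6-byte name, unsigned 1-byte age, 32-bit float weight.
# "<" means little-endian with no padding between fields.
RECORD = struct.Struct("<6sBf")

as_text = "YILMAZ" + "40" + "78.5"
as_binary = RECORD.pack(b"YILMAZ", 40, 78.5)

print(as_text)    # the text form is readable as-is
print(as_binary)  # age 40 is the byte 0x28, which displays as "("

# Reading it back requires knowing the same layout and counting the bytes.
name, age, weight = RECORD.unpack(as_binary)
print(name.decode("ascii"), age, weight)
```

The binary record is 11 bytes long, and only the field widths tell you where the age ends and the weight begins. That is the byte counting mentioned above: the same 11 bytes read with a different layout would give different, equally "valid" numbers.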

Yılmaz Durmaz