36

I would love some help with a Bash script loop that will show all the differences between two binary files, using just

cmp file1 file2 

It only shows the first change. I would like to use cmp because it gives an offset and a line number for each change, but if you think there's a better command I'm open to it :) Thanks

Ciro Santilli OurBigBook.com
Lewis Denny
  • The offset is valid, but the line number will not be valid when comparing binary files, as they have no concept of lines (only text have lines). – Some programmer dude Dec 05 '11 at 13:01
  • Yeah I understand, in this case I use the line number to reference to a hexdump of the binary so I read whats around the different offset :) – Lewis Denny Dec 05 '11 at 13:11

3 Answers

45

I think cmp -l file1 file2 might do what you want. From the manpage:

-l  --verbose
      Output byte numbers and values of all differing bytes.

The output is a table of the offset, the byte value in file1 and the value in file2 for all differing bytes. It looks like this:

4531  66  63
4532  63  65
4533  64  67
4580  72  40
4581  40  55
[...]

So the first difference is at offset 4531 (a decimal byte number, counted from 1), where file1's byte value is 66 in octal and file2's is 63 in octal.
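Since the question mentions cross-referencing a hexdump, here is a minimal sketch of a loop that rewrites the `cmp -l` table in hexadecimal. The function name `cmp_hex` is made up for illustration. It assumes the hexdump uses 0-based offsets (as `od`, `xxd` and `hexdump` do), so it subtracts 1 from the 1-based byte numbers that `cmp` prints.

```shell
# cmp_hex FILE1 FILE2 (hypothetical helper, not part of cmp)
# Prints one line per differing byte: 0-based hex offset, then the
# byte from FILE1 and the byte from FILE2, both in hex.
cmp_hex() {
  cmp -l "$1" "$2" | while read -r offset a b; do
    # cmp -l prints a 1-based decimal byte number and octal byte values.
    # A leading 0 makes the shell's arithmetic read $a and $b as octal.
    printf '%08x  %02x  %02x\n' "$((offset - 1))" "$((0$a))" "$((0$b))"
  done
}
```

Called as `cmp_hex file1 file2`, each output line can be looked up directly in the hexdump by its offset. Like `cmp -l` itself, this only compares bytes at the same position, so it is of little use once bytes have been inserted or deleted.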

fdermishin
rwos
  • 4
    +1: this is 'the way to do it', but the problem with it is that `cmp` does not look for inserted or deleted material; it just checks 'is the byte at offset N in file1 the same as the byte at offset N in file2? If yes, print nothing, else print the difference'. So the files have to be very similar (e.g., just some bytes in the Unix timestamp when the object files were compiled, which is built into some object files) and the rest needs to be the same. Add 3 bytes to a constant string and everything after that is different. – Jonathan Leffler Dec 05 '11 at 15:39
  • Thanks heaps, this is just what I wanted. I tried that in the past but I didn't know that the numbers on the side were the offsets :) Thanks heaps! – Lewis Denny Dec 05 '11 at 20:14
  • 2
    I've edited the answer by adding a correction about the format of the bytes that differ. This is a not so well documented feature of cmp. I hope that the edit is appropriate. – fdermishin Feb 07 '21 at 21:52
6

Method that works for single byte addition/deletion

diff <(od -An -tx1 -w1 -v file1) \
     <(od -An -tx1 -w1 -v file2)

Generate a test case with a single removal of byte 64:

for i in $(seq 128); do printf "%02x" "$i"; done | xxd -r -p > file1
for i in $(seq 128); do if [ "$i" -ne 64 ]; then printf "%02x" "$i"; fi; done | xxd -r -p > file2

Output:

64d63
<  40

If you also want to see the ASCII version of the character:

bdiff() (
  f() (
    od -An -tx1c -w1 -v "$1" | paste -d '' - -
  )
  diff <(f "$1") <(f "$2")
)

bdiff file1 file2

Output:

64d63
<   40   @

Tested on Ubuntu 16.04.

I prefer od over xxd because:

  • it is POSIX, whereas xxd is not (it comes with Vim)
  • it has -An to remove the address column without needing awk.

Command explanation:

  • -An removes the address column. This is important otherwise all lines would differ after a byte addition / removal.
  • -w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would become out of phase and differ. Unfortunately, this is not POSIX, but present in GNU.
  • -tx1 is the representation you want. You can change it to any other supported value, as long as you keep 1 byte per line.
  • -v prevents od from abbreviating repeated lines with an asterisk (*), which might interfere with the diff
  • paste -d '' - - joins every two lines. We need it because the hex and ASCII go into separate adjacent lines. Taken from: Concatenating every other line with the next
  • we use parentheses () to define bdiff instead of {} to limit the scope of the inner function f, see also: How to define a function inside another function in Bash?
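Because -w1 is a GNU extension, one possible workaround on systems without it is to let od use its default line width and split the output into one byte per line afterwards. This is only a sketch: the names `bytes_per_line` and `pdiff` are made up for illustration, and it assumes `mktemp` is available (it is common but not POSIX).

```shell
# bytes_per_line FILE (hypothetical helper)
# Prints FILE as one two-digit hex byte per line without using od -w1.
bytes_per_line() {
  # tr turns every space into a newline and squeezes runs of newlines;
  # sed drops the empty first line left by od's leading space.
  od -An -tx1 -v "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

# pdiff FILE1 FILE2 (hypothetical helper)
# Runs in a subshell so that tmp and status do not leak to the caller.
pdiff() (
  tmp=$(mktemp) || exit 2
  bytes_per_line "$2" > "$tmp"
  bytes_per_line "$1" | diff - "$tmp"
  status=$?            # keep diff's exit status (1 means "files differ")
  rm -f "$tmp"
  exit "$status"
)
```

Usage is `pdiff file1 file2`. The diff line numbers mean the same as in the od -w1 version (line N is byte N, counted from 1), but the hex values are printed without leading padding.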


Ciro Santilli OurBigBook.com
  • This has the inherent flaw that it will not stream the data but load everything into RAM, meaning you will need at least 2 - 3 times the size of the files as memory, which most binary diff tools use. The only one I found that doesn't behave like this is xdelta3... – Izzy Nov 09 '17 at 11:58
  • @Izzy add it to an answer showing to use it and why and get upvotes :-) – Ciro Santilli OurBigBook.com Nov 09 '17 at 12:03
  • Sadly, to my knowledge, it can't. At least not the kind you'd expect. It produces VCDIFF output, which is a highly compressed binary delta. So you can just diff, patch and few the command structure. My comment was more of a "be aware that this answer will blow your main memory with a 5GB file" – Izzy Nov 14 '17 at 08:48
  • @Izzy OK! Good to know nevertheless. – Ciro Santilli OurBigBook.com Nov 14 '17 at 08:50
3

The most efficient workaround I've found is to translate the binary files to some form of text using od.

Then any flavour of diff works fine.

mouviciel
  • Yep, it really depends on what the OP wants to do with the diff. A diff of a hexdump is probably of more value for humans, while a `cmp` may be easier for programs to parse/use. – rwos Dec 05 '11 at 16:03