266

What is the easiest way (using a graphical tool or command line on Ubuntu Linux) to know if two binary files are the same or not (except for the time stamps)? I do not need to actually extract the difference. I just need to know whether they are the same or not.

Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985
sawa
  • 165,429
  • 45
  • 277
  • 381
  • 7
    A question asking to show *how* they differ: http://superuser.com/questions/125376/how-do-i-compare-binary-files-in-linux – Ciro Santilli OurBigBook.com Apr 04 '15 at 19:17
  • 4
    The man page for `cmp` specifically says it does a byte by byte comparison so that is my default for 2 binary files. `diff` is line by line and will give you the same Yes/No answer but of course not the same dump to the standard out stream. If the lines are long because perhaps they are not text files then I would prefer `cmp`. `diff` has the advantage that you can specify a comparison of directories and the `-r` for recursion thereby comparing multiple files in one command. – H2ONaCl Dec 24 '16 at 08:07

15 Answers

272

The standard Unix diff utility will show whether the files are the same or not:

[me@host ~]$ diff 1.bin 2.bin
Binary files 1.bin and 2.bin differ

If there is no output from the command, it means that the files have no differences.

Michael Oryl
  • 20,856
  • 14
  • 77
  • 117
Joe
  • 41,484
  • 20
  • 104
  • 125
  • 8
    diff seems to have problems with *really large* files. I got a `diff: memory exhausted` when comparing two 13G files. – Yongwei Wu Sep 28 '16 at 08:45
  • 1
    Interesting output. `diff` is telling you they are "binary" files. Since all files can be considered to be binary that's a strange assertion. – H2ONaCl Dec 24 '16 at 08:13
  • 23
    You can report identical files with option: `diff -s 1.bin 2.bin` or `diff --report-identical-files 1.bin 2.bin` This shows `Files 1.bin and 2.bin are identical` – Tom Kuschel Jul 20 '17 at 10:44
  • Use `-s` to get some output if files are identical. E.g. `diff -s 1.bin 2.bin` – Snowcrash Feb 16 '18 at 10:56
  • 1
    No, it will say that they are "differ", so they are not the same – Josef Klimuk Mar 20 '18 at 13:31
  • 2
    I have two executables, I know they are different because I compiled and ran them, but all options of diff and cmp given here judge them identical. Why? !!! – mirkastath Feb 28 '19 at 02:14
  • @YongweiWu, the "problem" is with the memory in the system, not `diff` which uses copious amounts for speed. Just throw some more DDR in there. – Dominic Cerisano Mar 20 '19 at 07:18
  • In this example, the files are different. The command is telling you that they differ. You can run "cmp 1.bin 2.bin" and it will tell you where the first difference is located – ritmatter Sep 18 '19 at 21:39
  • re "no output": From the diff man page: "Exit status is 0 if inputs are the same, 1 if different, 2 if trouble." Using the exit status in $? could make it easier to check things in bash, like: if diff 1.bin 2.bin; then echo "same"; else echo "different"; fi – jgreve Dec 19 '19 at 18:19
146

I found Visual Binary Diff was what I was looking for, available on:

  • Ubuntu:

    sudo apt install vbindiff
    
  • Arch Linux:

    sudo pacman -S vbindiff
    
  • Mac OS X via MacPorts:

    port install vbindiff
    
  • Mac OS X via Homebrew:

    brew install vbindiff
    
Paco
  • 81
  • 1
  • 9
shao.lo
  • 4,387
  • 2
  • 33
  • 47
  • 4
    Nice... I /thought/ I only wanted to know whether the files differed; but being able to see the exact differences easily was a lot more useful. It tended to segfault when I got to the end of the file, but never mind, it still worked. – Jeremy Oct 28 '16 at 02:42
  • 3
    It's been said a few times, but this is a great little program! (fyi also on homebrew) – johncip Feb 19 '17 at 22:59
  • 3
    This should be the accepted answer as it's a far superior method than the bland and unhelpful output of the canonical diff command. – Gearoid Murphy Nov 07 '18 at 00:20
  • 2
    This is the best tool for binary diff. – Carla Camargo Jun 03 '19 at 13:22
142

Use the cmp command. It either exits silently if the files are byte-for-byte equal, or prints where the first difference occurs and exits.
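For the pure yes/no case in the question, a minimal sketch is to run cmp with -s, so that it prints nothing and only sets its exit status (0 for identical, 1 for different, 2 for trouble). The helper name same_file and the temporary file names below are illustrative assumptions, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) only if both files contain identical bytes.
same_file() {
    cmp -s -- "$1" "$2"
}

# Demo on throwaway files (names are illustrative).
tmpdir=$(mktemp -d)
printf 'abc\000def' > "$tmpdir/a.bin"
cp "$tmpdir/a.bin" "$tmpdir/b.bin"        # exact copy, possibly with a newer timestamp
printf 'abc\000dEf' > "$tmpdir/c.bin"     # one byte changed

if same_file "$tmpdir/a.bin" "$tmpdir/b.bin"; then r1=same; else r1=different; fi
if same_file "$tmpdir/a.bin" "$tmpdir/c.bin"; then r2=same; else r2=different; fi
echo "a vs b: $r1"
echo "a vs c: $r2"

rm -rf "$tmpdir"
```

Because cmp looks only at file contents, differing time stamps on otherwise identical files do not affect the result.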

kenorb
  • 155,785
  • 88
  • 678
  • 743
bobjandal
  • 2,313
  • 2
  • 15
  • 8
  • 12
    For the use case the OP describes IMHO `cmp` is more efficient than `diff`. So I'd prefer this. – halloleo Dec 18 '13 at 05:41
  • 6
    I have a shell script that runs: `cmp $1 $2 && echo "identical" || echo "different"` – steveha Dec 14 '14 at 02:01
  • 3
    does the cmp stop when it found the first difference, and display it or it goes through the end of the files? – sop Oct 25 '16 at 08:10
  • 1
    `cmp` has "silent" mode: `-s, --quiet, --silent` - `suppress all normal output`. I didn't test yet but I think that it will stop at the first difference if there is one. – Victor Yarema Nov 22 '16 at 05:21
  • 1
    I checked it right now for `cmp (GNU diffutils) 3.7`. As already stated in the answer, `cmp` **stops at the first difference** and specifies it like this: `file1 file2 differ: char 14, line 1`. – Wolf Dec 02 '21 at 19:01
20

I ended up using hexdump to convert the binary files to their hex representation and then opened them in meld / kompare / any other diff tool. Unlike you, I was after the differences in the files.

hexdump tmp/Circle_24.png > tmp/hex1.txt
hexdump /tmp/Circle_24.png > tmp/hex2.txt

meld tmp/hex1.txt tmp/hex2.txt
simotek
  • 737
  • 6
  • 10
  • 2
    Use `hexdump -v -e '/1 "%02x\n"'` if you want to diff and see exactly which bytes were inserted or removed. – William Entriken Mar 17 '17 at 21:13
  • 1
    Meld also works with binary files when they aren't converted to hex first. It shows hex values for things which aren't in the char set, otherwise normal chars, which is useful with binary files that also contain some ascii text. Many do, at least begin with a magic string. – Felix Dombek Jul 12 '19 at 11:20
19

Use sha1 to generate a checksum for each file and compare the results (on Ubuntu the coreutils command is sha1sum):

sha1 [FILENAME1]
sha1 [FILENAME2]
Scott Presnell
  • 1,528
  • 10
  • 23
19

You can use the MD5 hash function to check whether two files are the same. It does not show the differences at a low level, but it is a quick way to compare two files.

md5 <filename1>
md5 <filename2>

If both MD5 hashes (the command output) are the same, the two files are, barring a hash collision, identical.
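The comparison of the two digests can also be scripted instead of done by eye. The sketch below assumes the GNU coreutils spelling md5sum, which is what Ubuntu ships; the helper name digest and the file names are made up for the example. Swapping in sha256sum should work the same way if MD5 collisions are a concern.

```shell
#!/bin/sh
# Hypothetical helper: print only the hex digest of a file.
# md5sum prints "<digest>  <name>", so keep just the first field.
digest() {
    md5sum < "$1" | cut -d' ' -f1
}

# Demo on throwaway files (names are illustrative).
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/one.bin"
cp "$tmpdir/one.bin" "$tmpdir/two.bin"

if [ "$(digest "$tmpdir/one.bin")" = "$(digest "$tmpdir/two.bin")" ]; then
    hash_result="same digest"
else
    hash_result="different digest"
fi
echo "$hash_result"

rm -rf "$tmpdir"
```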

galoget
  • 722
  • 9
  • 15
Rikki
  • 1,142
  • 15
  • 17
  • 10
    Can you explain your down votes please? SHA1 has 4 upvotes, and if the OP thinks there's a chance the two files could be the same or similar, the chances of a collision are slight and not worthy of down voting MD5 but up voting SHA1 other than because you heard you should hash your passwords with SHA1 instead of MD5 (that's a different problem). – Rikki Jan 16 '16 at 01:10
  • 3
    not sure about the reason but a pure cmp will be more efficient than computing any hash function of files and comparing them (at least for only 2 files) – Paweł Szczur Apr 26 '16 at 13:58
  • 1
    if the two files are large and on the same disk (not ssd), the md5 or sha* variant might be faster because the disks can read the two files sequentially which saves lots of head movements – Daniel Alder Feb 22 '17 at 20:08
  • 9
    I downvoted because you posted a minor variant of an earlier (bad) solution, when it should have been a comment. – johncip Mar 06 '17 at 10:07
  • 1
    The quickest way to check large files :) Thanks a lot – Sumeet Patil Feb 03 '21 at 15:12
  • It won't always work, there are md5 collisions. – MichaelMoser Dec 29 '21 at 05:00
  • @MichaelMoser indeed, but given OP's use case, it seems like they are trying to determine if two files they think are the same are exactly the same. The likelihood of them happening to compare two files they think are the same, turning out to be so completely different that they collide on md5 hash is slim. – Rikki May 09 '23 at 04:30
9

Try diff -s

Short answer: run diff with the -s switch.

Long answer: read on below.


Here's an example. Let's start by creating two files with random binary contents:

$ dd if=/dev/random bs=1k count=1 of=test1.bin
1+0 records in
1+0 records out
1024 bytes (1,0 kB, 1,0 KiB) copied, 0,0100332 s, 102 kB/s

                                                                                  
$ dd if=/dev/random bs=1k count=1 of=test2.bin
1+0 records in
1+0 records out
1024 bytes (1,0 kB, 1,0 KiB) copied, 0,0102889 s, 99,5 kB/s

Now let's make a copy of the first file:

$ cp test1.bin copyoftest1.bin

Now test1.bin and test2.bin should be different:

$ diff test1.bin test2.bin
Binary files test1.bin and test2.bin differ

... and test1.bin and copyoftest1.bin should be identical:

$ diff test1.bin copyoftest1.bin

But wait! Why is there no output?!?

The answer is: this is by design. There is no output on identical files.

But the exit codes are different:

$ diff test1.bin test2.bin
Binary files test1.bin and test2.bin differ

$ echo $?
1


$ diff test1.bin copyoftest1.bin

$ echo $?
0

Now fortunately you don't have to check exit codes each and every time because you can just use the -s (or --report-identical-files) switch to make diff more verbose:

$ diff -s test1.bin copyoftest1.bin
Files test1.bin and copyoftest1.bin are identical
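When the result feeds into a script rather than a human, the exit status is still the more convenient signal. The following is only a sketch: the function name compare is hypothetical, and the three-byte test files stand in for the random ones created above. It maps the three exit statuses that the diff man page documents (0 same, 1 different, 2 trouble) to a message.

```shell
#!/bin/sh
# Hypothetical helper: translate diff's exit status into a word.
compare() {
    diff -q -- "$1" "$2" >/dev/null 2>&1
    case $? in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;      # 2 = trouble, e.g. a missing file
    esac
}

# Demo on throwaway files (small stand-ins for the random files above).
tmpdir=$(mktemp -d)
printf '\001\002\003' > "$tmpdir/test1.bin"
printf '\001\002\004' > "$tmpdir/test2.bin"
cp "$tmpdir/test1.bin" "$tmpdir/copyoftest1.bin"

out_same=$(compare "$tmpdir/test1.bin" "$tmpdir/copyoftest1.bin")
out_diff=$(compare "$tmpdir/test1.bin" "$tmpdir/test2.bin")
out_err=$(compare "$tmpdir/test1.bin" "$tmpdir/missing.bin")
echo "$out_same / $out_diff / $out_err"

rm -rf "$tmpdir"
```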
Community
  • 1
  • 1
StackzOfZtuff
  • 2,534
  • 1
  • 28
  • 25
6

Use the cmp command. Refer to Binary Files and Forcing Text Comparisons for more information.

cmp -b file1 file2
user2008151314
  • 680
  • 6
  • 10
  • 1
    `-b` doesn't compare files in "binary mode". The manual you linked actually says: "With GNU `cmp`, you can also use the `-b` or `--print-bytes` option to show the ASCII representation of those bytes." – Victor Yarema Nov 22 '16 at 05:28
  • Victor Yarema, I don't know what you mean by "binary mode". `cmp` is inherently a binary comparison in my opinion. The `-b` option merely prints the first byte that is different. – H2ONaCl Dec 24 '16 at 08:25
6

Diff with the following options would do a binary comparison to check just if the files are different at all and it'd output if the files are the same as well:

diff -qs {file1} {file2}

If you are comparing two files with the same name in different directories, you can use this form instead:

diff -qs {file1} --to-file={dir2}

OS X El Capitan
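As one of the comments on the question notes, diff can also take two directories together with -r for recursion, which checks many binary files in one command. A small sketch, with an invented directory layout and file names:

```shell
#!/bin/sh
# Build two small directory trees (layout and names are illustrative).
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/dir1/sub" "$tmpdir/dir2/sub"
printf 'same\n'  > "$tmpdir/dir1/a.bin"
printf 'same\n'  > "$tmpdir/dir2/a.bin"
printf 'old\000' > "$tmpdir/dir1/sub/b.bin"
printf 'new\000' > "$tmpdir/dir2/sub/b.bin"

# -r: recurse into subdirectories, -q: only report whether files differ.
report=$(diff -rq "$tmpdir/dir1" "$tmpdir/dir2")
dir_status=$?       # 0 if every file matched, 1 if any differed
echo "$report"
echo "exit status: $dir_status"

rm -rf "$tmpdir"
```

Only the pair that differs is listed; adding -s would also list the identical pairs.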

Dima Korobskiy
  • 1,479
  • 16
  • 26
4

For finding flash memory defects, I had to write this script which shows all 1K blocks which contain differences (not only the first one as cmp -b does)

#!/bin/sh

f1=testinput.dat
f2=testoutput.dat

size=$(stat -c%s $f1)   # size of the first file in bytes (GNU stat syntax)
i=0
while [ $i -lt $size ]; do
  if ! r="$(cmp -n 1024 -i $i -b $f1 $f2)"; then   # compare one 1K block starting at offset $i
    printf "%8x: %s\n" $i "$r"
  fi
  i=$((i + 1024))
done

Output:

   2d400: testinput.dat testoutput.dat differ: byte 3, line 1 is 200 M-^@ 240 M- 
   2dc00: testinput.dat testoutput.dat differ: byte 8, line 1 is 327 M-W 127 W
   4d000: testinput.dat testoutput.dat differ: byte 37, line 1 is 270 M-8 260 M-0
   4d400: testinput.dat testoutput.dat differ: byte 19, line 1 is  46 &  44 $

Disclaimer: I hacked the script in 5 min. It doesn't support command line arguments nor does it support spaces in file names
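A different way to get a per-block overview, offered here only as a sketch and not as the author's method, is to let cmp -l list every differing byte once and group the offsets into 1K blocks with awk. The test files, the offsets that get modified, and the output format are all assumptions made for the demo.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
f1="$tmpdir/testinput.dat"
f2="$tmpdir/testoutput.dat"

# Build a 4 KiB file of zero bytes, copy it, then overwrite a few bytes in the copy.
dd if=/dev/zero of="$f1" bs=1024 count=4 2>/dev/null
cp "$f1" "$f2"
printf 'X'  | dd of="$f2" bs=1 seek=1030 conv=notrunc 2>/dev/null
printf 'YZ' | dd of="$f2" bs=1 seek=3500 conv=notrunc 2>/dev/null

# cmp -l prints one line per differing byte: "<1-based byte number> <octal> <octal>".
# awk maps each byte number to the start offset of its 1K block and counts per block.
blocks=$(cmp -l "$f1" "$f2" | awk '
    { b = int(($1 - 1) / 1024) * 1024; n[b]++ }
    END { for (b in n) printf "%08x: %d differing byte(s)\n", b, n[b] }' | sort)
echo "$blocks"

rm -rf "$tmpdir"
```

With the offsets chosen above, the blocks starting at 0x400 and 0xc00 should be reported. This reads each file only once, but it assumes both files have the same length; if they do not, cmp additionally prints an EOF message on standard error.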

Daniel Alder
  • 5,031
  • 2
  • 45
  • 55
  • I get "r: not found" (using GNU linux) – unseen_rider Feb 03 '17 at 04:39
  • @unseen_rider which shell, which line? Please call the script using `sh -x` for debugging – Daniel Alder Feb 04 '17 at 12:20
  • This is via calling the script from terminal. Line is 9. – unseen_rider Feb 04 '17 at 20:56
  • @unseen_rider I can't help you this way. The script is ok. Please post your debug output to http://pastebin.com/. You can see here what I mean: http://pastebin.com/8trgyF4A. Also, please tell me the output of `readlink -f $(which sh) ` – Daniel Alder Feb 05 '17 at 12:33
  • The last command gives `/bin/dash`. Currently creating paste on pastebin. – unseen_rider Feb 06 '17 at 02:39
  • Ok there was an extra space that I entered after the `r` on the second line. Removing this resolved it. Thanks – unseen_rider Feb 06 '17 at 03:11
  • I now have a different problem - the output of the tool says the 2 files are different in `byte 316, line 5 is 71 9 145 e`, however manual inspection of the files in gedit on line 5 shows no differences. Why is this? – unseen_rider Feb 06 '17 at 03:22
  • @unseen_rider This is because the whole script is a dirty hack (see disclaimer): Each output line shows the unmodified output of `cmp` for the extracted area. This means the difference is relative to the examined offset on the left: `4d000: byte 37` from my example means 0x4d000+37=address 0x4D025. If you only want to see the first diff (with correct address) of the whole file, I recommend using the basic variant: `cmp -b testinput.dat testoutput.dat` – Daniel Alder Feb 08 '17 at 15:12
2
md5sum binary1 binary2

If the MD5 sums are the same, the binaries are (barring a hash collision) the same.

E.g.

md5sum new*
89c60189c3fa7ab5c96ae121ec43bd4a  new.txt
89c60189c3fa7ab5c96ae121ec43bd4a  new1.txt
root@TinyDistro:~# cat new*
aa55 aa55 0000 8010 7738
aa55 aa55 0000 8010 7738


root@TinyDistro:~# cat new*
aa55 aa55 000 8010 7738
aa55 aa55 0000 8010 7738
root@TinyDistro:~# md5sum new*
4a7f86919d4ac00c6206e11fca462c6f  new.txt
89c60189c3fa7ab5c96ae121ec43bd4a  new1.txt
ashish
  • 361
  • 3
  • 8
  • 1
    Not quite. Only the possibility is high. – sawa Jan 25 '19 at 05:37
  • What is the probability of failing ? – ashish Jan 25 '19 at 06:08
  • Slim, but worse than using some variant of `diff`, over which there is no reason to prefer it. – sawa Jan 25 '19 at 06:24
  • You would have to change the MD5 hash to SHA-2 for this advice to be practical. These days anyone's laptop can generate an MD5 collision and, from that single colliding prefix (2 files of the same size, same prefix and same MD5), generate an unlimited number of colliding files (same prefix, different colliding block, same suffix) – Michal Ambroz Apr 23 '19 at 18:11
2

Radiff2 is a tool designed to compare binary files, similar to how regular diff compares text files.

Try radiff2 which is a part of radare2 disassembler. For instance, with this command:

radiff2 -x file1.bin file2.bin

You get nicely formatted two-column output in which the differences are highlighted.

funnydman
  • 9,083
  • 4
  • 40
  • 55
2

My favourites use the xxd hex dumper from the vim package:

1) using vimdiff (part of vim)

#!/bin/bash
FILE1="$1"
FILE2="$2"
vimdiff <( xxd "$FILE1" ) <( xxd "$FILE2" )

2) using diff

#!/bin/bash
FILE1="$1"
FILE2="$2"
diff -W 140 -y <( xxd "$FILE1" ) <( xxd "$FILE2" ) | colordiff | less -R -p '  \|  '
2

wxHexEditor

wxHexEditor is both free and able to diff large files up to 2^64 bytes (16 EiB). It has a GUI, is cross-platform, and has lots of features.

To get it for free, choose one of the following options:


The rest of this answer gives more detail on the same suggestion, if you're interested.

Strength

• Hexadecimal (hex) editor, which is helpful for reverse engineering.

• Cross-platform: Linux, Mac OS, Windows.

• Easy-to-use graphical user interface (GUI).

• Supports very large files, up to 2^64 bytes (16 EiB).

• Compares two large files side by side (diff). Optionally lists and searches all differences.

• Very fast search.

• Uses a small amount of RAM.

• Does not create temporary files, so it uses very little storage space.

• Dark or bright theme.

• Multilingual: 15 languages.

• Open source. The code is publicly available for review and contributions on GitHub at https://github.com/EUA/wxHexEditor or on SourceForge at https://sourceforge.net/p/wxhexeditor/code/, so anyone can audit it for security and privacy issues.

• Licensed under the GNU General Public License version 2, so the source can be freely used, modified, and redistributed under the same terms. https://github.com/EUA/wxHexEditor/blob/master/LICENSE

Challenge

• Confusion between the two code repositories. At the time of this writing, August 2021, the GitHub repository at https://github.com/EUA/wxHexEditor seems to be the more recent one, as it was last updated in 2021. In comparison, the SourceForge repository at https://sourceforge.net/projects/wxhexeditor/ was last updated on December 31st, 2017.

Show Your Support

• If you enjoy this application, show your support to the authors & contributors with:

    • Donation at https://www.paypal.com/cgi-bin/webscr?item_name=Donation+to+wxHexEditor&cmd=_donations&business=erdem.ua%40gmail.com

    • Support with ticket at https://sourceforge.net/projects/wxhexeditor/support

    • Support with forum at https://sourceforge.net/p/wxhexeditor/discussion/

    • Patch at https://sourceforge.net/p/wxhexeditor/patches/

Using

• wxHexEditor 0.23

• Debian 10 Buster

• GNOME 3.30.2

Francewhoa
  • 31
  • 4
-4

There is a relatively simple way to check if two binary files are the same.

If you use file input/output in a programming language, you can read the bytes of each binary file into its own array.

At this point the check is as simple as :

if(file1 != file2){
    //do this
}else{
    //do that
}
  • This solution isn't complete. Also, the pseudo code is not a true implementation of the description given in words. – tpb261 Jun 29 '21 at 13:00