
For regular files I can use the comm command to find common lines.

For example, given two files:

$ cat f1
line1
line2
line3
line4
line5

$ cat f2
line1
line20
line30
line4
line5

They can be compared like this:

$ comm -12 f1 f2
line1
line4
line5

How can I find the offset of each matching line? Also, how can I run the same comparison on two binary files and print the offsets where they match?

I've been trying diff, cmp, and comm for the past hour and have been unable to figure this out.

EDIT 1: Not an exact solution, but I found vbindiff, which helps a bit.

webminal.org
  • If it's a binary file, it doesn't have lines. – ganbustein Jan 07 '15 at 10:04
  • Okay, right, but how do I find the offset of the first 80 characters these files have in common? – webminal.org Jan 07 '15 at 10:25
  • I understand your question but not your problem. What do you really want to achieve? Which problem do you want to solve? – Klaus Jan 07 '15 at 10:33
  • I have two files which are binary dumps (unknown format). Say file1 has the content "abcde" and file2 has "defgh". I need a way to merge these two files by removing the common pattern, in this case "de". The output will be "abcdefgh". – webminal.org Jan 07 '15 at 10:41
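
The merge described in the last comment can be sketched in Python. This is only an illustration, assuming the common pattern is always a suffix of the first file that is also a prefix of the second (as in the "abcde" / "defgh" example); the function name merge_overlapping is hypothetical and not part of any tool mentioned here.

```python
def merge_overlapping(a: bytes, b: bytes) -> bytes:
    """Join a and b, dropping the longest suffix of a that is also a prefix of b."""
    # Try the largest possible overlap first and shrink until one fits.
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return a + b[size:]
    # No overlap at all: plain concatenation.
    return a + b


print(merge_overlapping(b"abcde", b"defgh"))  # b'abcdefgh'
```

For real dumps the two arguments could come from open(path, "rb").read(). The loop is quadratic in the worst case, which is fine for small files but may be slow for very large ones.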

1 Answer

You are probably looking for cmp:

cmp - compare two files byte by byte

$ cmp f1 f2
f1 f2 differ: byte 12, line 2

$ cmp -b f1 f2
f1 f2 differ: byte 12, line 2 is  12 ^J  60 0

$ cmp -bl f1 f2
12  12 ^J    60 0
13 154 l     12 ^J
14 151 i    154 l
15 156 n    151 i
16 145 e    156 n
17  63 3    145 e
18  12 ^J    63 3
19 154 l     60 0
20 151 i     12 ^J
21 156 n    154 l
22 145 e    151 i
23  64 4    156 n
24  12 ^J   145 e
25 154 l     64 4
26 151 i     12 ^J
27 156 n    154 l
28 145 e    151 i
29  65 5    156 n
30  12 ^J   145 e
cmp: EOF on f1

From man cmp:

-b, --print-bytes
    print differing bytes

-l, --verbose
    output byte numbers and differing byte values

fedorqui
  • Thanks. Does cmp have an option that prints the matching line/byte offsets instead of reporting the differing bytes? – webminal.org Jan 07 '15 at 10:26
  • As they are treated as binary, there is no such concept as lines. I am not sure what you expect the output to look like; could you clarify? – fedorqui Jan 07 '15 at 10:30
  • I would like to find out at which offset they share a common pattern. For example, something like "cmp f1 f2" printing "f1 f2 same at byte 120, line 12". Thanks for the help; for my case, I found "vbindiff". – webminal.org Jan 07 '15 at 10:37
  • I insist: there is no such thing as a line in binary files. As you can see, `cmp` shows a message like "f1 f2 differ: byte 12, line 2 is 12 ^J 60 0" – fedorqui Jan 07 '15 at 11:07
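
As far as this thread establishes, cmp only reports differing bytes, so the "same at byte N" output asked about above would need a small script. A minimal Python sketch, assuming "matching" means equal bytes at the same offset in both files and using 1-based offsets like cmp; matching_runs is a hypothetical helper name:

```python
def matching_runs(a: bytes, b: bytes):
    """Return (offset, length) pairs for runs of equal bytes at the same position.

    Offsets are 1-based, to line up with the byte numbers printed by cmp.
    """
    runs = []
    start = None  # 1-based offset where the current matching run began
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if x == y:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        # The run extends to the end of the shorter input.
        runs.append((start, min(len(a), len(b)) - start + 1))
    return runs


# Contents of f1 and f2 from the question.
f1 = b"line1\nline2\nline3\nline4\nline5\n"
f2 = b"line1\nline20\nline30\nline4\nline5\n"
print(matching_runs(f1, f2))  # [(1, 11)]
```

The single run of 11 bytes agrees with the cmp output above, which reports the first difference at byte 12. This does not find patterns that sit at different offsets in the two files; that is a separate (longest common substring) problem.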