My xxd
prints an 8-digit address, a :
, 16x 4-digit hex codes (separated by spaces), and finally the corresponding raw data from the file, eg:
$ xxd -g 2 -c 32 -u file
1 2 3 4 5 6 7 8 9 10 11 12 13
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
00000000: 4120 302E 3730 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A A 0.702 asdlfkjasdflkajsdf;lkasj
00000020: 6466 6C6B 6173 6A64 660A 4220 302E 3836 3820 6173 646C 666B 6A61 7364 666C 6B61 dflkasjdf.B 0.868 asdlfkjasdflka
00000040: 322E 3135 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A 6466 2.152 asdlfkjasdflkajsdf;lkasjdf
00000060: 6C6B 6173 6A64 660A lkasjdf.
NOTE: the 1st two lines (a ruler) added to show column numbering
OP appears to be interested solely in the 4-digit hex codes which means we're interested in the data in columns 11-89 (inclusive).
From here we need to address 4x different scenarios:
- match could occur at the very beginning of the
xxd
output in which
case there is no preceeding 4-digit hex code
- match occurs at the beginning of the line so we're interested in the
4-digit hex code at the end of the previous line
- match occurs in the middle of the line in which case we're
interested in the 4-digit hex code just prior to the match
- match spans two lines in which case we're interested in the 4-digit
hex code just prior to the match on the 1st line
A contrived set of xxd
output to demonstrate all 4x scenarios:
$ cat xxd.out
00000000: 0045 5804 0001 0000 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A A 0.702 asdlfkjasdflkajsdf;lkasj
# ^^^^^^^^^^^^^^^^^^^
00000020: 0045 5804 0001 0000 660A 4220 0045 5804 0001 0000 646C 666B 6A61 7364 0045 5804 dflkasjdf.B 0.868 asdlfkjasdflka
# ^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^
00000040: 0001 0000 3B6C 6B61 736A 6466 6C6B 6173 6A64 660A 4320 332E 3436 3720 6173 646C jsdf;lkasjdflkasjdf.C 3.467 asdl
# ^^^^^^^^^
NOTE: comments added to highlight our matches
One idea using awk
:
x='0045 5804 0001 0000'
cat xxd.out | # simulate feeding xxd output to awk
awk -v x="${x}" '
function parse_string() {
while ( length(string) > (2 * lenx) ) {
pos= index(string,x)
if (pos) {
if (pos==1) output= "NA (at front of file)"
else output= substr(string,pos - 5,4)
cnt++
printf "Match #%s: %s\n", cnt, output
string= substr(string,pos + lenx)
}
else {
string= substr(string,length(string) - (2 * lenx))
break
}
}
}
BEGIN { lenx = length(x) }
{ string=string substr($0,11,80) # strip off address & raw data, append 4-digit hex codes into one long string
if ( length(string) > (1000 * lenx) )
parse_string()
}
END { parse_string() }
'
NOTE: the parse_string()
function and the assorted if (length(string) > ...)
tests allow us to limit memory usage to 1000x the length of our search pattern (in this example => 1000 x 19 = 19,000
); granted, 'overkill' in the case of small files but it allows us to process large(r) files without having to worry about hogging memory (or in a worst case scenario: an OOM - Out Of Memory - error)
This generates:
Match #1: NA (at front of file)
Match #2: 736A
Match #3: 4220
Match #4: 7364