
I'm trying to search a file for a hex pattern.

I currently search the file for the pattern with:

xxd -g 2 -c 32 -u file | grep "0045 5804 0001 0000"

This returns the lines that contain that pattern:

FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001

But I want it to return only the 4 hex digits just before that pattern, which is 08B9 in this case. How can I do that?

phuclv
Mad-Soft
  • please update the question to include a few complete lines of output from your `xxd` call ... a couple lines with a match and a couple lines without a match; also update the question to show the expected output – markp-fuso Jul 31 '22 at 22:07

7 Answers


With GNU grep and a Perl-compatible regular expression:

xxd -g 2 -c 32 -u file | grep -Po '....(?= 0045 5804 0001 0000)'

Output:

08B9
Cyrus

Don't use grep, use sed, e.g. using any sed:

$ xxd whatever | sed -n 's/.*\(....\) 0045 5804 0001 0000.*/\1/p'
08B9
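
One behaviour of the sed form worth knowing: the leading `.*` is greedy, so when a single line holds several matches, only the group before the last one is printed. A minimal sketch on a made-up line (the `line` variable and its `AAAA`/`BBBB`/`CCCC` filler groups are hypothetical test data, not real xxd output):

```shell
# Hypothetical input: two occurrences of the pattern on one line.
line='AAAA 0045 5804 0001 0000 BBBB 0045 5804 0001 0000 CCCC'

# The greedy leading .* swallows the first occurrence, so only the
# group in front of the last occurrence is captured and printed.
last_only=$(printf '%s\n' "$line" |
  sed -n 's/.*\(....\) 0045 5804 0001 0000.*/\1/p')
printf '%s\n' "$last_only"
```

If every match on a line matters, a tool that reports each match separately (such as `grep -o`) is a better fit.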
Ed Morton
  • why not grep? `grep` has **far more regex capabilities than `sed`** because `sed` doesn't support lookarounds while GNU `grep -P` supports Perl-compatible regular expressions (PCRE), including lookarounds – phuclv Aug 01 '22 at 01:41
  • @phuclv because grep exists to do `g/re/p` (**G**lobally match a **R**egular **E**xpression and **P**rint the result). It's not intended to modify the string that it finds matching a **r**eg**e**xp, it's intended to **p**rint the string that matches a regexp. The type of regexp matching you describe is only in GNU grep, is considered experimental when combined with some other options, and isn't necessary, especially for something this simple that fits exactly what sed was designed to do and can be done clearly, simply and portably in all versions of all sed variants on every Unix box. – Ed Morton Aug 01 '22 at 14:19
  • lookaround is a standard feature in regex. And it's not experimental, it's been there for so long that it's useful in many other use cases – phuclv Aug 01 '22 at 14:41
  • We're talking about `grep`, not some other tools, and the providers say in [the man page](https://linuxcommand.org/lc3_man_pages/grep1.html) that `This option is experimental when combined with the -z (--null-data) option, and grep -P may warn of unimplemented features.` which is a fairly recent upgrade, it used to just say that `-P` was experimental, period. – Ed Morton Aug 01 '22 at 14:46

A not very elegant but intuitively simple approach is to pipe your grep result into sed and delete everything from your search term to the end of the line. This leaves the block you want as the last space-separated 'word' of the result, which can be retrieved by piping into awk and printing the last field (each step is shown on its own line for readability; the shell continues a command after a trailing |, so it also runs as written):

xxd -g 2 -c 32 -u file | 
grep "0045 5804 0001 0000" | 
sed 's/0045 5804 0001 0000.*//' | 
awk '{print $NF}'

Dave Pritlove
  • You don't need grep when you're using sed, and you don't need either of them when you're using awk. `grep "0045 5804 0001 0000" | sed 's/0045 5804 0001 0000.*//'` is equivalent to `sed -n 's/0045 5804 0001 0000.*//p'`, and either of those piped into `awk '{print $NF}'` is equivalent to `awk 'sub(/0045 5804 0001 0000.*/,""){print $NF}'` – Ed Morton Aug 01 '22 at 00:40

My xxd prints an 8-digit address, a :, 16 4-digit hex codes (separated by spaces), and finally the corresponding raw data from the file, e.g.:

$ xxd -g 2 -c 32 -u  file
         1         2         3         4         5         6         7         8         9         10        11        12        13
1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890
00000000: 4120 302E 3730 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A  A 0.702 asdlfkjasdflkajsdf;lkasj
00000020: 6466 6C6B 6173 6A64 660A 4220 302E 3836 3820 6173 646C 666B 6A61 7364 666C 6B61  dflkasjdf.B 0.868 asdlfkjasdflka
00000040: 322E 3135 3220 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A 6466  2.152 asdlfkjasdflkajsdf;lkasjdf
00000060: 6C6B 6173 6A64 660A                                                              lkasjdf.

NOTE: the first two lines of output (a ruler) were added to show column numbering

OP appears to be interested solely in the 4-digit hex codes which means we're interested in the data in columns 11-89 (inclusive).

From here we need to address 4 different scenarios:

  1. the match occurs at the very beginning of the xxd output, in which case there is no preceding 4-digit hex code
  2. the match occurs at the beginning of a line, so we're interested in the 4-digit hex code at the end of the previous line
  3. the match occurs in the middle of a line, in which case we're interested in the 4-digit hex code just prior to the match
  4. the match spans two lines, in which case we're interested in the 4-digit hex code just prior to the match on the 1st line

A contrived set of xxd output to demonstrate all 4x scenarios:

$ cat xxd.out 
00000000: 0045 5804 0001 0000 6173 646C 666B 6A61 7364 666C 6B61 6A73 6466 3B6C 6B61 736A  A 0.702 asdlfkjasdflkajsdf;lkasj
#         ^^^^^^^^^^^^^^^^^^^
00000020: 0045 5804 0001 0000 660A 4220 0045 5804 0001 0000 646C 666B 6A61 7364 0045 5804  dflkasjdf.B 0.868 asdlfkjasdflka
#         ^^^^^^^^^^^^^^^^^^^           ^^^^^^^^^^^^^^^^^^^                     ^^^^^^^^^
00000040: 0001 0000 3B6C 6B61 736A 6466 6C6B 6173 6A64 660A 4320 332E 3436 3720 6173 646C  jsdf;lkasjdflkasjdf.C 3.467 asdl
#         ^^^^^^^^^

NOTE: comments added to highlight our matches

One idea using awk:

x='0045 5804 0001 0000'

cat xxd.out |                                             # simulate feeding xxd output to awk
awk -v x="${x}" '

function parse_string() {

    while ( length(string) > (2 * lenx) ) {
          pos= index(string,x)

          if (pos) {
             if   (pos==1) output= "NA (at front of file)"
             else          output= substr(string,pos - 5,4)

             cnt++    
             printf "Match #%s: %s\n", cnt, output
             string= substr(string,pos + lenx)
          }
          else {
             string= substr(string,length(string) - (2 * lenx))
             break
          }
    }
}

BEGIN { lenx = length(x) }

      { string=string substr($0,11,80)                   # strip off address & raw data, append 4-digit hex codes into one long string
        if ( length(string) > (1000 * lenx) )
           parse_string()
      }

END   { parse_string() }
'

NOTE: the parse_string() function and the if (length(string) > ...) tests cap the buffered string at roughly 1000x the length of the search pattern (here 1000 x 19 = 19,000 characters, plus at most one more line of hex codes). That is overkill for small files, but it lets us process larger files without hogging memory or, in the worst case, hitting an OOM (Out Of Memory) error.

This generates:

Match #1: NA (at front of file)
Match #2: 736A
Match #3: 4220
Match #4: 7364
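
As an alternative sketch (not the code above), the same four scenarios can be covered by treating the hex codes as one stream of groups and keeping only a small sliding window in memory. The function name `prev_group_stream` is hypothetical. It assumes the `xxd -g 2 -c 32 -u` layout shown earlier (hex codes in columns 11-89) and a pattern given as space-separated 4-digit groups, and like the code above it only finds matches aligned to group boundaries:

```shell
# prev_group_stream PATTERN: hypothetical helper; reads xxd output on stdin
# and prints the 4-digit group preceding every occurrence of PATTERN.
prev_group_stream() {
  awk -v pat="$1" '
  BEGIN { n = split(pat, want, " ") }           # n groups in the pattern
  {
      m = split(substr($0, 11, 79), grp, " ")   # hex columns only
      for (i = 1; i <= m; i++) {
          total++
          win[total % (n + 1)] = grp[i]         # ring buffer: last n+1 groups
          if (total < n) continue
          ok = 1
          for (k = 1; k <= n; k++)              # compare the last n groups
              if (win[(total - n + k) % (n + 1)] != want[k]) { ok = 0; break }
          if (ok) print (total == n ? "NA" : win[(total - n) % (n + 1)])
      }
  }'
}
```

It would be called as `xxd -g 2 -c 32 -u file | prev_group_stream '0045 5804 0001 0000'`. Because only n+1 groups are kept, memory use does not grow with the file size, and a match at the very start of the file prints NA.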
markp-fuso
nawk 'sub(".* ",_, $!--NF)^_' OFS= FS=' 0045 5804 0001 0000.*$' 
mawk '$!NF = $--NF' FS=' 0045 5804 0001 0000.*$| '
gawk '  $_ = $--NF' FS=' 0045 5804 0001 0000.*$| '
08B9
RARE Kpop Manifesto

Just make a lookahead and print only the matched string

$ xxd -g 2 -c 32 -u file | grep -Po "[0-9A-F]{4}(?= 0045 5804 0001 0000)"
$ xxd -g 2 -c 32 -u file | perl -lne 'print for /([0-9A-F]{4}) (?=0045 5804 0001 0000)/'

But searching the hex representation like that is just silly because:

  1. It won't work when the pattern 0045 5804 0001 0000 is at the beginning of the line (i.e. the 4 digits you want are at the end of the previous line), or when the pattern itself is split across two lines
  2. It'll be much slower than searching directly in binary

So just search directly with grep then decode like this

grep -Pao "..\x00\x45\x58\x04\x00\x01\x00\x00" file | xxd -p -u -l 2

It matches 2 bytes followed by your byte pattern, then prints the first 2 bytes as hex. Because of -l 2, xxd decodes only the first 2 bytes of grep's output, i.e. only the first match.

The following also works, but not in every case because of how null bytes are handled:

grep -ao $'..\x12\x34<remaining bytes of hex pattern>' file | xxd -p -u -l 2

If the pattern contains LF \n then you'll also need the -z option

grep -Pzao "..<hex pattern>" file | xxd -p -u -l 2
grep -zao $'..<hex pattern>' file | xxd -p -u -l 2
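
To decode the bytes in front of every match rather than only the first, one option is to ask grep for byte offsets and then read the two preceding bytes at each offset. This is a hedged sketch, not a drop-in replacement: the function name `prev_bytes_all` is hypothetical, the pattern is hard-coded, GNU grep with -P is assumed, and `od` is swapped in for xxd to do the decoding. Since the two bytes are read by offset, they may be any value, including a newline:

```shell
# prev_bytes_all FILE: hypothetical helper. Prints, in upper-case hex, the
# two bytes in front of every occurrence of 00 45 58 04 00 01 00 00.
prev_bytes_all() {
  # -b together with -o prints "OFFSET:MATCH" with a 0-based byte offset;
  # LC_ALL=C makes grep treat the input as plain bytes rather than UTF-8.
  LC_ALL=C grep -obaP '\x00\x45\x58\x04\x00\x01\x00\x00' "$1" |
    cut -d: -f1 |
    while read -r off; do
      if [ "$off" -ge 2 ]; then
        # Skip to the two preceding bytes and dump them as hex digits.
        od -An -tx1 -j "$((off - 2))" -N 2 "$1" | tr -d ' \n' | tr 'a-f' 'A-F'
        echo
      else
        echo 'NA (fewer than 2 bytes before the match)'
      fi
    done
}
```

It would be called as `prev_bytes_all file` and prints one line per match.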


phuclv

I would use GNU AWK for this task in the following way. Let file.txt contain

FFFF FFFF FFFF 4556 4E54 0000 0116 0100 08B9 0045 5804 0001 0000 2008 0000 0001

then

awk 'match($0, /[[:xdigit:]]{4} 0045 5804 0001 0000/){print substr($0,RSTART,4)}' file.txt

gives output

08B9

Explanation: I use two string functions: match checks whether the current line ($0) contains the pattern and sets the RSTART variable to where the match begins, then substr extracts the first 4 characters of the match. [[:xdigit:]] denotes a base-16 digit and {4} the number of repeats.

(tested in gawk 4.2.1)
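
Since match returns only the leftmost match, the one-liner above reports at most one result per line. A minimal sketch of a loop variant that reports every match on a line follows. The function name `groups_before` is hypothetical, and the hex-digit class is spelled out four times instead of using `[[:xdigit:]]{4}`, on the assumption that this also runs on awk versions without interval expressions:

```shell
# groups_before: hypothetical helper; reads lines of space-separated
# 4-digit hex groups on stdin and prints the group before every match.
groups_before() {
  awk '
  {
    s = $0
    # match() finds the leftmost "GROUP PATTERN"; loop to pick up later ones.
    while (match(s, /[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f] 0045 5804 0001 0000/)) {
      print substr(s, RSTART, 4)
      s = substr(s, RSTART + 5)   # resume at the start of the matched pattern
    }
  }'
}
```

It would be called as `groups_before < file.txt`. Like the one-liner, it works line by line, so it does not see a match that begins a line or spans two lines.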

Daweo