
I have a 2 GB file in raw format. I want to search for all occurrences of a specific hex value, "355A3C2F74696D653E", and collect the 28 characters (bytes) that follow each one.

Example: 355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135

In this case I want the output: "323031312D30342D32365431343A34373A30322D31343A34373A3135" or better: 2011-04-26T14:47:02-14:47:15

I have tried with

xxd -u InputFile | grep '355A3C2F74696D653E' | cut -c 1-28 > OutputFile.txt

and

xxd -u -ps -c 4000000 InputFile | grep '355A3C2F74696D653E' | cut -b 1-28 > OutputFile.txt

But I can't get it working.

Can anybody give me a hint?

ROMANIA_engineer
hdk

3 Answers


As you are using xxd it seems to me that you want to search the file as if it were binary data. I'd recommend using a more powerful programming language for this; the Unix shell tools assume there are line endings and that the text is mostly 7-bit ASCII. Consider using Python:

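#!/usr/bin/env python3
import mmap

# Byte sequence to search for: hex 35 5A 3C 2F 74 69 6D 65 3E, i.e. b"5Z</time>"
needle = b"\x35\x5A\x3C\x2F\x74\x69\x6D\x65\x3E"

with open("file_to_search", "rb") as fd:
    # Map the file read-only so the whole file is not loaded into memory at once
    haystack = mmap.mmap(fd.fileno(), length=0, access=mmap.ACCESS_READ)
    i = haystack.find(needle)
    while i >= 0:
        i += len(needle)  # move past the match itself
        # Print the 28 bytes that follow the match
        print(haystack[i:i + 28].decode("ascii", errors="replace"))
        i = haystack.find(needle, i)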
Jack Whitham
  • Hi, I'm not that experienced in python, has only tried a little script, but assuming that I just need to copy/paste the command lines into an empty notepad document, save it as e.g. Needle and run it by typing bash Needle in terminal ... ?? When I do that, it says: Needle: line 2: import: command not found Needle: line 3: syntax error near unexpected token `(' Needle: line 3: `fd = open (" InputFileName "," rb ") ' I have placed the script in the same folder as the InputFile. What am I doing wrong? Best Regards hdk – hdk May 01 '15 at 08:00
  • It's not being run by the python interpreter. Save the file as `script.py` and then run `python script.py` in Bash. – Jack Whitham May 01 '15 at 08:41
  • Hi again, now it works:-) Thanks a lot. I have also tried to write the output to a file with: writeFile = open('Time.txt', 'w') and in the while-loop: writeFile.write(haystack[i : i + 28]). It works okay, but I want one line per iteration, and the output is one long line of text. I have tried with writeline and writelines, but it doesn't change the output. – hdk May 01 '15 at 09:34
  • Use `writeFile.write("\n")` to insert a new line. – Jack Whitham May 01 '15 at 09:36
  • Hi Jack, thank you, now it works perfectly. I am new to Stack Overflow; is there any way I can mark your answer as helpful or give you a thumbs up? Best Regards hdk. – hdk May 01 '15 at 10:14
  • Hi, Jack helped me last year with a Python script. It was very useful. Now I want to use it again, but this time the output must be in hex values, and it would be very nice to have the offset where the 'needle' is found in the haystack. Is that possible? – hdk Dec 13 '16 at 16:22
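
The follow-up request in the last comment (hex output plus the offset of each match) could be approached along these lines. This is only a sketch that assumes Python 3; the function name find_with_offsets and the sample data are hypothetical and not part of the original answer.

```python
import mmap
import os
import tempfile

NEEDLE = bytes.fromhex("355A3C2F74696D653E")  # the bytes b"5Z</time>"


def find_with_offsets(path, needle=NEEDLE, tail_len=28):
    """Yield (offset_of_needle, hex_of_following_bytes) for every match."""
    with open(path, "rb") as fd:
        with mmap.mmap(fd.fileno(), length=0, access=mmap.ACCESS_READ) as haystack:
            i = haystack.find(needle)
            while i >= 0:
                start = i + len(needle)
                tail = haystack[start:start + tail_len]  # may be shorter at end of file
                yield i, tail.hex().upper()
                i = haystack.find(needle, start)


# Small demonstration on a throwaway file (hypothetical sample data)
sample = b"\x00\x01junk5Z</time>2011-04-26T14:47:02-14:47:15\xff"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
for offset, hex_tail in find_with_offsets(tmp.name):
    print(f"{offset:#010x}  {hex_tail}")
os.remove(tmp.name)
```

The offset reported is that of the first byte of the needle, counted from the start of the file; add the needle length (9) if the position of the collected data is wanted instead.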

If your grep supports the -P option (Perl-compatible regular expressions), you could simply use the command below.

$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{28}'
323031312D30342D32365431343A

For 56 characters (28 bytes are 56 hex digits),

$ echo '355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135' | grep -oP '355A3C2F74696D653E\K.{56}'
323031312D30342D32365431343A34373A30322D31343A34373A3135
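
To get from the hex digits to the readable form the question asks for (2011-04-26T14:47:02-14:47:15), the matched hex can be decoded back to bytes. The following is a rough sketch that assumes Python 3 and a hex dump that is one continuous string with no spaces or line breaks; the function name decode_tails is made up for illustration.

```python
import re

HEX_NEEDLE = "355A3C2F74696D653E"  # hex encoding of the bytes b"5Z</time>"


def decode_tails(hex_dump, tail_bytes=28):
    """Return the decoded text of the tail_bytes bytes after each needle."""
    # A lookahead is used so that every candidate position is examined;
    # each byte corresponds to two hex digits.
    pattern = re.compile(
        "(?=%s((?:[0-9A-F]{2}){%d}))" % (HEX_NEEDLE, tail_bytes), re.IGNORECASE
    )
    return [
        bytes.fromhex(m.group(1)).decode("ascii", errors="replace")
        for m in pattern.finditer(hex_dump)
        if m.start() % 2 == 0  # keep only matches aligned to a byte boundary
    ]


line = "355A3C2F74696D653E323031312D30342D32365431343A34373A30322D31343A34373A3135"
print(decode_tails(line))
```

The alignment check matters because a pattern found in hex text can straddle two bytes; a match starting at an odd digit position does not correspond to the byte sequence in the file.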
Avinash Raj
  • Hi, the result I got with the command I posted is very similar to the one you suggest, but it gives a result I can't use, a mix of HEX and plain text: 5432 303A 3237 2011-04-2 5432 303A e>2011-04-26T2 5432 303A 3239 2011-04-2 5432 303A e>2011-04-26T2 5432 303A 3333 2011-04-2 5432 303A e>2011-04-26T2 5432 303A 3530 2011-04-2 Here is 7 lines, (i don't know how to do lineshift...:-/) – hdk Apr 30 '15 at 17:37
  • Accept an answer and ask this as a new question. – Avinash Raj May 01 '15 at 16:06

Why convert to hex first? See if this awk script works for you. It looks for the string you want to match on, then prints the next 28 characters. Special characters are escaped with a backslash in the pattern.

Adapted from this post: Grep characters before and after match?

I added some blank lines for readability.

VirtualBox:~$ cat data.dat

Thisis a test of somerandom characters before thestringI want5Z</time>2011-04-26T14:47:02-14:47:15plus somemoredata

VirtualBox:~$ cat test.sh

awk '/5Z<\/time>/ {
  # < and > are literal here; gawk treats \< and \> as word-boundary anchors
  match($0, /5Z<\/time>/); print substr($0, RSTART + RLENGTH, 28);
}' data.dat

VirtualBox:~$ ./test.sh

2011-04-26T14:47:02-14:47:15

VirtualBox:~$ 

EDIT: I just realized something. As written, match() only finds the first occurrence on each line, so the script needs to be adjusted to handle multiple occurrences as you need them, and the regular expression may need tweaking as well (to be non-greedy, etc.). Perhaps someone more up to date on awk can chime in with improvements, as I am quite rusty. It is an approach to consider anyway.
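
For comparison, here is one way the "multiple occurrences" case could be handled without going through hex. It is a sketch in Python 3 rather than awk, and the name tails and the sample data are assumptions for illustration. The pattern works on bytes, so it does not depend on line endings, and the lookahead captures the 28 bytes without consuming them, so a needle that starts inside a previous tail is still found.

```python
import re

# Needle followed by a lookahead that captures the next 28 bytes (any byte value)
PATTERN = re.compile(rb"5Z</time>(?=(.{28}))", re.DOTALL)


def tails(data):
    """Return the 28 bytes following every occurrence of the needle in data."""
    return [m.group(1) for m in PATTERN.finditer(data)]


sample = (b"noise5Z</time>2011-04-26T14:47:02-14:47:15"
          b"plus5Z</time>2011-04-26T14:47:16-14:47:30end")
for tail in tails(sample):
    print(tail.decode("ascii", errors="replace"))
```

Unlike slicing, this pattern skips a match that has fewer than 28 bytes after it, for example one at the very end of the file. The re module accepts bytes-like objects, so the same pattern can be applied to a memory-mapped file instead of an in-memory bytes value.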

Gary_W