While studying the gzip format, I tried to grep for its magic bytes, 1f 8b, in a sample archive, following the instructions from this answer.

$ xxd a.gz
00000000: 1f8b 0800 43dc 605b 0003 4bcb cf4f 4a2c  ....C.`[..K..OJ,
00000010: e202 0047 972c b207 0000 00              ...G.,.....
$ grep -obUaP "\x1f" a.gz
0:
$ grep -obUaP "\x8b" a.gz
# nothing is printed

For some reason grep finds one byte but not the other. After some investigation, our best guess is that it fails on bytes with the most significant bit set, but we couldn't find a reasonable explanation for that.

Why does this happen, and is there a workaround?

Ivan Smirnov

1 Answer

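Probably because grep is working in a UTF-8 locale: when you search for "\x8b", it treats that as the code point U+008B and looks for its UTF-8 encoding, the two bytes 0xc2 0x8b, rather than the single byte 0x8b. "\x1f" still matches because code points below 0x80 are encoded as the same single byte. You will need to either disable grep's UTF-8 support or switch to a tool that strictly interprets the search criteria as binary values.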
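One common way to disable the UTF-8 handling is to force the C locale for that single command by setting LC_ALL=C. The sketch below is not a definitive recipe: it assumes GNU grep built with PCRE support (the -P flag used in the question), and sample.bin is a hypothetical file holding only the first four bytes of the dump above.

```shell
# Hypothetical 4-byte sample: the first bytes of the xxd dump (1f 8b 08 00).
printf '\037\213\010\000' > sample.bin

# LC_ALL=C makes grep treat the pattern and the input as raw bytes
# instead of decoding them as UTF-8. Assumes GNU grep with -P support.
# cut keeps only the byte offset that -b prints before the colon.
offset_8b=$(LC_ALL=C grep -obUaP "\x8b" sample.bin | LC_ALL=C cut -d: -f1)
offset_magic=$(LC_ALL=C grep -obUaP "\x1f\x8b" sample.bin | LC_ALL=C cut -d: -f1)

echo "0x8b found at offset: $offset_8b"
echo "magic bytes found at offset: $offset_magic"
```

Setting the variable on the command line affects only that invocation, so the rest of the shell session keeps its normal locale.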

Ignacio Vazquez-Abrams