I have a .gz file on my Unix server. I want to search for two words, such as abc123 and def456, in that file. If the file contains these words, I want to print only those two words (not the entire line) to a separate file.

- I tried the grep command, but it prints the whole line from the file. I want only those two words, not the entire line that contains them. – Ramana Mahendrakar Jun 30 '15 at 12:46
- I suggest you edit the question with the command you used that didn't return the results you want. Someone will then be able to correct it for you. – Eric Hauenstein Jun 30 '15 at 12:51
- You should really show the command(s) that you've tried, explaining why they don't do what you want. Suppose the file wasn't compressed; what would you do to get the information you want from the non-compressed file? How do you see the decompressed contents of a file without actually decompressing the file? How do you combine these two operations? You say 'Unix'; which variant of Unix? Does it have GNU `grep` with the `-o` option? What should happen if the words you're after occur more than once each in the file? Does the order in which the words appear in the output matter? – Jonathan Leffler Jun 30 '15 at 14:48
4 Answers
You can try the following:
zcat f.xml.gz | awk '{\
{ \
if(index($0,str_1)) \
cnt_1=1; \
if(index($0,str_2)) \
cnt_2=1; \
if((cnt_1 + cnt_2) == 2) {\
print str_1,str_2> "f_out.log"; exit;} \
} }' str_1="Keepout" str_2="LatLonList"
where
- "f.xml.gz" is the input file
- str_1 is the first word (your "abc123")
- str_2 is the second word (your "def456")
- "f_out.log" is the separate file in which the two words are written if found in the input file
Hope this helps.
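For comparison, here is a minimal sketch of the same awk idea written without the backslashes and nested braces. It is only an illustration under some assumptions: the file names `demo_sample.gz` and `demo_found.txt` and the sample lines are made up for the demo, `gzip -dc` is used in place of `zcat`, and `index()` does plain substring matching, so it will also match the word inside a longer token.

```shell
#!/bin/sh
# Hypothetical sample data, created only so the sketch runs end to end.
printf '%s\n' \
    'first line mentions abc123 here' \
    'nothing relevant on this line' \
    'later def456 shows up' | gzip > demo_sample.gz

# Decompress to stdout and let awk remember which words it has seen.
# The words are passed with -v so they are set before any input is read.
# Once both have been seen, print them (one per line) and stop reading.
gzip -dc demo_sample.gz | awk -v w1="abc123" -v w2="def456" '
    index($0, w1) { seen1 = 1 }
    index($0, w2) { seen2 = 1 }
    seen1 && seen2 { print w1; print w2; exit }
' > demo_found.txt

cat demo_found.txt
```

If either word is missing from the input, `demo_found.txt` is created but stays empty.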

- All those backslashes are unnecessary unless you're careless enough to use a C shell derivative instead of a Bourne shell derivative as your main shell. Sea shells belong on the sea shore, IMO. And in a Bourne-shell derivative, those backslashes would break the script. The opening `{ {` and matching closing `} }` is odd; what's the advantage of the double braces instead of just single braces? Why did you decide to use 'Keepout' and 'LatLonList' instead of `abc123` and `def456`? – Jonathan Leffler Jun 30 '15 at 14:55
- The above snippet is not working: if I set str_1 to "world", it still prints "world", even though that word is not present in my file. – Ramana Mahendrakar Jun 30 '15 at 16:34
Your question has an answer in this SO post
You can run this command to achieve what you want
gzcat <filename.gz> | grep -oh "<Search pattern>"
For example:
gzcat <filename.gz> | grep -oh "abc123"
I do not have zgrep installed, but you can also try this:
zgrep -oh "<Search pattern>" <filename.gz>
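To make the pipeline approach concrete for both words from the question, here is a rough sketch. The names `demo_sample.gz` and `demo_words.txt` and the sample lines are invented for the demo, and `gzip -dc` stands in for `gzcat`, which is not installed under that name everywhere. The `-o` option is not part of POSIX grep, but both GNU and BSD grep provide it.

```shell
#!/bin/sh
# Hypothetical sample data so the pipeline has something to search.
printf '%s\n' \
    'abc123 and again abc123' \
    'def456 appears once' \
    'no match on this line' | gzip > demo_sample.gz

# Decompress to stdout, print only the matched parts (-o) for either
# pattern (-e ... -e ...), then drop duplicates and write the result
# to a separate file.
gzip -dc demo_sample.gz | grep -o -e 'abc123' -e 'def456' | sort -u > demo_words.txt

cat demo_words.txt
```

Drop the `sort -u` stage if every occurrence should be listed in the order it appears.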

ripgrep
Use ripgrep. It is written in Rust and is fast, especially on large files. For example:
rg -zo "abc123|def456" *.gz
- `-z`/`--search-zip`: Search in compressed files (such as gz, bz2, xz, and lzma).
- `-o`/`--only-matching`: Print only the matched parts of a matching line.

grep/zgrep/zegrep
Use zgrep or zegrep to search the uncompressed contents of compressed files (available on both GNU/Linux and BSD/Unix).
On BSD systems, you can also use grep (the BSD version) with -Z; on macOS, -z works as well.
A few examples:
zgrep -E "abc123|def456" *.gz
zegrep "abc123|def456" **/*.gz
grep -z -e "abc123" -e "def456" *.gz # BSD/Unix only.
Note: When the shell's recursive globbing option is enabled (for example `globstar` in Bash), `**` matches files recursively; otherwise use `-r`.
- `-R`/`-r`/`--recursive`: Recursively search subdirectories listed.
- `-E`/`--extended-regexp`: Interpret pattern as an extended regular expression (like egrep).
- `-Z` (BSD), `-z`/`--decompress` (BSD/macOS): Force grep to behave as zgrep.
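The examples above print whole matching lines. To get only the two words into a separate file, as the question asks, `-o` can be added. The following is a sketch rather than a tested recipe for every platform: it assumes a zgrep that passes `-o` and `-E` through to grep (the one shipped with GNU gzip does), and the names `demo_sample.gz` and `demo_words.txt` are made up for the demo.

```shell
#!/bin/sh
# Hypothetical sample data, compressed so zgrep has a .gz file to read.
printf '%s\n' \
    'header abc123 trailer' \
    'unrelated text' \
    'prefix def456 suffix abc123' | gzip > demo_sample.gz

# -E enables the abc123|def456 alternation and -o prints only the
# matched text; sort -u keeps one copy of each word found.
zgrep -o -E 'abc123|def456' demo_sample.gz | sort -u > demo_words.txt

cat demo_words.txt
```

With a single input file no file-name prefix is printed; when searching several files at once, adding `-h` suppresses the prefix.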