Can grep show only words that match search pattern?

Question

Is there a way to make grep output "words" from files that match the search expression?

If I want to find all the instances of, say, "th" in a number of files, I can do:

grep "th" *

but the output will be something like this (bold is by me):

some-text-file : the cat sat on the mat  
some-other-text-file : the quick brown fox  
yet-another-text-file : i hope this explains it thoroughly

What I want it to output, using the same search, is:

the
the
the
this
thoroughly

Is this possible using grep? Or using another combination of tools?

Is there a way to print the matched words without splitting them onto separate lines, so that each matched string stays on its original line? – Linguist, Jun 01 '17 at 19:50
tac file.log | grep "In msg::" | grep -oh "templateId=.*, temp" — Kasthuri Shravankumar, Jun 21 '22 at 10:31

score 1383 · Answer 1 · edited Apr 28 '21 at 01:43

Try grep -o:

grep -oh "\w*th\w*" *

Edit: matching from Phil's comment.

From the docs:

-h, --no-filename
    Suppress the prefixing of file names on output. This is the default
    when there is only one file (or only standard input) to search.
-o, --only-matching
    Print only the matched (non-empty) parts of a matching line,
    with each such part on a separate output line.
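
To try this against the question's sample data, here is a minimal sketch. It assumes a grep that supports -o and the non-POSIX \w (GNU grep does); the scratch directory and the `workdir` and `matches` variable names are only for illustration.

```shell
# Recreate the three sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf 'the cat sat on the mat\n' > "$workdir/some-text-file"
printf 'the quick brown fox\n' > "$workdir/some-other-text-file"
printf 'i hope this explains it thoroughly\n' > "$workdir/yet-another-text-file"

# -o prints each match on its own line; -h suppresses the filename prefix
# that grep would otherwise add when searching more than one file.
matches=$(cd "$workdir" && grep -oh "\w*th\w*" *)
printf '%s\n' "$matches"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Without -h, each output line would be prefixed with the name of the file it came from, because more than one file is being searched.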

edited Apr 28 '21 at 01:43

Sergey Vyacheslavovich Brunov

answered Oct 10 '09 at 01:01

Dan Midwood

14

@user181548, The grep -o option works only for GNU grep. So if you are not using GNU grep, it might not work for you. – ksinkar Aug 25 '14 at 11:10
5

@A-B-B It depends if you want to display the name of the matched file or not. I'm not sure under what conditions it does and doesn't display, but I do know that when I used grep across a number of directories it did display the full file path for all matched files, whereas with -h it just displayed the matched words without any specification about which file it is. So, to match the original question, I think it is necessary in certain circumstances. – LokMac Nov 15 '17 at 01:41
7

I needed an explanation for what `"\w*th\w*" *` means, so I figured I'd post. `\w` is [_[:alnum:]], so this matches basically any "word" that contains 'th' (since `\w` doesn't include space). The * after the quoted section is a glob for which files (i.e., matching all files in this directory) – jeremysprofile Jul 06 '18 at 00:28
3

`\w` is not generally portable to `grep -E`; for proper portability, use the POSIX character class name `[[:alnum:]]` instead (or `[_[:alnum:]]` if you really want the underscore, too; or try `grep -P` if your platform has that). – tripleee Nov 07 '18 at 10:44
@A-B-B Given the desired output shown by the OP the `-h` is entirely necessary I would say.. ? – El Ronnoco Jan 09 '19 at 10:42
Much better than `abk`. – neverMind9 Apr 09 '19 at 11:21
-o only gives me exactly what I searched for, I need the whole line. Wtf? It's like trying to work with a genie that takes everything too literally. – Nathan McKaskle Nov 03 '21 at 17:15
tac file.log | grep "In msg::" | grep -oh "templateId=.*, temp" – Kasthuri Shravankumar Jun 21 '22 at 10:28

score 109 · Answer 2 · edited Jul 27 '22 at 13:48

Cross-distribution safe answer (including Windows MinGW?)

grep -h "[[:alpha:]]*th[[:alpha:]]*" 'filename' | tr ' ' '\n' | grep -h "[[:alpha:]]*th[[:alpha:]]*"

If you're using an older version of grep (such as 2.4.2) that does not include the -o option, use the command above. Otherwise, use the simpler, easier-to-maintain version below.

Linux cross-distribution safe answer

grep -oh "[[:alpha:]]*th[[:alpha:]]*" 'filename'

To summarize: -oh prints only the text matched by the regular expression (and not the filename), much as you would expect a regular expression search to behave in vim and similar tools. Which word or regular expression you search for is up to you, as long as you stay with POSIX syntax rather than Perl syntax (see below).

More from the manual for grep

-o      Print each match, but only the match, not the entire line.
-h      Never print filename headers (i.e. filenames) with output lines.
-w      The expression is searched for as a word (as if surrounded by
         `[[:<:]]' and `[[:>:]]';

The reason why the original answer does not work for everyone

Support for \w varies from platform to platform, because it is an extended "Perl" syntax. grep installations that are limited to POSIX character classes need [[:alpha:]] instead of \w. (Strictly speaking, \w also matches digits and the underscore, so its closest POSIX equivalent is [_[:alnum:]].) See the Wikipedia page on regular expressions for more.

Ultimately, the POSIX version above (POSIX being grep's original syntax) will be much more reliable regardless of platform.

As for grep versions without the -o option: the first grep outputs the matching lines, tr turns each space into a newline so that every word sits on its own line, and the final grep keeps only the words that match.
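
A small sketch comparing the two commands, in case it helps. The file name `sample.txt` and its contents are made up for illustration, and the second command assumes a grep that has -o.

```shell
# One line with punctuation attached to matching words, plus one line
# that does not contain "th" at all.
workdir=$(mktemp -d)
printf 'then, the cat sat on the mat.\nno match here\n' > "$workdir/sample.txt"

# Fallback without -o: keep matching lines, split on spaces, filter again.
fallback=$(grep -h "[[:alpha:]]*th[[:alpha:]]*" "$workdir/sample.txt" \
  | tr ' ' '\n' \
  | grep "[[:alpha:]]*th[[:alpha:]]*")

# With -o: print only the text that the pattern itself matched.
with_o=$(grep -oh "[[:alpha:]]*th[[:alpha:]]*" "$workdir/sample.txt")

printf 'fallback:\n%s\n' "$fallback"
printf 'with -o:\n%s\n' "$with_o"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The two are not quite equivalent: the tr pipeline returns whole space-separated tokens, so punctuation stays attached ("then," rather than "then"), whereas -o returns exactly what the pattern matched.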

(PS: I know most platforms by now would have been patched for \w.... but there are always those that lag behind)

Credit for the "-o" workaround goes to @AdamRosenfield's answer.

edited Jul 27 '22 at 13:48

crenshaw-dev

answered Apr 14 '13 at 08:17

PicoCreator

1

What about -o only working in GNU grep (as ksinkar mentioned in a comment on the accepted answer)? – Brilliand Jun 19 '15 at 17:26
@Brilliand hmm, im having trouble finding a linux implementation that does not support '-o', i can look for a work around if i know which platform to check against. – PicoCreator Jun 20 '15 at 14:37
@pico The `-o` option is not present in the windows grep that installs with the git package (minGW?): `"c:\Program Files (x86)\Git\bin\grep" --version grep (GNU grep) 2.4.2` – Bruce Peterson Jul 01 '15 at 18:53
@BrucePeterson i have added in AdamRosenfield workaround answer for -o : Help me check if the windows git includes tr / sed and its version. So i can check if this workaround works – PicoCreator Jul 04 '15 at 06:02
@pico: for GIT: GNU sed version 4.2.1, tr (GNU textutils) 2.0 – Bruce Peterson Jul 06 '15 at 20:33
`-o` is not valid in linux git either – Collin Anderson Mar 07 '17 at 18:32
@CollinAnderson Your comment doesn't really make sense. GNU `grep` and thus pretty much every Linux box has `grep -o`; there is no `-o` option in Git itself, but many Windows victims install a `git`package which includes many Unix utilities, including a `grep` implementation. – tripleee Nov 07 '18 at 10:48
@BrucePeterson If you genuinely have GNU `grep` 2.4.2 then it's frightfully old; the `-o` option was introduced in 2.5.1 [sometime in 2001](http://git.savannah.gnu.org/cgit/grep.git/tree/ChangeLog-2009#n1228) – tripleee Nov 07 '18 at 10:55
@tripleee I meant `git grep` – Collin Anderson Nov 12 '18 at 19:24

Abhinandan prasad · Answer 3 · 2017-07-12T03:06:18.460

68

It's simpler than you think. Try this:

egrep -wo 'th.[a-z]*' filename.txt #### (Case Sensitive)

egrep -iwo 'th.[a-z]*' filename.txt  ### (Case Insensitive)

Where,

 egrep: grep with extended regular expressions.
 w    : Match only whole words, not substrings.
 o    : Display only the matched text instead of the whole line.
 i    : Ignore case.
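
To see what -w adds on top of -o, here is a minimal sketch using a made-up input line. It uses `grep -E`, which is the same as egrep, and assumes a grep that supports -o and -w.

```shell
# "bathe" contains "the", but only as part of a longer word.
line='the bathe this'

# Without -w, the "the" inside "bathe" is reported as a match too.
without_w=$(printf '%s\n' "$line" | grep -Eo 'th.[a-z]*')

# With -w, a match only counts if it forms a whole word.
with_w=$(printf '%s\n' "$line" | grep -Ewo 'th.[a-z]*')

printf 'without -w:\n%s\n' "$without_w"
printf 'with -w:\n%s\n' "$with_w"
```

Note that with -w the word "bathe" is dropped entirely, so this form finds words that begin with "th" rather than words that merely contain it.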

edited Jul 12 '17 at 03:06

answered Mar 28 '17 at 09:25

Abhinandan prasad

5

This doesn't seem to add anything over the existing answers from 4+ years before. – tripleee Nov 07 '18 at 10:46
7

@tripleee I found my approach is better and simple so I posted this. – Abhinandan prasad Feb 06 '19 at 14:45

Adam Rosenfield · Answer 4 · 2009-10-10T18:40:46.367

51

You could translate spaces to newlines and then grep, e.g.:

cat * | tr ' ' '\n' | grep th

edited Oct 10 '09 at 18:40

answered Oct 10 '09 at 01:43

Adam Rosenfield

23

no need cat. tr ' ' '\n' < file | grep th. Slow for big files. – ghostdog74 Oct 10 '09 at 02:00
This didn't work. The output still contained the filename and the entire line from the file that contained the match. Anyway, one of the other solutions offered worked. Thanks for the input though. – Neil Baldwin Oct 10 '09 at 08:59
@ghostdog74: good point, although if you have more than file, you'll need to use cat. @Neil Baldwin: are you sure you typed it in right? When there's only one input file (stdin in this case), grep doesn't print the filename. – Adam Rosenfield Oct 10 '09 at 14:58
@Adam - yes, sorry Adam, it does work with one file but not multiple. – Neil Baldwin Oct 10 '09 at 15:52
@Neil Baldwin: just list all of your files as parameters to cat, it works fine with multiple files – Adam Rosenfield Oct 10 '09 at 18:41
@Adam - so where you've got 'file' in the example, I would just put 'file1 file2 file3' etc. ? – Neil Baldwin Oct 10 '09 at 20:27
4

@ghostdog74 if the slow part is because of `tr`, he could do `grep` first, so `tr` would be applied only to matching lines: `grep th filename | tr ' ' '\n' | grep th` – Carcamano Dec 22 '15 at 20:21

score 42 · Answer 5 · edited Apr 29 '15 at 12:43

Just awk; no need for a combination of tools.

# awk '{for(i=1;i<=NF;i++){if($i~/^th/){print $i}}}' file
the
the
the
this
thoroughly
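
Note that /^th/ only matches fields that start with "th". A short sketch of the difference, using a made-up input line, in case words that merely contain "th" are wanted as well:

```shell
# "bathe" contains "th" but does not start with it.
line='the bathe this'

# Anchored pattern, as in the answer: fields beginning with "th".
starts=$(printf '%s\n' "$line" | awk '{for(i=1;i<=NF;i++) if ($i ~ /^th/) print $i}')

# Unanchored pattern: fields containing "th" anywhere.
contains=$(printf '%s\n' "$line" | awk '{for(i=1;i<=NF;i++) if ($i ~ /th/) print $i}')

printf 'starting with th:\n%s\n' "$starts"
printf 'containing th:\n%s\n' "$contains"
```

Because awk splits fields on whitespace, any punctuation attached to a word is printed along with it.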

edited Apr 29 '15 at 12:43

fedorqui

answered Oct 10 '09 at 00:54

ghostdog74

Doesn't answer the question posed. – AdamC Dec 13 '22 at 21:38

score 12 · Answer 6 · edited Nov 29 '12 at 09:30

grep with only-matching (-o) and Perl regex (-P):

grep -o -P 'th.*? ' filename

edited Nov 29 '12 at 09:30

gnat

answered Nov 29 '12 at 09:11

Raghu

4

What about display of only the matched group? – Bishwas Mishra Jan 04 '18 at 06:01
This doesn't work; it will only ever find `th` because you requested the shortest possible repetition of the wildcard. – tripleee Nov 07 '18 at 10:59
1

@tripleee - it won't have that problem, because there's a space included at the end of the regex. However, it will miss words that don't have spaces after them, e.g. at the ends of lines. – Ken Williams Jan 08 '19 at 22:54

score 10 · Answer 7 · answered Jan 11 '11 at 21:25

I was unsatisfied with awk's hard-to-remember syntax, but I liked the idea of using one utility to do this.

It seems like ack (or ack-grep if you use Ubuntu) can do this easily:

# ack-grep -ho "\bth.*?\b" *

the
the
the
this
thoroughly

If you omit the -h flag you get:

# ack-grep -o "\bth.*?\b" *

some-other-text-file
1:the

some-text-file
1:the
the

yet-another-text-file
1:this
thoroughly

As a bonus, you can use the --output flag to do this for more complex searches with just about the easiest syntax I've found:

# echo "bug: 1, id: 5, time: 12/27/2010" > test-file
# ack-grep -ho "bug: (\d*), id: (\d*), time: (.*)" --output '$1, $2, $3' test-file

1, 5, 12/27/2010

score 9 · Answer 8 · edited Apr 29 '15 at 12:43

cat *-text-file | grep -Eio "th[a-z]+"

edited Apr 29 '15 at 12:43

fedorqui

answered Sep 14 '10 at 15:30

Mumbling Mac

3

or just grep -Eio "th[a-z]+" filename – Shayan Oct 23 '15 at 01:52
4

Maybe see also [Useless use of `cat`?](/q/11710552) – tripleee Nov 07 '18 at 10:57

score 5 · Answer 9 · edited Jan 03 '11 at 22:24

You can also try pcregrep. There is also a -w option in grep, but in some cases it doesn't work as expected.

From Wikipedia:

cat fruitlist.txt
apple
apples
pineapple
apple-
apple-fruit
fruit-apple

grep -w apple fruitlist.txt
apple
apple-
apple-fruit
fruit-apple
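
The Wikipedia example can be reproduced with the sketch below; the scratch directory is only for illustration. It also shows what happens when -o is added, assuming a grep that supports it.

```shell
# Recreate fruitlist.txt from the example above.
workdir=$(mktemp -d)
printf '%s\n' apple apples pineapple apple- apple-fruit fruit-apple \
  > "$workdir/fruitlist.txt"

# -w treats "-" as a non-word character, so "apple-" and "apple-fruit"
# still count as containing the whole word "apple".
whole_lines=$(grep -w apple "$workdir/fruitlist.txt")

# Adding -o prints just the matched word instead of the whole line.
only_matches=$(grep -wo apple "$workdir/fruitlist.txt")

printf 'grep -w:\n%s\n' "$whole_lines"
printf 'grep -wo:\n%s\n' "$only_matches"

# Clean up the scratch directory.
rm -rf "$workdir"
```

The hyphenated entries match because a word constituent is a letter, digit, or underscore; anything else, including "-", ends a word.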

edited Jan 03 '11 at 22:24

palswim

answered Nov 14 '09 at 12:15

Maciek Sawicki

score 4 · Answer 10 · answered Feb 14 '13 at 16:39

I had a similar problem: I was looking for a grep regex pattern that gives only the "matched pattern found" as output.

In the end I used egrep with the -o option (the same regex with grep -e or -G didn't give me the same result as egrep).

So I think it could be something similar to this (I'm NOT a regex master):

egrep -o "the*|this{1}|thoroughly{1}" filename

answered Feb 14 '13 at 16:39

keebOo

1

The useless `{1}` quantifiers should be dropped. Or if you want to be consistent, `t{1}h{1}e{1}` etc. – tripleee Mar 21 '16 at 16:51
can it print with the same line? – ife Dec 27 '16 at 12:44

score 4 · Answer 11 · answered Jan 16 '14 at 15:46

To find all the words that start with "icon-", the following command works perfectly. I am using ack here, which is similar to grep but has better options and nicer formatting.

ack -oh --type=html "\w*icon-\w*" | sort | uniq
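
If ack is not installed, roughly the same result can be had with plain grep. This is a sketch under assumptions, not a drop-in replacement: the file `page.html` and its contents are invented, the POSIX class [[:alnum:]_] stands in for \w, and `sort -u` stands in for `sort | uniq`.

```shell
# A made-up HTML file with one class name repeated.
workdir=$(mktemp -d)
printf '<i class="icon-home"></i>\n<i class="icon-user"></i>\n<i class="icon-home"></i>\n' \
  > "$workdir/page.html"

# Extract every word containing "icon-", then sort and drop duplicates.
# LC_ALL=C keeps the sort order independent of the current locale.
unique=$(grep -oh '[[:alnum:]_]*icon-[[:alnum:]_]*' "$workdir"/*.html | LC_ALL=C sort -u)
printf '%s\n' "$unique"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Unlike ack's --type=html, the *.html glob only looks at files in a single directory.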

answered Jan 16 '14 at 15:46

Sandeep

score -1 · Answer 12 · 2009-10-10T01:31:31.877

You could pipe your grep output into Perl like this:

grep "th" * | perl -n -e'while(/(\w*th\w*)/g) {print "$1\n"}'

edited Oct 10 '09 at 01:31

answered Oct 10 '09 at 01:06

10

that won't give the correct result. also, if using Perl, no need to use grep. do everything in Perl. – ghostdog74 Oct 10 '09 at 01:15
Thanks for pointing out the error, ghostdog74. I have changed it to print all the words on the line, not just the first. – Oct 10 '09 at 01:26
like i said, grep is not necessary. perl -n -e'while(/(\s+th\w*)/g) {print "$1\n"}' file – ghostdog74 Oct 10 '09 at 01:30
I don't think it's important here to avoid using grep. – Oct 10 '09 at 01:33
8

up to you. i am just illustrating a point. If its not necessary, don't do it. that extra "|" will cost you one process more. – ghostdog74 Oct 10 '09 at 02:03
1

In Perl 5.10 or later: perl -nE '@a = /(regexp)/ig; say join "\n", @a' – Professor Photon Oct 18 '16 at 16:26

score -1 · Answer 13 · answered Aug 30 '22 at 23:46

grep --color -o -E "Begin.{0,}?End" file.txt

? - Match as few as possible until the End

Tested on macos terminal

answered Aug 30 '22 at 23:46

Mickey Tin

score -2 · Answer 14 · answered May 29 '12 at 06:32

$ grep -w

Excerpt from grep man page:

-w: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character.

answered May 29 '12 at 06:32

pl1nk

2

That will still print the entire line containing the match. It constrains the actual match so that `the` no longer matches e.g. "these" or "bathe". – tripleee May 09 '14 at 04:20

score -2 · Answer 15 · answered Nov 07 '18 at 12:38

`ripgrep`

Here is an example using ripgrep:

rg -o "(\w+)?th(\w+)?"

It will match all words containing th.

answered Nov 07 '18 at 12:38

kenorb
