-2

I have words like

MEdIa
media
MEDIA
mEdIa
_media_
_media
media_
ICP_MEDIA

in a file. I am trying to grep for the keyword media with the command below:

grep -irwE "media|*_media"

But grep finds only

MEdIa
media
MEDIA
mEdIa
_media

It does not find _media_, media_, or ICP_MEDIA.

Christoph Rackwitz
  • 1
    This might help: [The Stack Overflow Regular Expressions FAQ](https://stackoverflow.com/a/22944075/3776858) – Cyrus Oct 17 '22 at 08:08

3 Answers

1

To answer your question, why is grep not finding all matches:

-w, --word-regexp: Select only those lines containing matches that form whole words. The test is that the matching substring must either be at the beginning of the line, or preceded by a non-word constituent character. Similarly, it must be either at the end of the line or followed by a non-word constituent character. Word-constituent characters are letters, digits, and the underscore. This option has no effect if -x is also specified.

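So the entry _media_ is matched by neither media nor *_media, for the following reasons:

  • _media_ is not a whole-word match for media: the underscores on either side are word-constituent characters, so media does not stand alone as a word.
  • _media_ is not a whole-word match for *_media either. In a regular expression the asterisk is a quantifier, not a shell-style wildcard, and at the beginning of a regex it has nothing to repeat and loses its special meaning (POSIX leaves this case undefined for extended regexes). Whether grep takes it as a literal * or ignores it, there is no match: the text contains no *, and a bare _media is followed by another underscore, which is a word character, so the -w test fails.
  • media_ and ICP_MEDIA fail the same way: every candidate match has a letter, digit, or underscore directly next to it.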
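
The effect of -w can be checked directly. The following is a minimal sketch, not part of the original answer: the sample list adds multimedia and intermediate as negative cases (both come from the asker's later comment), and the second pattern is only one possible way to get what the question asks for, by requiring a non-alphanumeric character or a line edge on each side of media instead of relying on -w.

```shell
# Sample words from the question, plus two that should NOT match.
words='MEdIa
media
MEDIA
mEdIa
_media_
_media
media_
ICP_MEDIA
multimedia
intermediate'

# With -w, "media" must be bounded by non-word characters or line edges.
# "_" is a word character, so every underscore variant is rejected.
with_w=$(printf '%s\n' "$words" | grep -iw 'media')

# Sketch: accept any non-alphanumeric neighbour (or a line edge) instead,
# so "_" counts as a separator while letters and digits do not.
no_alnum=$(printf '%s\n' "$words" |
  grep -iE '(^|[^[:alnum:]])media([^[:alnum:]]|$)')

printf 'with -w:\n%s\n\nwith explicit separators:\n%s\n' "$with_w" "$no_alnum"
```

The bracket class [^[:alnum:]] is plain POSIX, so this sketch does not depend on \b or other GNU extensions.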
kvantour
0

I tried this on the example you gave:

   cat find | grep 'media'

and the result was this:

media
_media_
_media
media_

P.S. find is the name of the file I put your examples in.
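
For comparison, here is a small sketch of what a plain grep 'media' does, assuming a shortened, hypothetical sample list: without -i the match is case-sensitive, and with no boundary in the pattern it is a pure substring match.

```shell
# Hypothetical four-word sample: two wanted forms, one uppercase, one unwanted.
sample='media
_media_
ICP_MEDIA
multimedia'

# Case-sensitive substring match: misses ICP_MEDIA, but accepts multimedia.
plain=$(printf '%s\n' "$sample" | grep 'media')

# -i ignores case, so ICP_MEDIA is found too; multimedia still matches.
nocase=$(printf '%s\n' "$sample" | grep -i 'media')

printf 'plain:\n%s\n\nwith -i:\n%s\n' "$plain" "$nocase"
```

grep can also read the file directly, as in grep -i 'media' find, so the cat is not needed.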

  • Did `find` contain ICP_MEDIA? – SampathKumar Esikala Oct 17 '22 at 07:15
  • This will, however, also match things like "lookslikemediabutisnot" because it does not consider word boundaries. From the OP's question it looks like word boundaries are what he needs. – Christian.K Oct 17 '22 at 07:24
  • @Christian.K, Yes. My grep command is `ret=$(grep -irE "${excluded_files[@]/#/--exclude=}" "\\b${LIST[$A]}\\b|${LIST[$A]}[-_.*>@^:.,~%&\(\)\{}]|\\b${LIST[$A]}\\b" \ "$package/" |grep -v "$filters\|${excluded_dir[@]}" -c)`. Here LIST is an array of words. The intention is that grep should find every form of media, but not words like multimedia or intermediate. It should find `media`, `_media_`, `_media`, `media_`, `MEdIa`, `MEDIA`, `mEdIa`, and `ICP_MEDIA`. – SampathKumar Esikala Oct 17 '22 at 07:47
  • 1
    No, `-v` negates a match. You probably wanted to use `-i`, which ignores case. But the OP already uses that. Also, no need to invoke cat and grep twice to achieve this. – Christian.K Oct 17 '22 at 07:55
  • @Christian.K you got it right, man. With `cat find | grep -i 'media'` you can get every string that contains 'media'. – Andreja Milosavljevic Oct 17 '22 at 08:04
  • `grep -irE "${excluded_files[@]/#/--exclude=}" "\\b${LIST[$A]}\\b|${LIST[$A]}[-_.*>@^:.,~%&\(\)\{}]|\\b${LIST[$A]}\\b" \ "$package/" |grep -v "$filters\|${excluded_dir[@]}" -c` The above command finds only `MEdia`, `media`, `MEDIA`, `mEdia`, `_media_`, and `media_`, but it does not find ICP_MEDIA. – SampathKumar Esikala Oct 17 '22 at 08:20
  • I suspect this particular pattern, `[-_.*>@^:.,~%&()\{}]`, is what prevents ICP_MEDIA from being found, but I am not sure. – SampathKumar Esikala Oct 17 '22 at 08:21
  • @SampathKumarEsikala dude, try the command I wrote in my last comment... – Andreja Milosavljevic Oct 17 '22 at 08:23
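
The behaviour reported in the last few comments can be reproduced with a cut-down version of that pattern. This is a sketch under assumptions: it hard-codes media in place of ${LIST[$A]}, drops the --exclude handling and the second grep, removes the backslashes inside the bracket expression, and relies on grep supporting \b as a word-boundary escape (GNU grep does). Because the underscore is a word character, \b does not match between _ and M, and the bracket alternative needs a character after media, so ICP_MEDIA and _media fall through.

```shell
# The eight sample words from the question.
sample='MEdIa
media
MEDIA
mEdIa
_media_
_media
media_
ICP_MEDIA'

# Alternative 1: media with a word boundary on both sides.
# Alternative 2: media followed by one of the listed punctuation characters.
pattern='\bmedia\b|media[-_.*>@^:.,~%&(){}]'

# Lines the simplified pattern selects, and lines it leaves out (-v inverts).
found=$(printf '%s\n' "$sample" | grep -iE "$pattern")
missed=$(printf '%s\n' "$sample" | grep -ivE "$pattern")

printf 'found:\n%s\n\nmissed:\n%s\n' "$found" "$missed"
```

Neither alternative allows a separator in front of media, which is why the bracket expression alone cannot rescue ICP_MEDIA.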
0

I'm pretty sure someone with better regex-fu can provide a nicer solution, but this works for me for a selected set of values (see below):

grep -iwE "media|.*[\b_]media[\b_]*" file.txt
_media_
media
ICP_MEDIA

Values:

_media_
media
ICP_MEDIA
XXX_media_YYY
NOTMEDIA
NOT_MEDIAXX
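
One caveat, shown as a hedged sketch rather than a fix: inside a bracket expression \b is not a word boundary. POSIX treats the backslash there as an ordinary character, so [\b_] matches a backslash, a b, or an underscore. The inputs bmedia and media_ below are hypothetical additions to the value list.

```shell
# The pattern from the command above, unchanged.
pattern='media|.*[\b_]media[\b_]*'

# bmedia is accepted, because "b" is one of the characters in [\b_].
# "|| true" keeps the script going when grep -c reports zero matches.
b_hits=$(printf 'bmedia\n' | grep -ciwE "$pattern" || true)

# media_ is rejected: the trailing "_" is a word character for -w, and the
# second alternative needs a character from [\b_] in front of "media".
trailing_hits=$(printf 'media_\n' | grep -ciwE "$pattern" || true)

echo "bmedia: $b_hits match(es), media_: $trailing_hits match(es)"
```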
Christian.K