1

I'm trying to search (with grep, in a PuTTY session) for the specific text "Omega". But I want to exclude two specific links whose markup also contains the text "Omega".

I have tried:

grep -ril "Omega" --exclude='<p> | <a href="www.omega.com"> Omega</a> |</p>' --exclude='<li><a href ="www.omega.com"> Omega</a></li>'

Also tried:

grep -ril "Omega" --exclude={<p> | <a href=" www.omega.com"> Omega</a> |</p>,<li><a href ="www.omega.com" target="_blank">Omega</a></li>}

Note that the 2 pipes in one of the excludes are dividers for my navigation menu. I'm trying to write the results to a log file, but I'm not getting the results that I need.

Martin Prikryl
nicban

3 Answers

1

The simplest solution:

grep <your_search> | grep -v <url1> | grep -v <url2>
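As a concrete sketch of the pipeline above: the `site/` directory, the `index.html` sample and the `omega.log` file name are assumptions for illustration, not taken from the question. `-F` makes the excluded strings literal, so the `|` characters in the navigation markup are not read as regex operators, and several `-e` patterns can share a single `grep -v`.

```shell
# Hypothetical sample data; point grep at your own directory instead.
workdir=$(mktemp -d)
mkdir "$workdir/site"
cat > "$workdir/site/index.html" <<'EOF'
<h1>Omega</h1>
<p> | <a href="www.omega.com"> Omega</a> |</p>
<li><a href ="www.omega.com"> Omega</a></li>
<p>my Omega page</p>
EOF

# 1. Find every line containing "Omega" (recursive, case-insensitive).
# 2. Drop lines containing either literal string (-v inverts, -F disables regex).
# 3. Write what is left to a log file.
grep -ri "Omega" "$workdir/site" \
  | grep -vF -e '<p> | <a href="www.omega.com"> Omega</a> |</p>' \
             -e '<li><a href ="www.omega.com"> Omega</a></li>' \
  > "$workdir/omega.log"

cat "$workdir/omega.log"
```

Note that this filters matching lines, so it uses `-ri` rather than the `-ril` from the question: with `-l`, grep prints only file names and there are no lines left to filter.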
sankalpn
  • Thanks for the quick response. I'm new to Stack Overflow and my post didn't display right because of the hrefs. One of my URLs has pipes around it, ' | Omega |'. How can I include the pipes without them being treated as an "or"? Would that matter, since they are inside quotation marks? – nicban Jun 01 '15 at 21:25
  • There is an even simpler way: `grep -ve <url1> -e <url2>`. – kenorb Jun 01 '15 at 23:09
  • @nicban you can always use grep -vF. -F tells it to treat the string in quotes as a fixed string. See: http://stackoverflow.com/questions/12387685/grep-for-special-characters-in-unix – sankalpn Jun 02 '15 at 00:31
1

I would use awk for this:

awk -v pat1='<a href="www.omega.com"> Omega</a> |</p>' \
    -v pat2='<li><a href ="www.omega.com"> Omega</a></li>' \
    '/Omega/ && $0 !~ pat1 && $0 !~ pat2' file

With this, we are matching those lines that contain Omega but do not contain the patterns you indicate in the question.

Note that grep --exclude is not the way to go: --exclude takes a file-name glob and controls which files are searched, not which lines or patterns are matched.
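To illustrate what `--exclude` does, here is a minimal sketch assuming GNU grep and two hypothetical files, `page.html` and `old.log`, that both contain "Omega". The glob removes a file from the search; it never looks at line content.

```shell
# Hypothetical files, both containing the search text.
workdir=$(mktemp -d)
printf 'Omega\n' > "$workdir/page.html"
printf 'Omega\n' > "$workdir/old.log"

# --exclude takes a file-name glob: old.log is skipped entirely,
# so only page.html is listed even though both files match.
matches=$(grep -rl "Omega" --exclude='*.log' "$workdir")
printf '%s\n' "$matches"
```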

Test

$ cat a
Omega
<p> | <a href="www.omega.com"> Omega</a> |</p>
<li><a href ="www.omega.com"> Omega</a></li>'
my Omega
$ awk -v pat1='<a href="www.omega.com"> Omega</a> |</p>' -v pat2='<li><a href ="www.omega.com"> Omega</a></li>' '/Omega/ && $0 !~ pat1 && $0 !~ pat2' a
Omega
my Omega
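One caveat with `!~`: the variables are interpreted as regular expressions, so the `|` in pat1 acts as alternation and each `.` matches any character. A possible variant, sketched here with awk's `index()` function for plain substring matching, treats the strings literally. The variable names `s1` and `s2` and the extra fifth sample line are assumptions added for illustration.

```shell
# Same sample as the test, plus one extra line that merely contains </p>.
workdir=$(mktemp -d)
cat > "$workdir/a" <<'EOF'
Omega
<p> | <a href="www.omega.com"> Omega</a> |</p>
<li><a href ="www.omega.com"> Omega</a></li>
my Omega
<p>Omega in a paragraph</p>
EOF

# index(s, t) returns 0 when t does not occur in s, so a line is kept only if
# it contains "Omega" and neither literal string.
result=$(awk -v s1='<p> | <a href="www.omega.com"> Omega</a> |</p>' \
             -v s2='<li><a href ="www.omega.com"> Omega</a></li>' \
             '/Omega/ && index($0, s1) == 0 && index($0, s2) == 0' "$workdir/a")
printf '%s\n' "$result"
```

With the regex version, the last sample line would be dropped as well, because the right-hand branch of the alternation in pat1 is just `</p>`.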
Community
fedorqui
0

Parsing HTML without a dedicated parser is painful. If you cannot clean the input for grep, use a dedicated HTML parser.

If you could clean the code, then it should be as simple as:

# nice input ahead
> cat omega_sites.txt 
www.exclude1_omega.com
www.exclude2_omega.com
www.my_precious_omega.com
www.all_but_omega.org
www.just_alpha.net

# filter exclude1 and exclude2 
# and redirect using tee to a log file 
> grep -i omega omega_sites.txt | grep -v -i "exclude1\|exclude2" | tee omega_sites_filtered.txt
www.my_precious_omega.com
www.all_but_omega.org
> 
7y7