
I want the following examples to return a match:

  • I like foobar.com
  • I like google.com and foobar.com
  • I like foobar.com and google.com
  • I like foobargoogle.com and googlefoobar.com
  • I like yahoo.com and foobar.com
  • I like foobar.com and yahoo.com
  • I like foobaryahoo.com and yahoofoobar.com

I don't want the following examples to return a match:

  • I like yahoo.com
  • I like foobaryahoo.com
  • I like google.com
  • I like foobargoogle.com
  • I like google.com and yahoo.com
  • I like foobargoogle.com and foobaryahoo.com

Note: this is a "contains" match, not an exact (whole-string) match.

I tried the following Regex Pattern:

(?!(^.*((google)|(yahoo))\.com.*$))(^.*\w+\.com.*$)

but as soon as either "google.com" or "yahoo.com" occurred anywhere in the string, it ended with no match, even if "foobar.com" occurred before it.

E.g. "I like foobar.com but not google.com"
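
To make the failure concrete, here is a minimal reproduction. It is only a sketch: Python's `re` module is assumed purely for illustration, and the names `ORIGINAL` and `original_matches` are made up for this example. Because the negative lookahead is anchored with `^` and scans the whole string, a single "google.com" anywhere vetoes the match.

```python
import re

# The pattern from the question, unchanged.
# re.IGNORECASE is an assumption taken from the "ignore case" note.
ORIGINAL = re.compile(
    r"(?!(^.*((google)|(yahoo))\.com.*$))(^.*\w+\.com.*$)",
    re.IGNORECASE,
)

def original_matches(text):
    """Return True if the question's pattern finds a match in text."""
    return ORIGINAL.search(text) is not None

print(original_matches("I like foobar.com"))                     # True
print(original_matches("I like foobar.com but not google.com"))  # False, although a match is wanted
```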

Basically, I want it to ignore "google.com" and "yahoo.com" anywhere in the string and detect any other domain of the form "\w+\.com".

Note:

  • google.com and yahoo.com are just examples, so the pattern should be able to ignore any given set of alphanumeric strings, of any length
  • ignore case and whitespace
nhahtdh
Souvik Das

2 Answers


You could do this with the PCRE verbs (*SKIP)(*F).

(?:google|yahoo)(?:\.com)?(*SKIP)(*F)|(?:(?!google|yahoo)\w)+\.com
^                                    ^                           ^
|------Part you don't want-----------|------------Part you want--|


Explanation:

  • (?:google|yahoo)(?:\.com)? matches the string google or yahoo, optionally followed by .com.

  • (?:google|yahoo)(?:\.com)?(*SKIP)(*F)| The PCRE verbs (*SKIP)(*F) then force that match to fail, and (*SKIP) stops the engine from retrying at any position inside the text it just consumed. The | alternation that follows is therefore only attempted at positions outside the skipped google or yahoo text.

  • (?:(?!google|yahoo)\w)+\.com From each remaining starting position, this matches one or more word characters, checking before each character that google or yahoo does not begin there.

  • \.com restricts the match to text that ends with .com.
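
Not every engine has backtracking control verbs; Python's built-in `re` module, for instance, does not accept (*SKIP)(*F). As a hedged sketch, the same "discard this, keep that" idea can be emulated there with a different technique: consume the unwanted names in the first alternative, capture only the wanted text in a group, and keep just the matches where that group participated. The name `find_wanted` and the `re.IGNORECASE` flag (taken from the question's "ignore case" note) are assumptions of this sketch, not part of the pattern above.

```python
import re
from typing import List

# The first alternative consumes the names to ignore. Only the second
# alternative captures, so group 1 is None for discarded text.
PATTERN = re.compile(
    r"(?:google|yahoo)(?:\.com)?|((?:(?!google|yahoo)\w)+\.com)",
    re.IGNORECASE,
)

def find_wanted(text: str) -> List[str]:
    """Return every captured .com name that is not google/yahoo."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]

print(find_wanted("I like google.com and foobar.com"))             # ['foobar.com']
print(find_wanted("I like foobargoogle.com and googlefoobar.com")) # ['foobar.com']
print(find_wanted("I like foobargoogle.com and foobaryahoo.com"))  # []
```

A string counts as a match when the returned list is non-empty.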
Avinash Raj

There's no need for anything so complicated.
This finds exactly what you listed under the examples you want to match.

 # (?:\w(?<!google)(?<!yahoo))+\.com

 (?:
      \w                # A word character ...
      (?<! google )     # ... that does not complete "google"
      (?<! yahoo )      # ... or "yahoo"
 )+
 \. com                 # followed by a literal ".com"
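
As a quick check, the compact form of this pattern can be run against the examples from the question. This is a sketch assuming Python's `re` module, which accepts these fixed-width lookbehinds; the helper name `has_match` and the use of `re.IGNORECASE` (for the question's "ignore case" requirement) are additions for illustration.

```python
import re

# Compact form of the pattern above.
PATTERN = re.compile(r"(?:\w(?<!google)(?<!yahoo))+\.com", re.IGNORECASE)

def has_match(text):
    """Return True if text contains a .com name other than the ignored ones."""
    return PATTERN.search(text) is not None

WANT = [
    "I like foobar.com",
    "I like google.com and foobar.com",
    "I like foobar.com and google.com",
    "I like foobargoogle.com and googlefoobar.com",
    "I like yahoo.com and foobar.com",
    "I like foobar.com and yahoo.com",
    "I like foobaryahoo.com and yahoofoobar.com",
]

DONT_WANT = [
    "I like yahoo.com",
    "I like foobaryahoo.com",
    "I like google.com",
    "I like foobargoogle.com",
    "I like google.com and yahoo.com",
    "I like foobargoogle.com and foobaryahoo.com",
]

for s in WANT:
    assert has_match(s), s
for s in DONT_WANT:
    assert not has_match(s), s
print("all examples behave as requested")
```

The lookbehinds run after each `\w`, so a run of word characters stops as soon as it would end in one of the ignored names.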