3

In postgres I'm using regexp_replace to clean up some yahoo email addresses.

SELECT regexp_replace(domain,'yahoo\.co[^\.].*','yahoo.com')  FROM table  

Unfortunately, this expression matches this type yahoo.com.tw that I'd like to NOT MATCH. I'd like the regex to:

NOT MATCH:

yahoo.es
yahoo.co.jp
yahoo.com.tw  

MATCH:

yahoo.com,
yahoo.com.
yahoo.com'

I've been at this for several hours and have looked at several links in stack like this: Regular expression to match a line that doesn't contain a word?

They have helped me write a negative lookahead but can't seem to combine it with the yahoo portion

NEGATIVE LOOKAHEAD

     ^(?!.*(\.com?\.|\.[a-z]{2})).*$    

Here is the regex101 data and formula.
Any suggestions would be much appreciated. Thank you.

moonshot
  • 649
  • 1
  • 5
  • 13

1 Answers1

2

This pattern matches everything directly below your yahoo REPLACE section and non of the good cases above:

^yahoo\.com?(?:(?!\.[a-z]{2}).)*(?:.com?)?$

Demo

The other cases below, like yahool.com, can be added as optional matches, like this:

^yahoo\.com?(?:(?!\.[a-z]{2}).)*(?:.com?)?$|^yahoo$|^yahool\.com|^yahooo\.com|^yahoo\.net|^yahoocom$

etc. However, some can be grouped easily together.

wp78de
  • 18,207
  • 7
  • 43
  • 71
  • Very helpful, thanks. What does the 2nd Non capturing group do ? ie this one : `(?:.com?)?$` – moonshot Nov 23 '17 at 17:59
  • 1
    Legit question. It handles the special case: yahoo.com21yahoo.com that has another .com after yahoo.com – wp78de Nov 23 '17 at 19:45