1

Before the release, we want to scan the CMS articles to know if we have any URLS that contain "dev", "qa" or "ua" in it.

I found this regex from What is a good regular expression to match a URL?

(https?://(?:www.|(?!www))[^\s.]+.[^\s]{2,}|www.[^\s]+.[^\s]{2,})

I want to update this so that it just matches the URLs which have "dev", "qa" or "ua" in it.

i.e.

http://regexrdev.com/foo.html?q=bar
https://dev.mediatemple.net

http://regexrqa.com/foo.html?q=bar
http://qa.mediatemple.net

and 

http://regexrua.com/foo.html?q=bar
https://ua.mediatemple.net

should match and not

www.demo.com    
http://foo.co.uk/
http://regexr.com/foo.html?q=bar
https://mediatemple.net

Would be very helpful if you can update the expression here

http://regexr.com/3dd09

and then share

Community
  • 1
  • 1
Robin
  • 6,879
  • 7
  • 37
  • 35

1 Answers1

2

It seems you want to only match URLs that contain those 3 strings.

You can use

(?=\S*(?:ua|dev|qa))(?:https?:\/\/(?:www\.|(?!www))[^\s.]+\.\S{2,}|www\.\S+\.\S{2,})

The positive lookahead (?=\S*(?:ua|dev|qa)) will force the matched string to have either ua, dev or qa in it.

See the regex demo

I also replace [^\s] with \S as these are equivalent.

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563