
I am trying to filter a list of links so that it matches only those that are just a second-level domain:

Success:

https://www.thingisawesome.anything
https://thingisawesome.anything
http://www.thingisawesome.anything
http://thingisawesome.anything
http://thingisawesome.anything/
https://www.thingisawesome.anything/

Failure:

http://thingisawesome.ventures/index.html
https://subdomain.geocities.com/
https://www.twitter.com/8288hs98ff

This got me close:

`(http)s?(:\/\/)(w*)(\.?)(\w*)(.)(\w*)(\/?)`

But it doesn’t reject the ones that should fail; it only matches part of them.
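A quick way to see the "only matches part of them" behaviour is to run the attempt through Python’s `re.search` (Python is just an assumption for illustration; the question doesn’t name a language). `search` accepts a match anywhere in the string and is satisfied by a prefix, so with no `^`/`$` anchors nothing forces the pattern to cover the whole URL. The names `attempt`, `should_fail` and `partial` are hypothetical.

```python
import re

# The attempt from the question, copied as-is into a raw string.
attempt = re.compile(r"(http)s?(:\/\/)(w*)(\.?)(\w*)(.)(\w*)(\/?)")

# The URLs from the question that should be rejected.
should_fail = [
    "http://thingisawesome.ventures/index.html",
    "https://subdomain.geocities.com/",
    "https://www.twitter.com/8288hs98ff",
]

# search() is unanchored: it succeeds as soon as some substring matches,
# so the leftover part of each URL is simply ignored.
partial = {}
for url in should_fail:
    m = attempt.search(url)
    partial[url] = m.group(0) if m else None
    print(f"{url!r} -> {partial[url]!r}")
```

Each URL that should fail still produces a match, just a shorter one; for example, the first stops right before `index.html`.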

Luke Pighetti
  • [`anything`, `ventures`, and `com` are TLDs](https://en.wikipedia.org/wiki/Top-level_domain) in your examples. Am I correct in understanding that you want to match URLs that have only the `http` or `https` scheme, any second-level domain, and an optional subdomain of `www`? – Aankhen Jul 20 '18 at 19:09
  • Sorry, I misspoke. I am not looking for TLDs, but for links that have only a second-level domain and use http or https. – Luke Pighetti Jul 20 '18 at 19:14

1 Answer


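Based on your examples (which aren’t matching TLDs), I’m assuming the input is a newline-separated list of URLs with no unencoded IDNs, and that the capture groups in your attempt aren’t used later. Under those assumptions, you want to match, in multiline mode:

  • `^`: the start of a line
  • `https?://`: `http` or `https`, followed by `://`
  • `(?:www\.)?`: an optional `www.` (in a non-capturing group)
  • `[a-zA-Z0-9][a-zA-Z0-9-]+`: a label that starts with a letter or digit, followed by one or more letters, digits or hyphens (the second-level domain)
  • `\.`: a literal dot
  • `[a-zA-Z0-9][a-zA-Z0-9-]+`: another label of the same form (the TLD)
  • `/?`: an optional trailing slash
  • `$`: the end of the line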

Putting that together gives us:

`^https?://(?:www\.)?[a-zA-Z0-9][a-zA-Z0-9-]+\.[a-zA-Z0-9][a-zA-Z0-9-]+/?$`

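As a sketch, assuming Python’s `re` module (the question doesn’t name a language), the pattern can be checked against the example URLs from the question with the multiline flag turned on; `SLD_PATTERN`, `urls` and `matches` are hypothetical names. Because the pattern has no capturing groups, `findall` returns each fully matched line.

```python
import re

# The pattern from this answer, unchanged. re.MULTILINE makes ^ and $ match
# at the start and end of every line, not only of the whole string.
SLD_PATTERN = re.compile(
    r"^https?://(?:www\.)?[a-zA-Z0-9][a-zA-Z0-9-]+\.[a-zA-Z0-9][a-zA-Z0-9-]+/?$",
    re.MULTILINE,
)

# The question's examples as one newline-separated string:
# the first six should match, the last three should not.
urls = """\
https://www.thingisawesome.anything
https://thingisawesome.anything
http://www.thingisawesome.anything
http://thingisawesome.anything
http://thingisawesome.anything/
https://www.thingisawesome.anything/
http://thingisawesome.ventures/index.html
https://subdomain.geocities.com/
https://www.twitter.com/8288hs98ff"""

# No capturing groups, so findall returns the full text of each match.
matches = SLD_PATTERN.findall(urls)
for url in matches:
    print(url)
```

Since `$` has to match right after the optional slash, anything left over on a line (a path, or a third label such as `.com`) makes that whole line fail instead of producing a partial match.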

Aankhen