I have a regex that I want to match any link preceded by a hyphen (-).

This is my regex:

/-(\s+)?[-a-zA-Z0-9@:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/

It already matches these links:

 - www.demo.com 
 - http://foo.co.uk/

But it doesn't match these:

- WWW.TELEGRAM.COM
- WWW.c.COM
- t.mE/rrbot

You can check it at this link: http://regexr.com/3gnb1

Engineer Passion

2 Answers

There are two possible ways to go about it. Your regex currently excludes capital letters in the TLD, so you'd have to either swap \.[a-z]{2,4} for \.[a-zA-Z]{2,4} or make the whole regex case-insensitive. In the latter case, you can remove A-Z from the other character classes as well, resulting in:

/-(\s+)?[-a-z0-9@:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b(\/[-a-z0-9@:%_\+.~#?&//=]*)?/i
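
As a quick check of the case-insensitive variant, here is a minimal sketch using Python's `re` module. Python is an assumption (the question does not name a language), the patterns are copied unchanged apart from dropping the `/.../i` delimiters, and the names `ORIGINAL`, `CASE_INSENSITIVE` and `SAMPLES` are made up for illustration.

```python
import re

# Pattern from the question: the TLD part \.[a-z]{2,4} only accepts lowercase.
ORIGINAL = re.compile(
    r"-(\s+)?[-a-zA-Z0-9@:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b"
    r"(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?"
)

# Same pattern with A-Z removed and the case-insensitive flag set instead.
CASE_INSENSITIVE = re.compile(
    r"-(\s+)?[-a-z0-9@:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b"
    r"(\/[-a-z0-9@:%_\+.~#?&//=]*)?",
    re.IGNORECASE,
)

# The five sample lines from the question.
SAMPLES = [
    "- www.demo.com",
    "- http://foo.co.uk/",
    "- WWW.TELEGRAM.COM",
    "- WWW.c.COM",
    "- t.mE/rrbot",
]

# Collect which samples each pattern matches in full.
matched_original = [s for s in SAMPLES if ORIGINAL.fullmatch(s)]
matched_insensitive = [s for s in SAMPLES if CASE_INSENSITIVE.fullmatch(s)]

print(matched_original)     # only the two lowercase links
print(matched_insensitive)  # all five samples
```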
kano

Why are you limiting the TLD to 4 characters? There are many valid TLDs longer than that, such as .finance, .movie, .academy, etc.
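
To illustrate the TLD length point, here is a small sketch in Python's `re` module (Python is an assumption). It compares the `{2,4}` quantifier with an open-ended `{2,}` on the otherwise unchanged case-insensitive pattern. The hostnames `example.finance` and `example.academy`, and the names `FOUR_MAX` and `OPEN_ENDED`, are hypothetical.

```python
import re

# TLD limited to 2-4 letters, as in the question.
FOUR_MAX = re.compile(
    r"-(\s+)?[-a-z0-9@:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b"
    r"(\/[-a-z0-9@:%_\+.~#?&//=]*)?",
    re.IGNORECASE,
)

# Same pattern, but the TLD may be any length of two letters or more.
OPEN_ENDED = re.compile(
    r"-(\s+)?[-a-z0-9@:%_\+.~#?&//=]{1,256}\.[a-z]{2,}\b"
    r"(\/[-a-z0-9@:%_\+.~#?&//=]*)?",
    re.IGNORECASE,
)

# Hypothetical links with TLDs longer than four letters.
long_tld_links = ["- example.finance", "- example.academy"]

for link in long_tld_links:
    # The 2-4 limit cannot end on a word boundary inside "finance"/"academy".
    print(link, bool(FOUR_MAX.fullmatch(link)), bool(OPEN_ENDED.fullmatch(link)))
```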

You can use my answer from a previous post and make some minor adjustments.

(?(DEFINE)
  (?<scheme>[a-z][a-z0-9+.-]*)
  (?<userpass>([^:@\/\s]+(:[^:@\/\s]+)?@))
  (?<domain>[a-z0-9]+(-[a-z0-9]+)*(\.[a-z0-9]+(-[a-z0-9]+)*)+)
  (?<ip>(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])))
  (?<host>((?&domain)|(?&ip)))
  (?<port>(:[\d]{1,5}))
  (?<path>([^?;\#\s]*))
  (?<query>(\?[^\#;\s]*))
  (?<anchor>(\#\S*))
)
(?:^)?-\ +((?:(?&scheme):\/\/)?(?&userpass)?(?&host)(?&port)?\/?(?&path)?(?&query)?(?&anchor)?)(?:$|\s+)

You can see this regex in use here. This should catch all valid URLs. Since the scheme appears to be optional in your case, I've made it optional in the regex. Note that (?(DEFINE)...) and (?&name) are PCRE features, so the pattern needs a PCRE-compatible engine.
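
For engines without DEFINE support, the same idea of composing named parts can be approximated by splicing string fragments together. The following is a simplified sketch in Python (an assumption, as is every name in it). It deliberately omits the `userpass` and `ip` parts and is not a drop-in equivalent of the regex above.

```python
import re

# String fragments standing in for the DEFINE groups; Python's re module has
# no (?(DEFINE)...) or (?&name), so the parts are interpolated instead.
SCHEME = r"[a-z][a-z0-9+.-]*"
DOMAIN = r"[a-z0-9]+(?:-[a-z0-9]+)*(?:\.[a-z0-9]+(?:-[a-z0-9]+)*)+"
PORT = r":\d{1,5}"
PATH = r"[^?;#\s]*"
QUERY = r"\?[^#;\s]*"
ANCHOR = r"#\S*"

# A dash, one or more spaces, then the URL (captured), then end or whitespace.
URL_AFTER_DASH = re.compile(
    rf"- +((?:{SCHEME}://)?{DOMAIN}(?:{PORT})?/?{PATH}(?:{QUERY})?(?:{ANCHOR})?)(?:$|\s+)",
    re.IGNORECASE,
)


def find_links(text: str) -> list[str]:
    """Return every URL in text that directly follows '- '."""
    return URL_AFTER_DASH.findall(text)


sample = "- www.demo.com\n- WWW.c.COM\nnot a link\n- t.mE/rrbot"
print(find_links(sample))
```

Only the single capturing group is returned by `findall`, which is why the fragments use non-capturing groups throughout.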

ctwheels