
I have been trying to validate an entered URI according to RFC 3986.

This is the regex I came up with:

(?:[A-Za-z][A-Za-z0-9+.-]*:/{2})?(?:(?:[A-Za-z0-9-._~]|%[A-Fa-f0-9]{2})+(?::(?[A-Za-z0-9-._~]|[%][A-Fa-f0-9]{2})+)?@)?(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\\.){1,126}[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?(?::[0-9]+)?(?:/(?:[A-Za-z0-9-._~]|%[A-Fa-f0-9]{2})*)*(?:\\?(?:[A-Za-z0-9-._~]+(?:=(?:[A-Za-z0-9-._~+]|%[A-Fa-f0-9]{2})+)?)(?:&|;[A-Za-z0-9-._~]+(?:=(?:[A-Za-z0-9-._~+]|%[A-Fa-f0-9]{2})+)?)*)?

But somehow it fails for the following examples

ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:John.Doe@example.com

from the RFC itself.

I am not sure what I am doing wrong.

This check runs when a SubjectAltName is supplied for a certificate: I need to confirm that it is a well-formed URI so that certificate generation does not fail. I am using Bouncy Castle to generate the certificate.

1 Answer


The regex provided has some errors; I have corrected them here:

(?:[A-Za-z][A-Za-z0-9+.-]*:\/{2})?(?:(?:[A-Za-z0-9-._~]|%[A-Fa-f0-9]{2})+(?::([A-Za-z0-9-._~]?|[%][A-Fa-f0-9]{2})+)?@)?(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\\.){1,126}[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?(?::[0-9]+)?(?:\/(?:[A-Za-z0-9-._~]|%[A-Fa-f0-9]{2})*)*(?:\\?(?:[A-Za-z0-9-._~]+(?:=(?:[A-Za-z0-9-._~+]|%[A-Fa-f0-9]{2})+)?)(?:&|;[A-Za-z0-9-._~]+(?:=(?:[A-Za-z0-9-._~+]|%[A-Fa-f0-9]{2})+)?)*)?

However, that does not fix the matching problem. Which site did this regex come from? The link you provided just seems to be a text file. I ask because the regex doesn't even match a basic http:// URL.
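One possible reason, and this is only a guess on my part: the pattern looks like it was copied out of a Java string literal, so it still contains `\\.` and `\\?`. Read as raw regex text, `\\.` means "a literal backslash followed by any character", not "a literal dot", so the host part can never match an ordinary hostname. A minimal sketch of the difference (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // Java literal "example\\.com" yields the regex text example\.com,
    // where \. matches a literal dot.
    static final Pattern LITERAL_DOT = Pattern.compile("example\\.com");

    // Java literal "example\\\\.com" yields the regex text example\\.com,
    // where \\ matches a literal backslash and . matches any one character.
    static final Pattern BACKSLASH_THEN_ANY = Pattern.compile("example\\\\.com");

    static boolean matchesLiteralDot(String s) {
        return LITERAL_DOT.matcher(s).matches();
    }

    static boolean matchesBackslashThenAny(String s) {
        return BACKSLASH_THEN_ANY.matcher(s).matches();
    }
}
```

If that is what happened, pasting the pattern into an online tester needs single backslashes, while a Java string literal needs the doubled ones.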

Take a look here to see if this previous post helps you:

Other SO post
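If a hand-written regex is not a hard requirement, a parser may be easier to get right. Below is a minimal sketch that uses `java.net.URI` from the JDK. Two caveats: `java.net.URI` follows RFC 2396 as amended by RFC 2732 rather than RFC 3986, so edge cases can differ, and requiring a scheme is my own assumption about what a SubjectAltName URI should look like. `UriCheck` and `isWellFormedUri` are names I made up.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriCheck {
    private UriCheck() {}

    /**
     * Returns true if the input parses as a URI and has a scheme.
     * Assumption: relative references such as "example.com/path" are rejected.
     */
    static boolean isWellFormedUri(String candidate) {
        if (candidate == null) {
            return false;
        }
        try {
            URI uri = new URI(candidate);
            // isAbsolute() is true exactly when a scheme is present.
            return uri.isAbsolute();
        } catch (URISyntaxException e) {
            // Illegal characters, malformed escapes, bad IPv6 literals, etc.
            return false;
        }
    }
}
```

With this, `UriCheck.isWellFormedUri("ldap://[2001:db8::7]/c=GB?objectClass?one")` and `UriCheck.isWellFormedUri("mailto:John.Doe@example.com")` should both return true, while an input containing a raw space or lacking a scheme is rejected.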

Community
PeterS