Is there a standard for protocol names in URIs, e.g. http: or file:? I'm trying to develop a regex that detects whether a URI starts with a protocol name, but I'm not sure which characters are allowed there.
1 Answer
RFC 3986, section 3.1 has the grammar:
scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
This means a scheme (protocol) name must start with a letter, which can be followed by any number of letters, digits, or `+`, `-`, or `.` characters. Scheme names are case-insensitive (that is, `HTTP` and `http` should be treated the same), but they should generally be canonicalized to lowercase (so `HTTP` becomes `http`).
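As a rough illustration of how that grammar could be turned into the regex the question asks about, here is a minimal Python sketch. The names `SCHEME_RE` and `extract_scheme` are made up for this example, and the pattern also consumes the `:` delimiter that follows the scheme.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by the ":" delimiter
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")


def extract_scheme(uri):
    """Return the lowercased scheme if uri starts with one, else None."""
    match = SCHEME_RE.match(uri)  # match() only matches at the start of the string
    if match is None:
        return None
    # Drop the trailing ":" and canonicalize to lowercase.
    return match.group(0)[:-1].lower()


if __name__ == "__main__":
    for candidate in ["HTTP://example.com", "svn+ssh://host/repo", "/relative/path"]:
        print(candidate, extract_scheme(candidate))
```

Note that this only checks syntax: inputs such as a Windows path (`C:\temp`) or `localhost:8080` also start with something that is grammatically a valid scheme, so real code may need extra checks on top of this.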
- One more small piece of the puzzle: the scheme component delimiter (":") is defined here: https://tools.ietf.org/html/rfc3986#section-3 – nine9ths Nov 07 '12 at 21:58
- +1 And if you're lazy, check out my article "[Regular Expression URI Validation](http://jmrware.com/articles/2009/uri_regexp/URI_regex.html)" for all the associated RFC3986 regex code snippets. – ridgerunner Nov 08 '12 at 00:37