
Is there a standard for protocol names in URIs e.g. http: or file:? I'm trying to develop a regex that will detect if a URI starts with a protocol name but I'm not sure what characters are allowed there.

nine9ths
    Aaaaand after answering I discover a duplicate, [which I've apparently also answered, albeit slightly differently](http://stackoverflow.com/questions/3641722/valid-characters-for-uri-schemes/3641775#3641775)... – BoltClock Nov 07 '12 at 21:39

1 Answer


RFC 3986, section 3.1 has the grammar:

    scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

This means a protocol name (a "scheme" in RFC terms) must start with an ASCII letter, which can be followed by any number of ASCII letters, digits, or the characters `+`, `-` and `.`. Scheme names are case-insensitive (that is, HTTP and http should be treated the same), but the canonical form is lowercase, so implementations should accept either case and only produce lowercase (HTTP should become http).
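Since the question is about a regex, here is a minimal sketch of how that grammar could translate into one. Python is an assumption (the question doesn't name a language), and `SCHEME_RE` and `get_scheme` are hypothetical names used only for illustration. The lookahead for `:` relies on the scheme delimiter defined in section 3 of the same RFC.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# The lookahead requires the ":" delimiter without including it in the match.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*(?=:)")


def get_scheme(uri):
    """Return the lowercased scheme of `uri`, or None if it has none."""
    # re.match anchors at the start of the string, so no "^" is needed.
    m = SCHEME_RE.match(uri)
    return m.group(0).lower() if m else None


print(get_scheme("HTTP://example.com/"))  # http
print(get_scheme("svn+ssh://host/repo"))  # svn+ssh
print(get_scheme("/relative/path"))       # None
```

Note that this only checks syntax. Anything shaped like a scheme will match, so a Windows path such as `C:\temp` is reported as having the scheme `c`, and `example.com:8080` as having the scheme `example.com`. If that matters, you would need an extra check on top, such as a list of schemes you expect.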

BoltClock
  • One more small piece of the puzzle, the scheme component delimiter (":") is defined here https://tools.ietf.org/html/rfc3986#section-3 – nine9ths Nov 07 '12 at 21:58
  • +1 And if you're lazy, check out my article "[Regular Expression URI Validation](http://jmrware.com/articles/2009/uri_regexp/URI_regex.html)" for all the associated RFC3986 regex code snippets. – ridgerunner Nov 08 '12 at 00:37