34

I was thinking about Registering an Application to a URL Protocol and I'd like to know, what characters are allowed in a scheme?

Some examples:

  • h323 (has numbers)
    • h323:[<user>@]<host>[:<port>][;<parameters>]
  • z39.50r (has a . as well)
    • z39.50r://<host>[:<port>]/<database>?<docid>[;esn=<elementset>][;rs=<recordsyntax>]
  • paparazzi:http (has a :)
    • paparazzi:http:[//<host>[:[<port>][<transport>]]/

So, what characters can I fancy using?
Can we have...

  • @:TwitterUser
  • #:HashTag
  • $:CapitalStock
  • ?:ID-10T

...etc., as desired, or are the characters allowed in a scheme restricted by a standard?

Camilo Martin

3 Answers

41

According to RFC 2396, Appendix A:

  scheme        = alpha *( alpha | digit | "+" | "-" | "." )

Meaning:

The scheme must start with a letter (upper or lower case) and can contain letters (again upper or lower case), digits, "+", "-" and ".".


Note: in the case of

paparazzi:http:[//<host>[:[<port>][<transport>]]/

the scheme is only the "paparazzi" part.
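
As a rough illustration of why only "paparazzi" counts as the scheme: ":" is not among the allowed scheme characters, so the scheme ends at the first colon. The sketch below assumes the input is an absolute URI; `split_scheme` is a hypothetical helper name and `example.com` is a placeholder host, neither taken from the app's documentation.

```python
def split_scheme(uri):
    """Split an absolute URI at its first ':' into (scheme, rest).

    Hypothetical helper for illustration only; it does not validate
    the scheme characters.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep:
        raise ValueError("no ':' found, so there is no scheme")
    return scheme, rest


# The second "http:" belongs to the scheme-specific part, not the scheme.
print(split_scheme("paparazzi:http://example.com/"))
# → ('paparazzi', 'http://example.com/')
```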

Vivien Barousse
  • I see. But there are RFCs that use numbers... Why? – Camilo Martin Sep 04 '10 at 10:11
  • Numbers are allowed in the URI scheme, but not as first character. 'a234' is valid, while '4bcd' isn't. – Vivien Barousse Sep 04 '10 at 10:19
  • Do you think the fact that it will be used only as an URL protocol on Windows has any impact on the usability of other characters? – Camilo Martin Sep 04 '10 at 11:19
  • +1; What Vivien said re: the "paparazzi:" scheme. The `http://...` is passed on to the WebKit stuff. (NB: I'm the author of the app, and am also crazy and have a BNF-style document on the URL format on the site.) – Wevah Nov 21 '10 at 08:26
  • 2
    `paparazzi` is akin to `mailto`: it has no hierarchy hence no `//` – Knu Nov 03 '18 at 15:04
  • Chiming in 11 years later on Camilo's question: Windows does not enforce the starts-with-alpha limitation, but because all popular browsers do, you should follow it anyway. – EricLaw Sep 02 '21 at 00:00
12

The scheme according to RFC 3986 is defined as:

scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

So the scheme must begin with an alphabetic character (A-Z, a-z) and may be followed by any number of alphanumeric characters, "+", "-", or ".".
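
A minimal sketch of that rule as a regular expression, in case it helps to test candidate names. The pattern is a direct transcription of the ABNF above; `is_valid_scheme` is a name made up for this example, and the check covers only the RFC grammar, not what any particular OS or browser accepts.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name):
    """Return True if `name` matches the RFC 3986 scheme grammar."""
    return SCHEME_RE.fullmatch(name) is not None


# Candidates taken from the question and comments.
for candidate in ["h323", "z39.50r", "paparazzi", "a234", "4bcd", "@", "#", "$", "?"]:
    print(f"{candidate!r}: {is_valid_scheme(candidate)}")
```

With this check, `h323`, `z39.50r`, `paparazzi` and `a234` pass, while `4bcd` and the single-symbol names `@`, `#`, `$` and `?` are rejected.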

Gumbo
  • Do you think using it as a Windows-only URL protocol has any impact on the characters used? If that changes anything I'd do some tests... – Camilo Martin Sep 04 '10 at 11:22
6

Quoth RFC 2396:

Scheme names consist of a sequence of characters beginning with a lower case letter and followed by any combination of lower case letters, digits, plus ("+"), period ("."), or hyphen ("-").

BoltClock