What are the legal and illegal characters in URL/Link?

Question

What happens if there is an illegal character? Does the URL fix itself by encoding the illegal characters into something else?

score 8 · Answer 1 · edited May 23 '17 at 12:17

As explained here, the characters allowed in a URI are:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=

Any other character needs to be percent-encoded (%hh). Each part of the URI has further restrictions on which characters need to be represented by a percent-encoded sequence.
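A minimal Python sketch of that rule, in case it helps: it checks a string against the character list above and percent-encodes a single character with the standard library's urllib.parse.quote. The names ALLOWED and illegal_characters are made up for this example, and treating "%" as acceptable (because it introduces a percent-encoded sequence) is an assumption of the sketch.

```python
import string
from urllib.parse import quote

# The character list quoted in the answer above.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
RESERVED = ":/?#[]@!$&'()*+,;="
ALLOWED = set(UNRESERVED + RESERVED)


def illegal_characters(uri):
    """Return the characters of `uri` outside the allowed set, in order of first appearance."""
    found = []
    for ch in uri:
        # Assumption: "%" is tolerated because it starts a %hh sequence.
        if ch not in ALLOWED and ch != "%" and ch not in found:
            found.append(ch)
    return found


print(illegal_characters("http://example.com/a b|c"))  # [' ', '|']

# quote() with safe="" percent-encodes everything except letters, digits and "-._~".
print(quote(" ", safe=""))   # %20
print(quote("|", safe=""))   # %7C
print(quote("é", safe=""))   # %C3%A9 (the UTF-8 bytes of the character)
```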

answered May 22 '15 at 17:00

Anirudh

score 4 · Answer 2 · edited Oct 07 '21 at 05:54

Allowed characters

RFC 3986 defines which characters are allowed in which URI components.

RFCs for specific URI schemes might further restrict this.

If you are interested in HTTP/HTTPS URIs: they are defined in RFC 7230 (since obsoleted by RFC 9110). AFAIK they don’t have further restrictions regarding allowed characters, so you could stick to the definitions in RFC 3986.
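To illustrate how the allowed set differs between components, here is a small Python sketch built on the standard library's urllib.parse.quote. The constants and the functions encode_segment and encode_query are hypothetical names for this example; the character sets follow my reading of the pchar and query rules in RFC 3986, so check the RFC before relying on them.

```python
from urllib.parse import quote

SUB_DELIMS = "!$&'()*+,;="
# Assumption: characters allowed in one path segment, besides the unreserved ones.
SEGMENT_SAFE = SUB_DELIMS + ":@"
# The query component additionally allows "/" and "?".
QUERY_SAFE = SEGMENT_SAFE + "/?"


def encode_segment(text):
    """Percent-encode `text` for use as a single path segment."""
    return quote(text, safe=SEGMENT_SAFE)


def encode_query(text):
    """Percent-encode `text` for use as a raw query string."""
    return quote(text, safe=QUERY_SAFE)


print(encode_segment("a b/c?d"))  # a%20b%2Fc%3Fd
print(encode_query("a b/c?d"))    # a%20b/c?d
```

The same input comes out differently because "/" and "?" act as delimiters inside a path but are ordinary data inside a query.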

What happens if illegal characters are used?

Depends on many factors … could be anything from "nothing happens" to "doesn’t work anymore".

Does the URL fix itself by encoding the illegal characters into something else?

A URI can’t fix itself, it’s just a string.

Clients working with this URI (browser, server, email client, etc.) may try to fix a URI (or work with invalid URIs) according to their own rules.
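As a rough illustration of "their own rules", the Python sketch below shows two made-up client policies for the same invalid input: one silently percent-encodes whatever is not allowed, the other refuses the URI. Neither function mirrors what any particular browser or server does; lenient_client and strict_client are hypothetical names.

```python
import string
from urllib.parse import quote

RESERVED = ":/?#[]@!$&'()*+,;="
ALLOWED = set(string.ascii_letters + string.digits + "-._~" + RESERVED + "%")


def lenient_client(uri):
    """Repair policy: percent-encode every character that is not allowed.

    "%" is kept as-is so that existing %hh sequences are not encoded twice.
    """
    return quote(uri, safe=RESERVED + "%")


def strict_client(uri):
    """Reject policy: raise ValueError if any character is not allowed."""
    bad = sorted({ch for ch in uri if ch not in ALLOWED})
    if bad:
        raise ValueError(f"illegal characters in URI: {bad!r}")
    return uri


broken = "http://example.com/a b|c"
print(lenient_client(broken))  # http://example.com/a%20b%7Cc

try:
    strict_client(broken)
except ValueError as error:
    print("rejected:", error)
```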

URI vs. link

Also note that there’s a difference between a URI and linking to (or storing etc.) this URI in a document.
The host language (e.g., HTML) might have its own rules about what to encode. This does not change the URI, only the way the URI is stored/specified in this document.

For example, the valid URI http://example.com/a&b would have to be linked like this in HTML documents:

<a href="http://example.com/a&amp;b">Link</a>

But the URI is still http://example.com/a&b, not http://example.com/a&amp;b.
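The round trip can be sketched in Python with the standard library's html module: escaping only changes how the URI is written inside the document, and unescaping gives back the original string. The variable names are just for this example.

```python
import html

uri = "http://example.com/a&b"

# Escape the URI for use inside an HTML attribute value.
attribute_value = html.escape(uri, quote=True)
print(f'<a href="{attribute_value}">Link</a>')
# <a href="http://example.com/a&amp;b">Link</a>

# Reading the attribute back (as an HTML parser would) restores the URI.
assert html.unescape(attribute_value) == uri
```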
