What happens if there is a illegal character? Does the URL fix it self by encoding the illegal characters into something else?
2 Answers
As explained here
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=. Any other character needs to be encoded with the percent-encoding (%hh). Each part of the URI has further restrictions about what characters need to be represented by an percent-encoded word.
Allowed characters
RFC 3986 defines which characters are allowed in which URI components.
RFCs for specific URI schemes might further restrict this.
If you are interested in HTTP/HTTPS URIs: they are defined in RFC 7230. AFAIK they don’t have further restrictions regarding allowed characters, so you could stick to the definitions in RFC 3986.
What happens if illegal characters are used?
Depends on many factors … could be anything from "nothing happens" to "doesn’t work anymore".
Does the URL fix it self by encoding the illegal characters into something else?
A URI can’t fix itself, it’s just a string.
Clients working with this URI (browser, server, email client, etc.) may try to fix a URI (or work with invalid URIs) according to their own rules.
URI vs. link
Also note that there’s a difference between a URI and linking to (or storing etc.) this URI in a document.
The host language (e.g., HTML) might have rules what to encode. This does not change the URI, only the way the URI is stored/specified in this document.
For example, the valid URI http://example.com/a&b
would have to be linked like this in HTML documents:
<a href="http://example.com/a&b">Link</a>
But the URI is still http://example.com/a&b
, not http://example.com/a&b
.