Regex for matching any URL character

Question

I have come across a specification that described a field as:

Any URL char

And I wanted to validate it on my side via a REGEX.

I searched a bit and, even though I found this great SO question that contains every piece of information I needed, I thought it was a pity that no question asks precisely for the regex, so here I am.

What would be a proper regex matching any URL character?

Edit

I extracted the following regex from what I understood of the specification:

[\w\-.~:/?#\[\]@!$&'()*+,;=%]

So, is this regex right and exhaustive, or did I miss anything?

After reading the specification, I guess it is simply "all ASCII characters".

I guess you found the answer for yourself :) All I'd add is to make sure there is nothing else in your input: `^[...]*$` — Tamas Rev, Apr 24 '17 at 13:43
Yeah, I actually found the answer before asking, I posted the question in case somebody else looked for the same thing. In my case, I wanted the char component so I could compose it with another regex, but thanks anyway. — Jeremy Grand, Apr 24 '17 at 13:46
In this case you can post your answer too. Stack Overflow encourages this kind of self-Q&A posts too. — Tamas Rev, Apr 24 '17 at 13:59
I actually posted my answer but it got downvoted several times and people asked in comments to remove it and simply edit the question. Thus my editing of the question (because I originally posted both the question and answer) — Jeremy Grand, Apr 24 '17 at 15:31
The flag was inappropriate, you should un-delete your answer. But don't post a question inside an answer — Thomas Ayoub, Apr 24 '17 at 15:46

score 2 · Accepted Answer · edited Oct 07 '21 at 11:06

See the Characters section of RFC 3986:

A URI is composed from a limited set of characters consisting of digits, letters, and a few graphic symbols. A reserved subset of those characters may be used to delimit syntax components within a URI while the remaining characters, including both the unreserved set and those reserved characters not acting as delimiters, define each component's identifying data.

Although there is an indication that only digits, letters and some symbols are supported, the regex suggested in Appendix B, "Parsing a URI Reference with a Regular Expression", may actually match pretty much every char:

The following line is the regular expression for breaking-down a well-formed URI reference into its components.

 ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

  12            3  4          5       6  7        8 9
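To see how permissive that expression is, here is a minimal Python sketch that applies it and labels the groups the way the RFC does (2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment). The `split_uri` helper and the second sample input are only illustrative, not part of the specification:

```python
import re

# Regular expression copied from RFC 3986, Appendix B.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def split_uri(uri: str) -> dict:
    """Split a URI reference into the five components named in the RFC.

    Every part of the pattern is optional, so the match at position 0
    always succeeds; components that are absent come back as None.
    """
    m = URI_REFERENCE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

# Example URI used in the RFC itself.
parts = split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related")
print(parts["scheme"], parts["authority"], parts["path"], parts["fragment"])

# Arbitrary text is accepted too: it simply ends up in the path component.
print(split_uri("not a url, just text")["path"])
```

So this expression is meant for splitting a reference that is already well-formed, not for deciding whether a string contains only URL characters.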

What you collected as the [\w.~:/?#\[\]@!$&'()*+,;=%-] pattern is too restrictive unless \w is Unicode-aware (a URI may contain any Unicode letters); if it is, the pattern might work more or less for you.
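Whether \w is Unicode-aware depends on the regex flavor. As a rough illustration, assuming Python's `re` module (where \w is Unicode-aware for `str` patterns by default and `re.ASCII` switches that off), the same character class behaves differently on a non-ASCII letter. The sample URLs are made up for the demonstration:

```python
import re

# Character class from the question; note that \w also covers digits and "_".
URL_CHARS = r"[\w.~:/?#\[\]@!$&'()*+,;=%-]+"

unicode_aware = re.compile(URL_CHARS)         # default for str patterns in Python 3
ascii_only = re.compile(URL_CHARS, re.ASCII)  # \w restricted to [a-zA-Z0-9_]

raw = "https://example.com/caf\u00e9"         # contains a non-ASCII letter
encoded = "https://example.com/caf%C3%A9"     # same path, percent-encoded

print(bool(unicode_aware.fullmatch(raw)))     # True
print(bool(ascii_only.fullmatch(raw)))        # False
print(bool(ascii_only.fullmatch(encoded)))    # True
```

If the input is always percent-encoded, the ASCII-only variant is probably enough.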

If you plan to match just ASCII URLs, use ^[\x00-\x7F]+$ (one or more ASCII characters) or ^[!-~]+$ (only visible ASCII).
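A quick way to compare the two ASCII patterns, again sketched in Python (the helper names are hypothetical). `fullmatch` is used because in Python `$` also matches just before a trailing newline, which `match` or `search` would let through:

```python
import re

ANY_ASCII = re.compile(r"^[\x00-\x7F]+$")   # any ASCII, control chars and space included
VISIBLE_ASCII = re.compile(r"^[!-~]+$")     # printable ASCII only, no space

def is_ascii(text: str) -> bool:
    return ANY_ASCII.fullmatch(text) is not None

def is_visible_ascii(text: str) -> bool:
    return VISIBLE_ASCII.fullmatch(text) is not None

print(is_visible_ascii("http://example.com/?q=1#top"))  # True
print(is_ascii("http://example.com/a b"))               # True: space is ASCII
print(is_visible_ascii("http://example.com/a b"))       # False: space is not visible
print(is_ascii("caf\u00e9"))                            # False: non-ASCII letter
```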
