3

Here is one of the many check expressions I've tried:

(url::text ~* 'https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,255}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)'::text)

Which never works.

And yet oddly enough this does (for email validation):

(email::text ~* '^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+[.][A-Za-z]+$'::text)

I'm not sure what the difference between these two check expressions is, though I'd love to know.

So how can I validate a URL via a check constraint in Postgres?

Jim Jones
  • 18,404
  • 3
  • 35
  • 44
Elegant.Scripting
  • 757
  • 4
  • 10
  • 28
  • 1
    What do you mean by "never works". What is happening, and what are you expecting to happen instead? – mroach Mar 01 '17 at 03:14
  • 1
    In general when validating things like this, you're usually better off having a more basic expression. Something super simple like '^https?://'. Otherwise in the future when you inevitably find a URL coming your way that doesn't fit your regex exactly, now you have to go make *schema* changes. Best if you can have your business logic or even UI layer do this kind of validation. It's also worth noting bulk inserts will be slowed down. – mroach Mar 01 '17 at 03:19
  • @mroach thanks! That's really helpful. I was expecting the constraint to reject the insert/update if the url wasn't a proper regex and accept the insert/update if the url is a proper regex. – Elegant.Scripting Mar 01 '17 at 03:21
  • 1
    I'm confused. You say that you want to match a URL, but you give a pattern that matches an email. – Gordon Linoff Mar 01 '17 at 03:34
  • 1
    What does a backspace (`\b`) does in a URL validating pattern? The [word boundary in PostgreSQL](https://www.postgresql.org/docs/current/static/functions-matching.html) is `\y` (or `\m` and `\M` for beginning and end of a word). Also, without `^` and `$` your regexp just validates that the input *contains* an URL, not that itself is an URL. I also suggest you to use a simpler pattern than that. But maybe consider supporting [IDNs](https://en.wikipedia.org/wiki/Internationalized_domain_name) too. – pozs Mar 01 '17 at 10:11
  • @pozs very helpful! Thank you! – Elegant.Scripting Mar 01 '17 at 13:35
  • @GordonLinoff I think I was trying to understand what the difference between the two is, why one would work, and the other wouldn't. That's a hefty question to answer though, and I think I could have framed it better. – Elegant.Scripting Mar 01 '17 at 13:40

2 Answers2

4

I've fixed your expression for URL validation with help of comment below the question. So, just for sharing:

(website :: text ~* 'https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,255}\.[a-z]{2,9}\y([-a-zA-Z0-9@:%_\+.~#?&//=]*)$' :: text)
Max Vyaznikov
  • 4,135
  • 2
  • 11
  • 10
  • 1
    I had to tweak the regex a bit to allow for anchors & query strings: `CREATE DOMAIN url AS text CHECK (VALUE ~ 'https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,255}\.[a-z]{2,9}\y([-a-zA-Z0-9@:%_\+.,~#?!&>//=]*)$'); ` – iloveitaly Oct 16 '20 at 15:34
0

I've adapted the URL regex from this answer into a Postgres check. This ought to handle any valid URL.

CHECK ((url ~ '^[a-z](?:[-a-z0-9\+\.])*:(?:\/\/(?:(?:%[0-9a-f][0-9a-f]|[-a-z0-9\._~!\$&''\(\)\*\+,;=:@])|[\/\?])*)?'::text))
spiffytech
  • 6,161
  • 7
  • 41
  • 57