1

I'm trying to get a regex to capture the base URL from a URL string. This

^(.+?[^\/:])(?=[?\/]|$)

works. REGEX101

But when I try to use it within postgresql

regexp_replace(content_url,'^(.+?[^\\/:])(?=[?\\/]|$)', '\1') 

it does not

1252748
  • 14,597
  • 32
  • 109
  • 229

2 Answers2

1

RegexBuddy gives this warning about the first '?'

PostgreSQL is inconsistent in the way it handles lazy quantifiers in regular expressions with alternation because it attempts to match the longest alternative, instead of being eager and accepting the first alternative that matches

and if you remove it, it seems to work, i.e ^(.+[^\/:])(?=[?\/]|$)

however, if you're trying to parse the baseurl that regex won't work. Use this instead:

select regexp_replace('....', '^(.*:)//([a-z\-.]+)(:[0-9]+)?(.*)$', '\2')
gordy
  • 9,360
  • 1
  • 31
  • 43
  • `SELECT regexp_replace('http://stackoverflow.com/questions/1991608/find-base-name-in-url-in-javascript','^(.+[^\/:])(?=[?\/]|$)', '\1') AS content_url;` just gives me a box. Like a little "unknown character" box. – 1252748 Sep 05 '14 at 16:37
  • @thomas maybe it's the escaping, works here http://www.sqlfiddle.com/#!15/cfab1/4/0 – gordy Sep 05 '14 at 16:48
  • Can you provide another solution that will work with postgre? – 1252748 Sep 05 '14 at 17:46
  • `[Error Code: 0, SQL State: 2201B] ERROR: invalid regular expression: invalid character range :(` – 1252748 Sep 05 '14 at 20:38
  • what version of postgres are you using? – gordy Sep 05 '14 at 20:59
0

PostGreSQL has an interesting regular expression engine. It took me a while to figure out what be escaped and what needs to be double-escaped. The solution that worked for me is:

(regexp_matches(content_url,'(https?:\/\/\\w+(?:\\.\\w+)+)'))[1] AS content_url

Hope this can help someone.

1252748
  • 14,597
  • 32
  • 109
  • 229