5

I need to implement stringUtils Class indexOf() method in postgresql.

Lets say I have a table in which url is one of the column.

url : "http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit"

My requirement is to find the index of the 3rd occurence of '/' in the above url and do substring and take only paypal-info.com host name in Postgresql Query

Any idea on implementing this would be grateful. Thanks

Sri
  • 1,505
  • 2
  • 18
  • 35

3 Answers3

5

Have you tried split_part method?

SELECT split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 3)

Result:

split_part
paypal-info.com

For other string functions try this doc: http://www.postgresql.org/docs/9.1/static/functions-string.html

Edit: as for indexOf itself I don't know any built-in postgres solution. But using two string functions You can achieve it like this:

SELECT strpos('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 4)) - 1 as index_of;
Michał Schielmann
  • 1,372
  • 8
  • 17
  • That looks great Michal. but how to get the indexOf 3rd '/' alone without splitting. My aim is to take the index? how can we do that? – Sri Aug 22 '14 at 09:30
  • @sri Sorry, I focused on this part of the question: "and do substring and take only paypal-info.com host name in Postgresql Query". And so i did take the the host name out. As for finding the third occurence of the '/' I don't think there is a simple solution in postgres. You would have to nest string functions. – Michał Schielmann Aug 22 '14 at 10:06
  • @sri - I've edited the answer with some workaround - I don't know if it's ok for you. I used the position of the forth substring taken from the aforementioned `split_part()`. – Michał Schielmann Aug 22 '14 at 10:13
  • @sri - you can wrap this workaround from my edit in your own util function, and you'll have the result you need. – Michał Schielmann Aug 22 '14 at 10:15
4

The string functions and operators section of the manual is the equivalent of String.indexOf, e.g.

select position('/' in 'http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');

however it doesn't offer the option to get the n'th occurrence.

You're really approaching this all wrong. You should use proper URL parsing code to extract the host portion, not attempt to roll your own or use regex / splitting / string mangling.

PostgreSQL doesn't have a native URL/URI type, but its procedural languages do and it's trivial to wrap suitable functions. e.g. with PL/Python:

create language plpythonu;
create or replace function urlhost(url text) returns text 
language plpythonu 
immutable strict
as $$
import urlparse
return urlparse.urlparse(url).netloc
$$;

then:

regress=# select urlhost('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');
     urlhost     
-----------------
 paypal-info.com
(1 row)

If you'd prefer to use PL/Perl, PL/V8, or whatever, that's fine.

For best performance, you could write a simple C function and expose that as an extension.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • I'm not sure your definition of "wrong" is correct. – Vlad Aug 22 '14 at 14:57
  • @Vlad Well, it's not as bad as parsing HTML with regex, but why roll your own when you can use well tested, pre-written code? – Craig Ringer Aug 22 '14 at 15:04
  • I rather meant that it depends on specific requirements. – Vlad Aug 22 '14 at 15:25
  • @Vlad PL/Perl or PL/Python are part of the core PostgreSQL distribution. Python's `urlparse` is part of Python's core. Pretty safe bet IMO, and you can always fall back to one of the `split_part` solutions already offered by others if you really needed to for some reason. – Craig Ringer Aug 22 '14 at 16:33
1

Just replace 3 with N to get the index of the Nth '/' in a given string

SELECT length(substring('http://asd/asd', '(([^/]*/){3})')) - 1

To extract the host name from url you can use

SELECT substring('http://asd.com:234/qwe', 'http://([^:]+).*/')

Tested here: SQLFiddle

Vlad
  • 1,723
  • 12
  • 16