emulating MySQL's substring_index() in PGSQL

Question

I would like to find an elegant way to emulate the behavior of MySQL's subtring_index() function in Postgres.

In MySQL, it's as easy as:

mysql> create temporary table test1(test varchar(200));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into test1 values('apples||oranges'),('apples||grapes');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test1;
+-----------------+
| test            |
+-----------------+
| apples||oranges |
| apples||grapes  |
+-----------------+
2 rows in set (0.00 sec)

mysql> select substring_index(test, '||', 1) as field1, substring_index(test, '||', -1) as field2 from test1;
+--------+---------+
| field1 | field2  |
+--------+---------+
| apples | oranges |
| apples | grapes  |
+--------+---------+
2 rows in set (0.00 sec)

But my current work around in PGSQL is quite ugly:

hoth=# create temporary table test1(test text);
CREATE TABLE

hoth=# insert into test1 values('apples||oranges'),('apples||grapes');
INSERT 0 2

hoth=# select * from test1;
      test       
-----------------
 apples||oranges
 apples||grapes
(2 rows)

hoth=# select substring(test, 0, position('||' in test)) as field1,  substring(test, position('||' in test) + 2, char_length(test)) as field2  from test1;
 field1 | field2  
--------+---------
 apples | oranges
 apples | grapes
(2 rows)

Perhaps there is a more elegant solution using a regex, or maybe even by splitting the string into an array in a variable which might reduce overhead if the string was derived from a sub-query or something, I welcome any suggestions.

I suppose the outside-the-box solution is to store your data in a way that's more amenable to the queries you want to perform (e.g. by normalising it or using an array type). I realise that's not always an option, but I thought I'd throw it out there, particularly as your MySQL example appears to be specifically coded for splitting into exactly 2 parts. — IMSoP, Oct 07 '13 at 17:05

score 21 · Accepted Answer · answered Oct 07 '13 at 17:18

21

Always take the time to skim the manuals.

http://www.postgresql.org/docs/current/static/functions-string.html

If split_part(string text, delimiter text, field int) doesn't do what you want (and more, if I understand your MySQL function) then you'll need to explain where and why.

answered Oct 07 '13 at 17:18

Richard Huxton

21,516
3
39
51

1

Ah, the manual seems to separate "string functions and operators" with "other string functions", this was in the latter which I apparently overlooked. Thanks. – jesse_galley Oct 07 '13 at 17:23
@jesse_galley: ["Some of them are used internally to implement the **SQL-standard** string functions listed in Table 9-5."](http://www.postgresql.org/docs/current/static/functions-string.html) [emphasis mine]. So the first list is meant to cover the functions specified by the SQL standard whereas the second is meant to cover PostgreSQL extensions. – mu is too short Oct 07 '13 at 17:41
3

Although SPLIT_PART solves the example shown above, it is not a tool to emulate MySQL's SUBSTRING_INDEX, given that SUBSTRING_INDEX returns the substring to the left or right of a specified number of occurrences of a delimiter. How to emulate it's behavior when facing an arbitrary number of occurrences of a delimiter? For instance, selecting domain from a larger part of an URL; e.g.: selecting 'domain.com' from 'sports.adventures.hobbies.domain.com' and also from 'pets.domain.com'? For solving this problem in PostgreSQL we likely need reg expressions or combining various string functions. – Ruben Ramirez Padron Oct 08 '15 at 22:08
3

There is another reason split_part is no complete replacement for SUBSTRING_INDEX: field must be be greater than zero. MySQL allows negative values to split relative from the strings end. – zliw Nov 22 '16 at 10:36

Sahap Asci · Answer 2 · 2017-12-18T12:21:21.100

Here is how I implement (or emulate) MySQL's subtring_index() in PostgreSQL

CREATE OR REPLACE FUNCTION public.substring_index (
  str text,
  delim text,
  count integer = 1,
  out substring_index text
)
RETURNS text AS
$body$
BEGIN
  IF count > 0 THEN
    substring_index = array_to_string((string_to_array(str, delim))[:count], delim);
  ELSE
    DECLARE
      _array TEXT[];
    BEGIN
      _array = string_to_array(str, delim);
      substring_index = array_to_string(_array[array_length(_array, 1) + count + 1:], delim);    
    END;  
  END IF;
END;
$body$
LANGUAGE 'plpgsql'
IMMUTABLE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 5;

and here is the example from mysql's documentation;

postgres=# SELECT substring_index('www.mysql.com', '.', 2);
 substring_index
-----------------
 www.mysql
(1 row)

postgres=# SELECT substring_index('www.mysql.com', '.', -2);
 substring_index
-----------------
 mysql.com
(1 row)

emulating MySQL's substring_index() in PGSQL

2 Answers2

Linked