11

In many programming languages you can compare strings using operators like >, >=, < etc...and the language will base the comparison on the position of the letter in the alphabet.

For example in PHP

if ('a' < 'b') {
    echo 'Yes';
} else {
    echo 'No';
}
> Yes

However in postgres or mysql

SELECT
CASE WHEN 'a' < 'b' THEN 'yes' END
FROM table
Output: null

I have a table with strings that I need to compare against each other through SQL.

For example: 6.2(5a) 6.2(5b) -- this would be greater than 6.2(5a) Or 6.2(15) -- this would be greater than 6.2(5a)

I thought of assigning a number to a letter using a regexp but then that would break the comparisons when there are no letter.

How would you go about this purely in SQL?

mu is too short
  • 426,620
  • 70
  • 833
  • 800
bakamike
  • 1,004
  • 2
  • 7
  • 11
  • `select 'a' < 'b';` in postgres 9.6.1 results in True. `select '6.2(5a)' < '6.2(5b)';` returns True. `select '6.2(15)' < '6.2(5a)';` returns True also. The last one is true because `6.2(` are matching. The next character `'1' < '5'` and that makes 6.2(15) smaller than 6.2(5a). Isn't that your expectation? – zedfoxus Nov 15 '16 at 03:15
  • Okay actually that seems to be the issue. The latter comparison of select '6.2(15)' < '6.2(5a)' returning true. I didn't notice that until you pointed it out. So I guess the issue now becomes how to ensure that '15' would be greater than '5a' maybe test for letters first, remove them and test for the number string? – bakamike Nov 15 '16 at 03:50
  • Possible duplicate of [PostgreSQL ORDER BY issue - natural sort](http://stackoverflow.com/questions/9173558/postgresql-order-by-issue-natural-sort) – Schwern Nov 15 '16 at 04:40

1 Answers1

20

NOTE: The original answer went off on a red herring.

A simple comparison sorts character by character.

select 'a1' < 'a9'; -- true because 'a' = 'a' and '1' < '9'.

...but quickly goes to pot.

select 'a10' < 'a9'; -- also true for the same reason.

What you want is a natural sort where the string parts are compared as strings and the numbers are compared as numbers. Doing a natural sort in SQL is not the easiest thing. You either need fixed field widths to sort each substring separately, or maybe something with regexes...

Fortunately there's pg_natural_sort_order, a Postgres extension that implements an efficient natural sort.

If you can't install extensions you can use a stored procedure like btrsort by 2kan.

CREATE FUNCTION btrsort_nextunit(text) RETURNS text AS $$
    SELECT 
        CASE WHEN $1 ~ '^[^0-9]+' THEN
            COALESCE( SUBSTR( $1, LENGTH(SUBSTRING($1 FROM '[^0-9]+'))+1 ), '' )
        ELSE
            COALESCE( SUBSTR( $1, LENGTH(SUBSTRING($1 FROM '[0-9]+'))+1 ), '' )
        END

$$ LANGUAGE SQL;

CREATE FUNCTION btrsort(text) RETURNS text AS $$
    SELECT 
        CASE WHEN char_length($1)>0 THEN
            CASE WHEN $1 ~ '^[^0-9]+' THEN
                RPAD(SUBSTR(COALESCE(SUBSTRING($1 FROM '^[^0-9]+'), ''), 1, 12), 12, ' ') || btrsort(btrsort_nextunit($1))
            ELSE
                LPAD(SUBSTR(COALESCE(SUBSTRING($1 FROM '^[0-9]+'), ''), 1, 12), 12, ' ') || btrsort(btrsort_nextunit($1))
            END
        ELSE
            $1
        END
      ;
$$ LANGUAGE SQL;

Though it doesn't provide a comparison operator and I'm not going to pretend to understand it. This allows you to use it in an order by.

select * from things order by btrsort(whatever);

To prevent your naturally sorted queries from turning to mud on large tables, you can create a btree index on the result of that function.

create index things_whatever_btrsort_idx ON things( btrsort(whatever) );

SELECT
  CASE WHEN 'a' < 'b' THEN 'yes' END
  FROM table
  Output: null

This will only output nothing if the table is empty. You don't need a table to test select statements.

SELECT
CASE WHEN 'a' < 'b' THEN 'yes' END  -- yes
Schwern
  • 153,029
  • 25
  • 195
  • 336
  • This is great information, unfortunately this is postgres running in a greenplum and its a closed system. I can't install extensions. – bakamike Nov 15 '16 at 04:49
  • @bakamike I added an alternative using stored procedures. – Schwern Nov 15 '16 at 05:00
  • after new updates psql will deliver select 'a' < 'z'; also true, it will be nice, if you update your answer – Micheal Toru Feb 25 '20 at 10:14
  • 1
    @MichealToru `select 'a' < 'z';` is already true. Do you mean `select 'a' < 'Z';`? Which update made the change? – Schwern Feb 25 '20 at 15:41
  • yas i do. its also true – Micheal Toru Feb 25 '20 at 22:49
  • @MichealToru Hmm. `select 'a' < 'Z';` is true on dbfiddle, but false on my local Postgres 11.7 on OS X from Macports. I think what I'm actually testing is [collation differences between Mac and Linux](https://stackoverflow.com/questions/16328592/how-can-i-achieve-the-same-postgres-collation-behavior-in-linux-as-that-in-mac-o). I don't have a Postgres 12 on my Mac to see if they fixed it at the moment, I'm traveling. – Schwern Feb 26 '20 at 02:19
  • 1
    @MichealToru Turns out the first half of the answer was a red herring. I've stripped it out. Thanks for the note. – Schwern Feb 26 '20 at 05:51
  • I know this answer is already a bit older but maybe you can help. If I do in javascript `'c28xzxe29h' <= 'c28x~'` it evaluates to `true` (the value I am expecting). However, in postgres doing `select 'c28xzxe29h' <= 'c28x~';` evaluates to `false`. Any idea why postgres behaves differently here? And is there any workaround (replacing `~` with something else) to get the expected result? – fkrauthan Dec 11 '22 at 08:25
  • @fkrauthan That is strange. Probably be a collation issue, but I don't have an answer. Ask that as a question and @ me in the comments, – Schwern Dec 12 '22 at 02:58
  • Ok done: https://stackoverflow.com/questions/74790306/postgres-and-behaviors-with-special-characters – fkrauthan Dec 13 '22 at 20:04