0

I have the following table of inventory stock symbols and want to fetch the highest alphanumeric value which is AR-JS-20. When I say "highest" I mean that the letter order is sorted first and then numbers are factored in, so AR-JS-20 is higher than AL-JS-20.

BTW, I don't want to split anything into parts because it is unknown what symbols vendors will send me in the futire. I simply want an alphanumeric sort like you sort computer directory by name. Where dashes, undersocres, asterisks, etc. come first, then numbers, and letters last with cascading priority where the first character in the symbol has the most weight, then the second character and so on.

NOTE: The question has been edited so some of the answers below no longer apply.

AL-JS-20
AR-JS-20
AR-JS-9
AB-JS-8
AA-JS-1
1A-LM-30
2BA2-1
45HT

So ideally if this table was sorted to my requirements it would look like this

AR-JS-20
AR-JS-9
AB-JS-8
AL-JS-20
AA-JS-1
45HT
2BA2-1
1A-LM-30

However, when I use this query:

select max(symbol) from stock

I get:

AR-JS-9 
but what I want to get is: AR-JS-20

I also tried:

select max(symbol::bytea) from stock 

But this triggers error:

function max(bytea) does not exist

Jimski
  • 826
  • 8
  • 23
  • 1
    Do the stock symbols all have the same form of `XX-XX-nnn`? – Bohemian Feb 13 '23 at 00:30
  • In order to sort by a part of the string alphabetically, and by the remainder of the string numerically, you will have to split the string and covert the later half to an integer. This is going to be pretty expensive to do for the entire table for every query, so you may want to introduce an index that will be computed once upon insertion for each record. – user229044 Feb 13 '23 at 00:44
  • @Bohemian No the symbols change depending on what vendor sends the parts to the warehouse. So one vendor my have a stock symbol AR-JS-12 and nother DFF2X-1 and yet another 1-THRTD – Jimski Feb 13 '23 at 04:59
  • As PostgreSQL offers a programming language PL/pgSQL, I would probably write a function returning an array where the symbol parts are the elements and all numbers are formatted as strings of a fixed length (e.g. 40 digits). This array would be the sort key. PostgreSQL allows to sort by an array. This would work for any symbol regardless of how many parts it consists of. The only limit would be to have to choose a maximum number of digits, the numbers in the symbol must not exceed. – Thorsten Kettner Feb 13 '23 at 06:58
  • 1
    for the example `1-THRTD` does that get sorted by `THRTD` then `1` as a number? Are there ever any digits in the non-number part, eg `AB7CD-3`? – Bohemian Feb 13 '23 at 10:46

2 Answers2

1

Specify a custom order by that trims everything up to the last - and converts the remaining number to int and take the first:

select stock_code
from mytable
order by regexp_replace(stock_code, '-?[0-9]+-?', ''), regexp_replace(stock_code, '[^0-9-]', '')::int
limit 1

See live demo.

This works for numbers at both start and end of code:

  • regexp_replace(stock_code, '-?[0-9]+-?', '') "deletes" digits and any adjacent dashes
  • regexp_replace(stock_code, '[^0-9]', '') "deletes" all non-digits
Bohemian
  • 412,405
  • 93
  • 575
  • 722
1

There is dedicated tag for this group of problems: (I added it now.)

Ideally, you store the string part and the numeric part in separate columns.
While stuck with your unfortunate symbols ...

If your symbols are as regular as the sample suggests, plain left() and split_part() can do the job:

SELECT symbol
FROM   stock
ORDER  BY left(symbol, 5) DESC NULLS LAST
        , split_part(symbol, '-', 3)::int DESC NULLS LAST
LIMIT  1;

Or, if at least the three dashes are a given:

...
ORDER  BY split_part(symbol, '-', 1) DESC NULLS LAST
        , split_part(symbol, '-', 2) DESC NULLS LAST
        , split_part(symbol, '-', 3)::int DESC NULLS LAST
LIMIT  1

See:

Or, if the format is not as rigid: regular expression functions are more versatile, but also more expensive:

...
ORDER  BY substring(symbol, '^\D+') DESC NULLS LAST
        , substring(symbol, '\d+$')::int DESC NULLS LAST
LIMIT  1;

^ ... anchor to the start of the string
$ ... anchor to the end of the string
\D ... class shorthand for non-digits
\d ... class shorthand for digits

Taking only (trailing) digits, we can safely cast to integer (assuming numbers < 2^31), and sort accordingly.

Add NULLS LAST if any part can be missing, or the column can be NULL.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you for the answer, I upvoted it. However. symbols are not unfortunate but a reflection of reality of how a vendor designates them. Also, storing string part and the numeric part in separate columns would be a mess because for every stock search one would need to combine the two to see what stock it is. – Jimski Feb 13 '23 at 04:16
  • @Jimski: Combining the two is trivially simple and cheap. Separating tends to be more expensive - like the Q & A at hand demonstrate. – Erwin Brandstetter Feb 13 '23 at 04:47
  • It may be trivially simple at first but becomes a mess later. If the table has multiple columns then in order to create an idex I have to mess with two columns and then in every query that searches for a stock symbol I have to combine two columns, and so on. Why to split crap that doesn't need splitting. – Jimski Feb 14 '23 at 15:59
  • Sure, various pros and cons. You can index the expression `left || right`. You can have a (`MATERIALIZED`) `VIEW` presenting the original number ... You decide what's cheaper / safer overall in your setup. – Erwin Brandstetter Feb 14 '23 at 21:41