3

Hope you all are fine and doing good!

I have a table items with 2 million+ records the structure looks like below:

id (int) | price (decimal) | priority (int)
-------------------------------------------
10001    |  59000.25       |    1
10002    |  73000.91       |    2
10003    |  1000.23        |    1
10004    |  9567.18        |    1

The solution that I am looking for is very straightforward: How do I sort this table on price+priority on ASC|DESC order?

Current and working solution: I am using ORDER BY priority ASC, price ASC. But as far as my knowledge goes, sorting on multiple column is slow and not optimised approach (and I am facing realtime slowness because of this).

Solutions I tried: I've added a temporary column to this table:

id (int) | price (decimal) | priority (int) | new_priority (varchar)
--------------------------------------------------------------------
10001    |  59000.25       |    1           | a59000.25
10002    |  73000.91       |    2           | b73000.91
10003    |  1000.23        |    1           | a1000.23
10004    |  9567.18        |    1           | a9567.18

I've replaced 1 => a, 2 => b, 3 => c up till 10 (max number I have in database)

Now, whenever I am trying below SQLs, none of them is working

SELECT * FROM items
ORDER BY new_priority ASC

SELECT * FROM items
ORDER BY new_priority::bytea

SELECT * FROM items
ORDER BY SUBSTRING(new_priority FROM '^(.*?)( \\d+)?$'),
     COALESCE(SUBSTRING(new_priority FROM ' (\\d+)$')::INTEGER, 0)

Please advise!

Links I referred:

  1. Postgresql sorting mixed alphanumeric data
  2. Alphanumeric Sorting in PostgreSQL
  3. Alphanumeric sorting with PostgreSQL
  4. Alphanumeric case in-sensitive sorting in postgres
Puneet Pandey
  • 541
  • 2
  • 6
  • 23
  • 3
    Create an index ASC on both columns. I bet it will improve it a lot. – Jorge Campos Jun 06 '17 at 14:16
  • See multicolumn indexes: https://www.postgresql.org/docs/current/static/indexes-multicolumn.html – Roman Hocke Jun 06 '17 at 14:19
  • 2
    Well, sorting data is a rather slow process, but why do you want to sort 2mio+ records to start with? Usually you'd work on a subset of data and only sort a reasonable amount of rows. – Thorsten Kettner Jun 06 '17 at 14:19
  • my fields `price`, `priority` and `new_priority` are already indexed. @JorgeCampos , can you please show how precisely you would want indexing on both columns? – Puneet Pandey Jun 06 '17 at 14:23
  • @ThorstenKettner I am already using `limit` and `offset` in my SQL – Puneet Pandey Jun 06 '17 at 14:24
  • 1
    `limit` and `offset` will not help since in order to it work the database still do a full scan. The database need to know the total amount so it can know where to start/stop with the limit/offset clause. About the index it would be something like `create index idx_blabla on yourTable (priority ASC, price ASC)` And for reference read this: https://www.postgresql.org/docs/8.3/static/indexes-ordering.html – Jorge Campos Jun 06 '17 at 14:35
  • @PuneetPandey if you are using `limit` and `offset` for pagination, the multicolumn index can help you **a lot**, for the first few pages at least. There are [other pagination methods](http://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way) though, which can make use of that index even more. – pozs Jun 06 '17 at 14:37
  • @JorgeCampos thanks for sharing this approach. I am trying to add `ASC` index now and update the results here . Thank you @pozs to you too! Though, I am wondering if that's the only way possible? :D :\ – Puneet Pandey Jun 06 '17 at 14:45
  • 1
    "*and I am facing realtime slowness because of this*" - If you have a slow query, then **[EDIT]** your question and add the execution plan generated using **`explain (analyze, verbose)`**. [**Formatted text**](http://stackoverflow.com/help/formatting) please, [no screen shots](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557) –  Jun 06 '17 at 15:02

3 Answers3

2

How about creating an index? Indexes are a mechanism to enhance the performance of databases. Creating an index may be slow and might last some hours, but you'll notice differences in further queries.

joninx
  • 1,775
  • 6
  • 31
  • 59
0

May be you could do a better solution using NEW_PRIORITY as DECIMAL.

Assuming the value in PRICE is not greater then (say) 999999 and the PRIORITY is not too "large", you could use PRIORITY*1000000+PRICE.

Anyway, I think using index as said in comments could be a better solution.

To create a "composite index", you can use following sintax:

CREATE INDEX index_name ON table_name (priority ASC, price ASC);
etsa
  • 5,020
  • 1
  • 7
  • 18
  • I already did brainstorming on this approach and found some loopholes due to which I had to drop this approach. I have various cases (for e.g. where `priority` is 1 and `price` is 1000.0 and `priority` is 2 and price is 5000.0 . Secondly, `price` can go upto any number but yes priority is not large . Basically, I am calculating priority (based on some rules) before saving it into the database. – Puneet Pandey Jun 06 '17 at 14:36
  • nopes @etsa consider this one: `price=50000.5 and priority=10` and `price=75000.0 and priority=5` and `price=100.25 and priority=1`. I always expect to see last one first and first one towards the end of search results. – Puneet Pandey Jun 06 '17 at 15:00
  • OK, before I red your text fast and I saw your example where you put the letter for priority before the price. I think composite index is a better solution (as I wrote in my post, edited before), but - just for curiosity - to compose a new column and sort it as desired, you should add a number of fixed '0' or spaces (' ') in the new composite column and then priority (eg. 0000050000.500j and 0000075000.000e and 0000000100.250a) – etsa Jun 06 '17 at 15:11
0

How often does the data change ? If it's not often, use CLUSTER. See : https://www.postgresql.org/docs/9.5/static/sql-cluster.html It will basically sort your table in the order of your index. Combine this cluster with the answer from etsa.

The catch is, the CLUSTER doesn't maintain the order. So if you have new row inserted, it will be inserted to the last row.

librata
  • 150
  • 9