Let's say I have the following PostgreSQL table called products
:
CREATE TABLE IF NOT EXISTS mytable (
id serial NOT NULL PRIMARY KEY,
label VARCHAR(50) NOT NULL,
info jsonb NOT NULL,
created_at timestamp NOT NULL DEFAULT now()
);
Here is some test data. Note my actual table has millions of records.
INSERT INTO products (label, info) VALUES ('a', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('a', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('c', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('c', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('b', '[1, 2, 3]');
I want to write a query that grabs distinct labels and orders the records by the created_at
field. My first instinct would be to write the following query:
SELECT DISTINCT ON (label) * FROM products ORDER BY created_at DESC;
However, this fails with the following error:
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
Looks like I can fix this using SQL subqueries:
SELECT * FROM (
SELECT DISTINCT ON (label) * FROM products
) AS subquery ORDER BY created_at DESC;
which generates the following expected result:
id | label | info | created_at
----+-------+-----------+----------------------------
5 | b | [1, 2, 3] | 2022-11-14 03:32:23.245669
3 | c | [1, 2, 3] | 2022-11-14 03:32:23.242813
1 | a | [1, 2, 3] | 2022-11-14 03:32:23.239791
Is this the best way to fix this issue? Or is there a faster way to query this data? Note, I mention above how my actual table has millions of records so I want to come up with the best query possible.