1

I can do two select queries:

  1. select the top row customer_id
  2. select all rows that match the above customer_id

But instead, I want to know if I can do the above in a single query?

I know it's possible using a subquery as in

SELECT * FROM customers
WHERE cust_id=(SELECT cust_id FROM customers LIMIT 1)

Is this an efficient way, or there is something better?

The requirement is to archive records that belong to a random cust_id or it might be correct for it to be the oldest cust_id (using aws lambda function in Python which runs periodically). Hence, I would like to fetch a cust_id and every other row with the same cust_id in a single transaction.

Using Postgres 10.5, with the DB table definition as below.

id BIGINT PRIMARY KEY,
cust_id VARCHAR(100) NOT NULL,
created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
cust_data JSONB

Sample input

|id|cust_id|     cust_data          |     created        |
|--|-------|------------------------|--------------------|
|8 |cust9  |{"a": {"b": "c"}}       |25/11/2022  01:05:39|
|25|cust1  |{"x": "y"}              |05/12/2022  14:59:21|
|40|cust9  |{"d": {"b": {"c": "z"}}}|07/12/2022  11:29:14|
|87|cust2  |{"r": {"s": "t"}}       |13/12/2022  21:10:18|
|99|cust1  |{"p": "q"}              |20/12/2022  14:59:21|

Expected output

|id|cust_id|     cust_data          |     created        |
|--|-------|------------------------|--------------------|
|8 |cust9  |{"a": {"b": "c"}}       |25/11/2022  01:05:39|
|40|cust9  |{"d": {"b": {"c": "z"}}}|07/12/2022  11:29:14|

OR

Expected output

|id|cust_id|     cust_data          |     created        |
|--|-------|------------------------|--------------------|
|25|cust1  |{"x": "y"}              |05/12/2022  14:59:21|
|99|cust1  |{"p": "q"}              |20/12/2022  14:59:21|

OR

Expected output

|id|cust_id|     cust_data          |     created        |
|--|-------|------------------------|--------------------|
|87|cust2  |{"r": {"s": "t"}}       |13/12/2022  21:10:18|

During one query, we only want one of these expected outputs. Also need to mention that the created here is not the actual timestamp when this entry was created, as the entries made into this table are copied from another. So I think we can't decide which is the cust_id that has got the oldest entry.

Oia
  • 13
  • 3
  • 2
    *"top row customer_id"*: Top how? `LIMIT 1` without `ORDER BY` produces an arbitrary result. Disclose your Postgres version and (relevant parts of) your table definition, please. Also, "better" depends on your requirements, specifics of your storage and workload, and typical data distribution. – Erwin Brandstetter Dec 22 '22 at 03:41
  • Please show sample input and the expected result. Don't let people guess what is "top" for you, but explain it. – Jonas Metzler Dec 22 '22 at 05:52
  • @ErwinBrandstetter Have edited my requirements, pls help thanks – Oia Dec 22 '22 at 12:09
  • @JonasMetzler Have edited my requirements, pls help thanks – Oia Dec 22 '22 at 12:09
  • There is still no table which shows sample input and none which shows the expected outcome. Please also read https://stackoverflow.com/help/minimal-reproducible-example – Jonas Metzler Dec 22 '22 at 12:18
  • @JonasMetzler All I want to know is whether making two queries from my Python code to Postgres DB by 1.SELECT cust_id FROM customers LIMIT 1 ; 2.SELECT * FROM customers WHERE cust_id=(cust_id from the prev query result) OR calling in a single query with the one I posted in my original question is a better solution? – Oia Dec 22 '22 at 12:55
  • 1
    Well, as already said by Erwin, this is not clear enough (because LIMIT without ORDER doesn't make sense and the intended order is unknown for us). Maybe someone will get an answer and maybe this will even help you, but I will focus questions which provide some sample data and expected outcome. You should keep in mind finding a correct answer is also difficult and requires effort, so if your question is unprecise, people will switch to a more precise question. – Jonas Metzler Dec 22 '22 at 13:00
  • @JonasMetzler sorry just had trouble creating that table, is it better now? – Oia Dec 22 '22 at 17:59
  • @ErwinBrandstetter Possible to help pls? – Oia Jan 03 '23 at 09:40

2 Answers2

2

What you are doing is correct.

Just make sure to use ORDER BY to ensure that the cust_id is not random.

SELECT * FROM customers
WHERE cust_id=(SELECT cust_id FROM customers ORDER BY DATE LIMIT 1)
1

To get all rows for a single, arbitrary customer in a single query, WITH TIES in Postgres 13 or later should be most efficient:

SELECT *
FROM   customers
ORDER  BY cust_id
FETCH  FIRST 1 ROWS WITH TIES;

If the table is big, support this query with an index on (cust_id).

See:

Postgres 10 has reached EOL in 2022, so you need to upgrade ASAP anyway.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you, will do so! But at the moment, do you think that my query with a subquery would suffice or making two separate select queries wouldn't make a difference either? – Oia Jan 03 '23 at 11:57
  • @Oia: For older versions of Postgres your query is as good as it gets. Without `ORDER BY`, the result can change from one invocation to the next without changing data in the table. – Erwin Brandstetter Jan 04 '23 at 23:50