0

Given the following db structure:

Regions

id name
1 EU
2 US
3 SEA

Customers:

id name region
1 peter 1
2 henry 1
3 john 2

There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.

There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that both their region ID is identical. We will also need to ID itself for further processing down the line.

So maybe something like this: SELECT DISTINCT region FROM customers WHERE id IN (?, ?)

The problem with this is that the result will be either an array (if the customers are not within the same region) or a single value.

Is there are more elegant way of solving this constraint? I was thinking of SELECT INTO and use a temporary table, or I could SELECT COUNT(DISTINCT region) and then do another SELECT for the actual value if the count is less than 2, but I'd like to avoid the performance hit if possible.

ptye
  • 21
  • 3
  • "I'd like to query the table for two rows at a time by ID" - _hold on_, why are you even doing that? Why not do a _single_ query with windowing-functions? that will be a lot faster – Dai Mar 11 '22 at 00:02
  • What does your table actually represent? What is your end-goal? What happens if 3 or more rows share the same `name` value? What if those 3 or more rows have non-consecutive `id` values? – Dai Mar 11 '22 at 00:03
  • @Dai The actual data is not too different from this, mostly UUIDS instead of string values. It doesn't matter how many of the rows present in the table share the same `name` value, you can assume there will be thousands with identical values. As part of a larger function, there needs to be a validation that both person A and person B share the same name. – ptye Mar 11 '22 at 00:08
  • @Dai I updated my question, hopefully it's a bit clearer now what I'm trying to achieve. – ptye Mar 11 '22 at 00:20

2 Answers2

1

There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.

There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region - and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that both their region ID is identical. We will also need to ID itself for further processing down the line.

This query should work:

WITH q AS (
    SELECT
        COUNT( * ) AS CountCustomers,
        COUNT( DISTINCT c.Region ) AS CountDistinctRegions,
--      MIN( c.Region ) AS MinRegion
        FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion
    FROM
        Customers AS c
    WHERE
        c.CustomerId = $senderCustomerId
        OR
        c.CustomerId = $receiverCustomerId
)
SELECT
    CASE WHEN q.CountCustomers = 2 AND q.CountDistinctRegions = 2 THEN 'OK' ELSE 'BAD' END AS "Status",
    CASE WHEN q.CountDistinctRegions = 2 THEN q.MinRegion END AS SingleRegion
FROM
    q
  • The above query will always return a single row with 2 columns: Status and SingleRegion.

  • SQL doesn't have a "SINGLE( col )" aggregate function (i.e. a function that is NULL unless the aggregation group has a single row), but we can abuse MIN (or MAX) with a CASE WHEN COUNT() in a CTE or derived-table as an equivalent operation.

  • As your Region column is UUID you cannot use it with MIN, but I understand it should work with FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) AS MinRegion.

  • As for the columns:

    • The Status column is either 'OK' or 'BAD' based on those business-constraints you mentioned. You might want to change it to a bit column instead of a textual one, though.
    • The SingleRegion column will be NOT NULL (with a valid region) if CountDistinctRegions = 2 regardless of CountCustomers, but feel free to change that, just-in-case you still want that info.
Dai
  • 141,631
  • 28
  • 261
  • 374
  • Interesting. I think this would probably solve the trouble I've been having. Another approach I had just thought of was to use a regular `select distinct` and feed it directly into a for loop as described here: https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING - then simply check if we iterate more than once. It's not pretty, I expect your query will perform better. – ptye Mar 11 '22 at 00:39
  • One more thing - `MIN` and `MAX` won't work in case the IDs are of type `UUID`. Which I'm afraid they are. I expect `FIRST_VALUE` might be of some help though . – ptye Mar 11 '22 at 00:49
  • 1
    Thank you for the update. I will mark the question as answered. In the mean time, I think I found another way to do it, like this `SELECT r.region from customers s inner join customers r on s.region = r.region where s.id = 'sender_id' and r.id = 'receiver_id';` - problem is it requires an inner join, I'm not sure how bad the performance impact would be with this given a large table. – ptye Mar 11 '22 at 01:03
  • @ptye The size of the table is irrelevant: just ensure you have at least 1 index covering the columns used in the query, e.g. `CREATE INDEX IX_regions ON customers ( id, region )` (the order of the columns is important, so also try `( region, id )` (reversed order) if you get a poor `EXPLAIN` result. – Dai Mar 11 '22 at 01:20
  • Thank you for your help, much appreciated! – ptye Mar 11 '22 at 01:23
  • @ptye I'll admit that I do admire the simplicity of your query though. I honestly didn't think of doing anything like that. – Dai Mar 11 '22 at 01:26
  • @ptye You really should post your query as its own answer, un-accept mine, and accept that instead, because it is a different solution. – Dai Mar 11 '22 at 01:27
  • Looks like I'm unable to mark my own answer for another 2 days - but thanks again! Really appreciate your time and effort. – ptye Mar 11 '22 at 09:23
1

For anybody else who's interested in a simple solution, I finally came up with the (kind of obvious) way to do it:

SELECT
    r.region
FROM
    customers s
INNER JOIN customers r ON 
    s.region = r.region 
WHERE s.id = 'sender_id' and r.id = 'receiver_id';

Huge credit to SELECT DISTINCT to return at most one row who helped me out a lot on this and also posted a viable solution.

ptye
  • 21
  • 3