0

I have three database tables:

  • product
  • product_manufacturer
  • product_manufacturer_warranties.

The product table has a one-to-one mapping with product_manufacturer and the product_id is stored in the product_manufacturer table. The product_manufactuer table has a one-to-many mapping with the product_manufacturer_warranties table.

I need to write a query that retrieves all columns from the product table and two other columns that can be used to determine if a valid join exists for product and product_manufacturer and product_manufacturer and product_manufactuer_warranties respectively.

I have written the following co-related query that can handle the above scenario:

select product.*, pm.web_id,
        (   SELECT count(*)
            FROM product_manufacturer_warranty pmw
            WHERE pm.web_id = pmw.product_manufacturer_id)
            AS total_warranties
from product
left join product_manufacturer pm on product.web_id = pm.product_id

I wonder if there is a better or more efficient way of achieving this using SQL on the PostgreSQL server.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Cfc
  • 45
  • 8
  • You describe *"determine if a valid join exists"*, but then you show an actual count, Which is it? Bare existence or count of rows? Also, do you run this query for all or most products at once, or just one or few given products? – Erwin Brandstetter Feb 03 '23 at 00:10
  • Bare existence is what I want. The count one is the only logic that I could come up with myself. This query is run from a search screen usually so it will be paginated. – Cfc Feb 03 '23 at 06:49

2 Answers2

1

Do the aggregation once, and join to the result:

select product.*, 
       pm.web_id,
       pmw.total_warranties
from product
  left join product_manufacturer pm on product.web_id = pm.product_id
  left join (   
     SELECT product_manufacturer_id, count(*) as total_warranties
     FROM product_manufacturer_warranty pmw
     group by product_manufacturer_id
  ) as pmw on pm.web_id = pmw.product_manufacturer_id)
1

Since you are only interested in bare existence, don't run a potentially much more expensive count().
And while paging through the (potentially big) table, don't compute counts for all products, which would be much more expensive, yet.

This should give you optimal performance:

Paging up:

SELECT p.*
     , m.product_id IS NOT NULL AS has_maufacturer
     , EXISTS (
         SELECT FROM product_manufacturer_warranty w
         WHERE  w.product_manufacturer_id = m.web_id
         ) AS has_warranties
FROM  (
   SELECT *
   FROM   product
   WHERE  product_id > $max_product_id_of_last_page
   -- more filters here?
   ORDER  BY product_id
   LIMIT  $page_size
   ) p
LEFT   JOIN product_manufacturer m ON p.web_id = m.product_id

The query returns exactly what you asked for:

all columns from the product table and two other columns that can be used to determine if a valid join exists for product and product_manufacturer and product_manufacturer and product_manufactuer_warranties respectively.

  • First select products of interest in the subquery p
  • Then left-join to product_manufacturer. Cannot multiply rows, defined as one-to-one relationship!
  • Then check for bare existence with EXISTS. Substantially cheaper if there can be many related rows. See:

Paging down accordingly:

SELECT p.*
     , m.product_id IS NOT NULL AS has_maufacturer
     , EXISTS (
         SELECT FROM product_manufacturer_warranty w
         WHERE  w.product_manufacturer_id = m.web_id
         ) AS has_warranties
FROM  (
   SELECT *
   FROM   product
   WHERE  product_id < $min_product_id_of_last_page
   -- more filters here?
   ORDER  BY product_id DESC
   LIMIT  $page_size
   ) p
LEFT   JOIN product_manufacturer m ON p.web_id = m.product_id
ORDER  BY product_id;

Adapt to your actual sort order. This is more tricky for multiple ORDER BY expressions. Either way, don't fall for LIMIT / OFFSET unless your table is trivially small. See:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228