Select rows with a specific column greater than zero in some related rows

Question

I have the following tables in a Postgres 9.5 database:

product_days

Column     | Type    | Modifiers
-----------+---------+----------------------------------------------
id         | integer | not null default nextval('product_days_id_seq'::regclass)
days_id    | integer |
product_id | integer |
available  | integer | 
price      | integer |
 Indexes:
"pk_product_days" PRIMARY KEY, btree (id)
"idx_product_days" btree (days_id)
"idx_product_days_0" btree (product_id)
Foreign-key constraints:
   "product_days_days_id_fkey" FOREIGN KEY (days_id) REFERENCES days(id)

product

 Column         |            Type             |                       Modifiers
----------------+-----------------------------+-----------------------------------------------------
id              | integer                     | not null default nextval('product_id_seq'::regclass)
name            | character varying(100)      |
number_of_items | integer                     |
created_at      | timestamp without time zone | default now()
updated_at      | timestamp without time zone | default now()
total_number    | integer                     |
 Indexes:
    "pk_product" PRIMARY KEY, btree (id)

product_days.product_id is a foreign key referring to the product table and available represents the number of products available in each day.

I want to get all products that are available (available > 0) at some specific days (days_id between 5 and 10). They should be available at all of these days.

Currently I am trying to get the result with the following query but I am not sure whether it is correct or this is the most efficient way to do it:

select product.id as p_id, product.name as p_name, product.number_of_items as items
from product_days join product ON product_days.product_id = product.id
WHERE product_days.available > 0
AND  prodcut_days.days_id between 5 and 10
group by product.id
HAVING count(*) > 5;

The output should be like:

 p_id  | p_name    | items
-------+-----------+-------
 1     | product_1 | 4
 2     | product_2 | 13

I need the most efficient way of running this query either in SQL or in plpgsql.

@RadimBača I just edited the question please check it again — Siavosh, Dec 30 '17 at 11:32
Missing: table definition (showing data types and constraints), Postgres version, sample data and the query you tried so far (even if it's not working). And I suppose you mean plpgsql where you mention "pgpsql"? — Erwin Brandstetter, Dec 30 '17 at 12:25
@ErwinBrandstetter I have edited the question could you please check it again to see if now enough information is provided — Siavosh, Dec 30 '17 at 12:58
`product_days.available_unit` is in your query, but not in your table definition. Why `HAVING count(*) > 5`? Please clarify, and clean up the formatting. And no, it's not "plpgpsql", either. It's plpgsql. — Erwin Brandstetter, Dec 30 '17 at 13:17
@ErwinBrandstetter I corrected the typos it should be greater than 5 because the product should be available in all 6 days specified by ids between 5 and 10 — Siavosh, Dec 30 '17 at 13:29
I doubt that. `prodcut_days.id between 5 and 10` hardly makes sense for a serial PK column. Maybe you mean `days_id` instead? With all the fuzzy information, you won't get "the most efficient way to do it". — Erwin Brandstetter, Dec 30 '17 at 13:31
@ErwinBrandstetter yes I corrected that as well thanks you noticed that — Siavosh, Dec 30 '17 at 13:34
Also: `HAVING count(*) > 5`. How do you know it's 5 *distinct* days? there is no constraint enforcing unique `(product_id, days_id)`. Should there be? — Erwin Brandstetter, Dec 30 '17 at 13:36
Yes it's 5 distinct days I think a constraint should be added — Siavosh, Dec 30 '17 at 13:37

Erwin Brandstetter · Accepted Answer · 2018-01-02T12:41:43.993

After clarifying the typos, yes, the query should do what you describe.

This is equivalent and a bit faster:

SELECT id AS p_id, name AS p_name, number_of_items AS items
FROM  (
   SELECT product_id AS id
   FROM   product_days
   WHERE  available > 0
   AND    days_id BETWEEN 5 AND 10
   GROUP  BY 1
   HAVING count(*) > 5
   ) d
JOIN  product p USING (id);

There should be a UNIQUE constraint enforcing max 1 entry per product and day:

ALTER TABLE product_days ADD CONSTRAINT your_name_here UNIQUE (product_id, days_id);

Using a column in sql join without adding it to group by clause

Optimize read performance

If you need to optimize performance for this particular query and there are more than a few non-qualifying rows (not available or day_id not matching), the sharpest weapon would be a partial index:

CREATE INDEX idx_name_here ON product_days (product_id)
WHERE  available > 0
AND    days_id BETWEEN 5 AND 10;

If you have autovacuum running and the write load on the table isn't too big, you should see index-only scans on this index.

And while this index (or a more generic one) is used, this query should be faster, yet:

SELECT id AS p_id, name AS p_name, number_of_items AS items
FROM  (
   SELECT product_id AS id
   FROM   product_days d5
   JOIN   product_days d6  USING (product_id)
   JOIN   product_days d7  USING (product_id)
   JOIN   product_days d8  USING (product_id)
   JOIN   product_days d9  USING (product_id)
   JOIN   product_days d10 USING (product_id)
   WHERE  d5.days_id  = 5  AND d5.available  > 0
   AND    d6.days_id  = 6  AND d6.available  > 0
   AND    d7.days_id  = 7  AND d7.available  > 0
   AND    d8.days_id  = 8  AND d8.available  > 0
   AND    d9.days_id  = 9  AND d9.available  > 0
   AND    d10.days_id = 10 AND d10.available > 0   
   ) d
JOIN  product p USING (id);

Since this is a case of relational-division at its core. See:

Thanks for your helps I'd try it – Siavosh Dec 30 '17 at 13:41 — Siavosh, Dec 30 '17 at 13:41

Select rows with a specific column greater than zero in some related rows

1 Answers1

Optimize read performance