4

I have 4 tables:

  • Item
  • Purchase
  • Purchase Item
  • Purchase Discount

In these tables, the Purchase Discount table has two entries; all the others have only one. But when I query them, the LEFT JOIN produces duplicate entries.

This query will run against a large database, and I have heard that using DISTINCT reduces performance. Is there another way to remove the duplicates without using DISTINCT?

Here is the SQL Fiddle.
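
Since the fiddle link may rot, this is roughly the query it contains, reconstructed from the answers below; treat the table and column names as assumptions rather than the exact fiddle contents:

-- Assumed shape of the original query: the LEFT JOIN to purchase_discounts (2 rows)
-- multiplies the purchase_items rows before ARRAY_AGG() and SUM() run.
SELECT array_to_json(array_agg(p_values)) FROM
(
  SELECT t.item_id, t.purchase_items_ids, t.total_sold FROM
    (
      SELECT purchase_items.item_id AS item_id,
             ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
             SUM(purchase_items.sold) AS total_sold
      FROM items
      INNER JOIN purchase_items ON purchase_items.item_id = items.id
      INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
      LEFT JOIN purchase_discounts ON purchase_discounts.purchase_id = purchase.id
      WHERE purchase.id = 200
      GROUP BY purchase_items.item_id
    ) AS t
  INNER JOIN items i ON i.id = t.item_id
) AS p_values;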

The result shows:

[{"item_id":1,"purchase_items_ids":[1234,1234],"total_sold":2}]

But the result should come as:

[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1}]
VLAZ
Developer
  • 1
    From the look of the fiddle, there's no reason to even use the left join there because you aren't really using that table. What's the purpose of joining it there? – Padagomez Sep 13 '21 at 18:52
  • Yeah, we have to keep it, as I am taking the discount amount from the purchase_discounts table. I will update the fiddle. Thanks @Padagomez – Developer Sep 14 '21 at 04:35
  • 1
    "So I heard DISTINCT will reduce the performance" - I would first measure impact of DISTINC on performance and then look for solutions. Maybe you're solving non-existing problem. Tag [mysql] is it needed here? – Alex Yu Sep 15 '21 at 12:54
  • Just a question - could you change the database design? I could offer some thoughts on how I think it could work better, but as far as I can tell (and this is just my opinion, and I'm fairly low on ranking here, so please take it as opinion) you won't be able to circumvent this issue without going about your DB design a little differently. Please let me know if it's an option and I'd love the opportunity to write it up – John Lee Hedge Sep 17 '21 at 01:17
  • Why does the purchase discount table have multiple discounts for the same purchase id? – Radagast Sep 17 '21 at 14:02
  • I think you need to explicitly state that you're looking for a more performant solution than the one with DISTINCT. I see 2 answers with correlated subqueries that "avoid DISTINCT", and formally they are correct. – Alex Yu Sep 19 '21 at 05:42
  • Will Item, Purchase, and Purchase Item always have a one-to-one relationship? – Kazi Mohammad Ali Nur Romel Sep 20 '21 at 18:10
  • 2
    The fiddle is great. But please also add table definitions and query string in the question. That's the recommended way. Easier to read and less susceptible to bitrot, when external links die. – Erwin Brandstetter Sep 22 '21 at 00:10
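
To Alex Yu's point about measuring first: in Postgres you can compare the plan and timing with and without DISTINCT before deciding it is too slow. A minimal sketch (the query below is illustrative only, not the one from the fiddle):

-- Compare the two plans; the difference in "Execution Time" is the real cost of DISTINCT here.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT pi.item_id, pi.id
FROM   purchase_items pi
LEFT   JOIN purchase_discounts pd ON pd.purchase_id = pi.purchase_id
WHERE  pi.purchase_id = 200;

EXPLAIN (ANALYZE, BUFFERS)
SELECT pi.item_id, pi.id
FROM   purchase_items pi
LEFT   JOIN purchase_discounts pd ON pd.purchase_id = pi.purchase_id
WHERE  pi.purchase_id = 200;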

6 Answers

4

Using a correlated subquery instead of the LEFT JOIN:

SELECT array_to_json(array_agg(p_values)) FROM 
( 
  SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM 
    ( 
      SELECT purchase_items.item_id AS item_id,
             ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
             SUM(purchase_items.sold) as total_sold,
             SUM((SELECT SUM(pd.discount_amount) FROM purchase_discounts pd
                  WHERE pd.purchase_id = purchase.id)) as discount_amount
       FROM items
       INNER JOIN purchase_items ON purchase_items.item_id = items.id
       INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
       WHERE purchase.id = 200
       GROUP by purchase_items.item_id
    ) as t 
  INNER JOIN items i ON i.id = t.item_id 
) AS p_values;

db<>fiddle demo

Output:

[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]
Lukasz Szozda
4

First, I would suggest removing INNER JOIN items i ON i.id = t.item_id from the query, as there is no reason for it to be there.

Then, instead of left joining the Purchase_Discounts table, use a subquery to get the Discount_amount (as mentioned in Lukasz Szozda's answer).

If there is no discount for a purchase, the Discount_amount column will show NULL. If you want to avoid that, you can use COALESCE() instead, as below:

COALESCE(SUM((select sum(discount_amount) from purchase_discounts 
                 where purchase_discounts.purchase_id = purchase.id)),0) as discount_amount

Db-Fiddle:

SELECT array_to_json(array_agg(p_values)) FROM
(
  SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM
    (
      SELECT purchase_items.item_id AS item_id,
             ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
             SUM(purchase_items.sold) AS total_sold,
             SUM((SELECT SUM(discount_amount) FROM purchase_discounts
                  WHERE purchase_discounts.purchase_id = purchase.id)) AS discount_amount
      FROM items
      INNER JOIN purchase_items ON purchase_items.item_id = items.id
      INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
      WHERE purchase.id = 200
      GROUP BY purchase_items.item_id
    ) AS t
) AS p_values;

Output:

array_to_json
[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]

db<>fiddle here

3

The core problem is that your LEFT JOIN multiplies rows.

Aggregate discounts to a single row before the join. Or use an (uncorrelated) subquery expression:

SELECT json_agg(items)
FROM  (
   SELECT pi.item_id
        , array_agg(pi.id) AS purchase_items_ids
        , sum(pi.sold) AS total_sold
        ,(SELECT COALESCE(sum(pd.discount_amount), 0)
          FROM   purchase_discounts pd
          WHERE  pd.purchase_id = 200) AS discount_amount
   FROM   purchase_items pi
   WHERE  pi.purchase_id = 200
   GROUP  BY 1
   ) AS items;

Result:

[{"item_id":1,"purchase_items_ids":[1234],"total_sold":1,"discount_amount":12}]

db<>fiddle here

I added a couple of additional improvements:

  • Assuming referential integrity enforced by FK constraints, we don't need to involve the tables purchase and items at all.

  • Removed a subquery level doing nothing.

  • Using json_agg() instead of array_to_json(array_agg()).

  • Added COALESCE() to output 0 instead of NULL for no discounts.

Since discounts apply to the purchase in your model, not to individual items, it doesn't make sense to output discount_amount for every single item. Consider this query instead to return an array of items and a single, separate discount_amount:

SELECT json_build_object(
         'items'
       , json_agg(items)
       , 'discount_amount'
       , (SELECT COALESCE(sum(pd.discount_amount), 0)
          FROM   purchase_discounts pd
          WHERE  pd.purchase_id = 200)
       )
FROM  (
   SELECT pi.item_id
        , array_agg(pi.id) AS purchase_items_ids
        , sum(pi.sold) AS total_sold
   FROM   purchase_items pi
   WHERE  pi.purchase_id = 200
   GROUP  BY 1
   ) AS items;

Result:

{"items" : [{"item_id":1,"purchase_items_ids":[1234],"total_sold":1}], "discount_amount" : 12}

db<>fiddle here

Using json_build_object() to assemble the JSON object.

Your example with a single item in the purchase isn't too revealing. I added a purchase with multiple items and no discount to my fiddle.

Erwin Brandstetter
1

If you can have multiple values only in the purchase_discounts table, then a subquery that aggregates multiple purchase_discounts rows into one before the join can solve the problem:

SELECT array_to_json(array_agg(p_values)) FROM 
( 
  SELECT t.item_id, t.purchase_items_ids, t.total_sold, t.discount_amount FROM 
    ( 
      SELECT purchase_items.item_id AS item_id,
             ARRAY_AGG(purchase_items.id) AS purchase_items_ids,
             SUM(purchase_items.sold) as total_sold,
             X.discount_amount
             FROM items
              INNER JOIN purchase_items ON purchase_items.item_id = items.id
              INNER JOIN purchase ON purchase.id = purchase_items.purchase_id
              LEFT JOIN (SELECT purchase_id,
                                SUM(purchase_discounts.discount_amount) AS discount_amount
                         FROM purchase_discounts
                         GROUP BY purchase_id) X ON X.purchase_id = purchase.id
             WHERE 
              purchase.id = 200
             GROUP by 
              purchase_items.item_id,
              X.discount_amount
    ) as t 
  INNER JOIN items i ON i.id = t.item_id 
) AS p_values;
fog
  • Thanks @fog. Actually, we cannot group by the discount_amount here; we can only group by the item_id – Developer Sep 15 '21 at 10:15
  • Also, I am seeing that this takes more time to execute. – Developer Sep 15 '21 at 11:04
  • @Developer You need to clarify that you're looking for a more performant query without DISTINCT. Because formally "without DISTINCT" is possible to do with `GROUP BY` – Alex Yu Sep 19 '21 at 05:29 (a minimal GROUP BY sketch follows below)
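
For illustration of that last point: deduplicating "without DISTINCT" in the formal sense just means grouping by every selected column. A minimal sketch, with column names assumed from the other answers:

-- Equivalent to SELECT DISTINCT pi.item_id, pi.id ... but written with GROUP BY
SELECT pi.item_id, pi.id
FROM   purchase_items pi
LEFT   JOIN purchase_discounts pd ON pd.purchase_id = pi.purchase_id
WHERE  pi.purchase_id = 200
GROUP  BY pi.item_id, pi.id;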
1

The LEFT JOIN is not causing your duplicates. I understand why you need it, as there may not be any discounts, but for the data provided, changing to an inner join produces the same result. You are getting duplicate entries because you use ARRAY_AGG(purchase_items.id). Further, with the data presented, the tables item and purchase are not necessary. You can use the window version of sum and DISTINCT ON to reduce the duplication of purchase_id and eliminate the mentioned tables. Finally, the middle SELECT ... ) t level can be removed completely. Resulting in: (see demo)

select array_to_json(array_agg(p_values)) 
  from (select distinct on (pi.item_id, pi.id)
                pi.item_id
              , pi.id purchase_items_ids
              , sum(pi.sold) over (partition by pi.item_id) total_sold         
              , sum(pd.discount_amount) over(partition by  pi.item_id)  discount_amount
           from purchase_items pi  
           left join purchase_discounts pd 
             on pd.purchase_id = pi.purchase_id 
          order by pi.item_id, pi.id           
       ) as p_values; 
Belayer
1

I think the left join is not the cause, because an inner join gives the same query result as the left join. Since the discounts query for purchase_id = 200 has 2 results, you can use ROW_NUMBER() with PARTITION BY, like:

ROW_NUMBER() OVER(PARTITION BY purchase_items.id order by purchase_items.id) rn

Then select the rows where rn = 1. For the sum, I think you can change your query to use PARTITION BY as well.
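
A rough sketch of how that could be wired into the original query (untested; column names are assumed from the other answers, and discount_amount is left out because keeping only rn = 1 discards the extra discount rows):

-- Number the joined rows per purchase_items.id, then aggregate only the rows with rn = 1
SELECT array_to_json(array_agg(p_values)) FROM
(
  SELECT t.item_id,
         ARRAY_AGG(t.id) AS purchase_items_ids,
         SUM(t.sold) AS total_sold
  FROM (
    SELECT pi.id, pi.item_id, pi.sold,
           ROW_NUMBER() OVER (PARTITION BY pi.id ORDER BY pi.id) AS rn
    FROM   purchase_items pi
    LEFT   JOIN purchase_discounts pd ON pd.purchase_id = pi.purchase_id
    WHERE  pi.purchase_id = 200
  ) AS t
  WHERE t.rn = 1
  GROUP BY t.item_id
) AS p_values;
-- discount_amount would need to be summed separately, e.g. with its own subquery per purchase.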

Mojtaba Safavi