0

I am not so into database and I have the following doubt related a query that I am runngin on a MySql DB.

I have this query:

SELECT
    CD.id                AS id, 
    CD.commodity_name_en AS commodity_name_en

FROM MarketDetails AS MD
INNER JOIN MarketDetails_CommodityDetails AS MD_CD
      ON MD.id = MD_CD.market_details_id 
INNER JOIN CommodityDetails AS CD
      on MD_CD.commodity_details_id = CD.id
WHERE MD.localization_id = 1

that returns something like this (representing the list of commodities in some markets):

id                   commodity_name_en                                                                                                                                                                                                                                              
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1                    Rice-Asia                                                                                                                                                                                                                                                      
2                    Apple banana                                                                                                                                                                                                                                                   
3                    Avocado                                                                                                                                                                                                                                                        
4                    Red onion                                                                                                                                                                                                                                                      
5                    White onion                                                                                                                                                                                                                                                    
6                    Green Beans                                                                                                                                                                                                                                                    
7                    Bell pepper                                                                                                                                                                                                                                                    
8                    Carrot                                                                                                                                                                                                                                                         
9                    Tomatoes                                                                                                                                                                                                                    
10                   Irish potatoes                                                                                                                                                                                                                                                 
11                   Maize                                                                                                                                                                                                                                                          
1                    Rice-Asia                                                                                                                                                                                                                                                      
3                    Avocado                                                                                                                                                                                                                                                        
5                    White onion                                                                                                                                                                                                                                                    
8                    Carrot                                                                                                                                                                                                                                                         
11                   Maize                                                                                                                                                                                                                                                          
2                    Apple banana                                                                                                                                                                                                                                                   
7                    Bell pepper                                                                                                                                                                                                                                                    
9                    Tomatoes                                                                                                                                                                                                                    
10                   Irish potatoes                                                                                                                                                                                                                                                 
1                    Rice-Asia   

As you can see these commodities can appear multiple times (because a specific commodity can be sold in multiple market).

I want to change my query in such a way that every commodities appear only once (because in the end I want the list of all the possible commodities without duplication).

So I know that I can do in this way:

SELECT
    CD.id                AS id, 
    CD.commodity_name_en AS commodity_name_en

FROM MarketDetails AS MD
INNER JOIN MarketDetails_CommodityDetails AS MD_CD
      ON MD.id = MD_CD.market_details_id 
INNER JOIN CommodityDetails AS CD
      on MD_CD.commodity_details_id = CD.id
WHERE MD.localization_id = 1
GROUP BY id

I am grouping by ID but I can group by also by name (it is the same).

My doubt is: can I use the distinct statment to obtain the same behavior?

Reading here it seems that it could be a solution: https://www.tutorialspoint.com/sql/sql-distinct-keyword.htm

So I try also this solution and it seems return the same result

SELECT DISTINCT
    CD.id                AS id, 
    CD.commodity_name_en AS commodity_name_en

FROM MarketDetails AS MD
INNER JOIN MarketDetails_CommodityDetails AS MD_CD
      ON MD.id = MD_CD.market_details_id 
INNER JOIN CommodityDetails AS CD
      on MD_CD.commodity_details_id = CD.id
WHERE MD.localization_id = 1

So what is the exact difference between the DISTINCT and the GROUP BY solution? And what is the smartest one in a case like mine?

Thank you

AndreaNobili
  • 40,955
  • 107
  • 324
  • 596
  • 3
    I would suggest running an EXPLAIN on each query and posting the result, which will highlight the efficiency of each approach – Paul Dixon Nov 24 '17 at 14:08
  • Following on @PaulDixon's comment, I would also benchmark the timing of each. – kchason Nov 24 '17 at 14:09
  • 1
    `DISTINCT` queries can often be implemented by `GROUP BY` under the hood in MySQL. I wouldn't expect either query to perform too differently. – Tim Biegeleisen Nov 24 '17 at 14:09
  • might be worth looking at https://stackoverflow.com/questions/581521/whats-faster-select-distinct-or-group-by-in-mysql – mc110 Nov 24 '17 at 14:11
  • Please refer this https://stackoverflow.com/questions/164319/is-there-any-difference-between-group-by-and-distinct – Danish Ajazi Nov 24 '17 at 14:11
  • 2
    `DISTINCT` and `GROUP BY` can do the same, but they have different purposes. So use them accordingly in order to have your query as readable as possible. Use `DISTINCT` when you have duplicate rows and want to remove the duplicates. This is seldom needed. Most often `DISTINCT` indicates that you unecessarily created duplicates as in your case. Use `GROUP BY` only when you want to aggregate rows, e.g. get a sum, maximum, or count. – Thorsten Kettner Nov 24 '17 at 14:23

2 Answers2

5

Forget using either version. If you only want columns from one table, use exists instead:

SELECT CD.id, CD.commodity_name_en
FROM CommodityDetails CD
WHERE EXISTS (SELECT 1
              FROM MarketDetails MD INNER JOIN 
                   MarketDetails_CommodityDetails MD_CD
                   ON MD.id = MD_CD.market_details_id 
              WHERE MD_CD.commodity_details_id = CD.id AND
                    MD.localization_id = 1
             );

With this version, MySQL does not need to do aggregation on the entire result set -- and that can be a big cost savings. This should be able to take advantage of the indexes used for your original query.

Note: I removed the as for your column aliases. The default alias for CD.id is id. There is no reason to specify this explicitly (unless you really like typing and verbose queries).

As for your specific question, it was answered in the comments -- DISTINCT and GROUP BY should have very similar performance.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Don't join everything only to have to remove from that result set later.

You want to select commodities that exist in certain markets. So you'd select from the commodities table and look up their markets in the WHERE clause (with an EXISTS or IN clause).

select id, commodity_name_en
from commoditydetails
where id in
(
  select md_cd.commodity_details_id
  from marketdetails_commoditydetails md_cd
  join marketdetails md on md.id = md_cd.market_details_id
  where md.localization_id = 1
);

Or, if you like that better, even without any join:

select id, commodity_name_en
from commoditydetails
where id in
(
  select commodity_details_id
  from marketdetails_commoditydetails
  where market_details_id in (select id from marketdetails where md.localization_id = 1)
);
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • Why this save time? – AndreaNobili Nov 24 '17 at 14:19
  • First of all when writing SQL, we want the queries readable and maintainable. This should be our first consideration. If you want to select commodities, select commodities, not commodities joined with markets. As to speed: aggregating and removing duplicates are rather slow processes. All data must be ordered first to get this done. This can be quite a task with large tables. (With small tables this won't make much of a difference.) – Thorsten Kettner Nov 24 '17 at 14:30
  • 1
    Anyway: think about speeding up queries only when you run into performance issues. Don't obfuscate queries, because you think that maybe this may somehow trick the DBMS into generating a better execution plan. – Thorsten Kettner Nov 24 '17 at 14:30