Get duplicates really is it a bad query or bad database design?

Question

In the second paragrpah says this:

"The fact that the resultset has duplicates is frequently (though not always) the result of a poor database design, an ineffective query, or both".

Later use as example the Adventures database, that I guess that it has a good design.

Well, my doubt is this case. I have to tables, Persons and Orders and I want to get all persons that has at least one order which total >= $200. I would use this query:

Select Persons.* from Persons, Orders where
Orders.IDPerson = Persons.IDPerson
and Orders.Total >= 200;

In this case I can get many times the same person because has more than one order which total is 200 or more. Really I want each person once in the results, so is this query a bad query because a I can get the same person many times?

Another option is this query:

select * from Person where
IDPerson IN(select IDPerson from Orders where total >= 200);

In this case I get only once each person, although this person has more than one order with a total >= 200. But use a subquery to avoid duplicates in the main query is it a good idea?

In this case, Persons and orders, I guess that the database design is not bad, because I don't know which other option I have to design this model, and the query I guess is very simple, but I have the doubt if in this case to get duplicates it is a sign of bad query.

In sumary, get duplicates is a bad query in this case?

Thanks.

Is bad practice NOT to use EXPLICIT `JOIN` :) AdventureWorks design is ok. The first query returns as it should. But you have `1:n` relationship. That is why you get duplicates. If you want only `Persons` data you could add `SELECT DISTINCT Persons.*` but the solution with subquery is slightly better. — Lukasz Szozda, Apr 27 '16 at 09:16
the query(1) is not ineffective but wrong. your question in english is: all persons that has at least one order which total >= $200, your question in sql is: all orders with total >= $200 and who bought them — A ツ, Apr 27 '16 at 09:20
Getting Duplicate value means i) Bad DB Design ii) Poor query in this order..I think we should avoid that de-normalization step which lead to duplicate value.I am not saying to avoid de-normalization altogether. — KumarHarsh, Apr 27 '16 at 09:31

score 2 · Accepted Answer · edited May 23 '17 at 10:28

I think the first query is bad like this. Seems not useful to get duplicates which you later need to remove by using DISTINCT.

The seconds query with the sub-query seems more useful in the context (maybe sometimes it makes more sense to use "exists" rather than "in").

SQL Server IN vs. EXISTS Performance

Such query would also be possible as well:

select * 
from Person
join
(
    select IDPerson
    from Orders 
    where total >= 200  
) PersonsWithMoreThan200Total
on Person.IDPerson = PersonsWithMoreThan200Total.IDPerson

Get duplicates really is it a bad query or bad database design?

1 Answers1