Why are duplicate results returned even when distinct keyword is used?

Question

When I run the following query, I am returned two entries with duplicate results. Why are duplicate results returned when I’m using distinct here? The primary keys are the house number, street name, and unit number.

SELECT distinct
  house_num,
  Street_name,
  Unit_Designator,
  Unit_Num
FROM voterinfo.voter_info
WHERE house_num = 420 
  AND street_name = "PARK"
  AND Unit_Num = '' 
  AND Unit_Designator = '';

Verify if spaces are the same as well (maybe an extra space somewhere makes the addresses different). — Brad, Mar 04 '19 at 20:13
Without sample input and output, how do you think we can help yoU? — Eric, Mar 04 '19 at 20:51

Gordon Linoff · Answer 1 · 2019-03-04T20:29:09.463

4

select distinct is a statement that ensures that the result set has no duplicate rows. That is, it filters out rows where every column is the same (and NULL values are considered equal).

It does not look at a subset of columns.

Sometimes, people use select distinct and don't realize that it applies to all columns. It is rather amusing when the first column is in parentheses -- as if parentheses make a difference (they don't).

Then, you might also have situations where values look the same but are not.

Consider this simple example where values differ by only a space as the end of string:

select distinct x
from (select 'a' as x union all
      select 'a '
     ) y;

Here is a db<>fiddle with this example.

This returns two rows, not 1.

Without sample data it is hard to say which of these situations you are referring to. But the rows that you think are "identical" really are not.

edited Mar 04 '19 at 20:29

answered Mar 04 '19 at 20:15

Gordon Linoff

1,242,037
58
646
786

Does MySQL care about trailing blanks here? – jarlh Mar 04 '19 at 20:19
" It is rather amusing when the first column is in parentheses -- as if parentheses make a difference (it doesn't)." Well not In MySQL atleast PostgreSQL does support it more or less with `DISTINCT ON (column [, column ..]), column [, column ...]` but you can [simulate](https://stackoverflow.com/questions/53868870/how-to-use-distinct-on-with-mysql-using-activerecord/53869691#53869691) it in MySQL – Raymond Nijland Mar 04 '19 at 20:22
3

@RaymondNijland . . . `DISTINCT ON` is *NOT* `SELECT DISTINCT`. Postgres (of course) supports `SELECT DISTINCT` which behaves as described. – Gordon Linoff Mar 04 '19 at 20:25
Yes where did you read where i said that `SELECT DISTINCT` in PostgreSQL handles different then MySQL `SELECT DISTINCT` ? i was talking about PostgreSQL's `SELECT DISTINCT ON(columns),*` syntax which the topicstarter might more or less needs here and needs to simulate but hard to say without example data and expected results. Maybe i explained it wrong when i said PostgreSQL supported that more or less when using `DISTINCT` with parentheses? – Raymond Nijland Mar 04 '19 at 20:32

score 0 · Answer 2 · answered Mar 04 '19 at 20:26

For the fields with datatype as Char or similar ( Street_name,Unit_Designator) it is possible that there are spaces that aren't visible in the query editor that are to be removed by applying appropriate trimming logic.Please refer below link,

MySQL select fields containing leading or trailing whitespace

Why are duplicate results returned even when distinct keyword is used?

2 Answers2