0

This query is returning multiple results. In the bottom you can see there are two of each id number. How do I distinct records where I still get 25 records of unique values?

        SELECT *
            FROM(
                SELECT
                    DISTINCT cards.id, cards.name, cards.created_at, cards.updated_at
                    FROM cards
                        INNER JOIN card_process
                            ON card_process.card_id = cards.id
                            INNER JOIN processes
                                ON processes.id = card_process.process_id
                                INNER JOIN category_process
                                    ON category_process.process_id = processes.id
                                    INNER JOIN categories
                                        ON categories.id = category_process.category_id
                                        INNER JOIN series
                                            ON series.id = categories.serie_id
                                            INNER JOIN serie_user
                                                ON serie_user.serie_id = series.id
                        AND `cards`.`type` NOT IN ('', 'libraries')
                        AND NOT `cards`.`deleted`
                        AND NOT `categories`.`deleted`
                        AND NOT `series`.`deleted`
                        AND `cards`.`type` IN ('forms')
                        AND `series`.`id` IN (124,235,126,126,201,236,207,207,207,207,247,234,131,131,221,225,225,222)
            UNION ALL
                SELECT
                    DISTINCT cards.id, cards.name, cards.created_at, cards.updated_at
                    FROM cards
                        WHERE `cards`.`account_user_id`='9'
                            AND NOT `cards`.`deleted`
                            AND `cards`.`type` IN ('forms')
            ) AS qry
                ORDER BY `updated_at` ASC
                LIMIT 0, 25

enter image description here

Michael J. Calkins
  • 32,082
  • 15
  • 62
  • 91

2 Answers2

3

Not knowing anything about your schema or the shape of your data, you can get what you want by simplifying your query. Assuming a roughly standard-compliant SQL dialect

  • You don't need the enclosing select. A full select (unioned select expression) takes order by and that order by applies to the entire full select.

  • You don't need the distinct keyword on the individual select statements in the union: union by itself eliminates duplicate rows.

And please, when you write a long complicated query, spend some time formatting it so the next sod (who might be you a few years hence) who has to comprehend it can do so easily.

That gets you to the equivalent:

SELECT cards.id   ,
       cards.name ,
       cards.created_at ,
       cards.updated_at
FROM       cards
INNER JOIN card_process     ON card_process.card_id = cards.id
INNER JOIN processes        ON processes.id = card_process.process_id
INNER JOIN category_process ON category_process.process_id = processes.id
INNER JOIN categories       ON categories.id =  category_process.category_id
INNER JOIN series           ON series.id = categories.serie_id
INNER JOIN serie_user       ON serie_user.serie_id = series.id
                           AND `cards`.`type` NOT IN ('', 'libraries')
                           AND NOT `cards`.`deleted`
                           AND NOT `categories`.`deleted`
                           AND NOT `series`.`deleted`
                           AND `cards`.`type` IN ('forms')
                           AND `series`.`id` IN (124,235,126,126,201,236,207,207,207,207,247,234,131,131,221,225,225,222)
UNION
SELECT cards.id   ,
       cards.name ,
       cards.created_at ,
       cards.updated_at
FROM cards
WHERE `cards`.`account_user_id`='9'
  AND NOT `cards`.`deleted`
  AND `cards`.`type` IN ('forms')
ORDER BY `updated_at` ASC
LIMIT 0, 25

If MySql doesn't allow limit on a full select, then you might need the enclosing select statement.

Also, one might note that your join criteria is almost certainly incorrect. In the 1st select, the join criteria for the table serie_user has a bunch of tests:

AND `cards`.`type` NOT IN ('', 'libraries')
AND NOT `cards`.`deleted`
AND NOT `categories`.`deleted`
AND NOT `series`.`deleted`
AND `cards`.`type` IN ('forms')
AND `series`.`id` IN (124,235,126,126,201,236,207,207,207,207,247,234,131,131,221,225,225,222)

They only apply to joining in the candidate rows from serie_user. They don't apply to the result set as a whole. They should be refactored into a where clause and into the join criteria for the pertinent tables, thus:

SELECT cards.id   ,
       cards.name ,
       cards.created_at ,
       cards.updated_at
FROM       cards
INNER JOIN card_process     ON card_process.card_id        = cards.id
INNER JOIN processes        ON processes.id                = card_process.process_id
INNER JOIN category_process ON category_process.process_id = processes.id
INNER JOIN categories       ON categories.id               =  category_process.category_id
                           AND NOT `categories`.`deleted`
INNER JOIN series           ON series.id                   = categories.serie_id
                           AND NOT `series`.`deleted`
                           AND     `series`.`id`           IN (124,235,126,126,201,236,207,207,207,207,247,234,131,131,221,225,225,222)
INNER JOIN serie_user       ON serie_user.serie_id         = series.id
WHERE `cards`.`type` NOT IN ('', 'libraries')
  AND NOT `cards`.`deleted`
  AND     `cards`.`type` IN ('forms')
UNION
SELECT cards.id   ,
       cards.name ,
       cards.created_at ,
       cards.updated_at
FROM cards
WHERE     `cards`.`account_user_id` = '9'
  AND NOT `cards`.`deleted`
  AND     `cards`.`type` IN ('forms')
ORDER BY `updated_at` ASC
LIMIT 0, 25

The last thing I want to note is that you are joining in a whole raft of tables that aren't used in the result set. All you're doing is getting a subset of rows from the cards table. That suggests to me that you're likely better off if you simply got rid of the union and all the extraneous joins and simply asked the right question.

That eliminates the possibility of duplicates completely.

A little refactoring gets it down to this (no guarantee I've got it 100% correct, but you should be able to get the gist of it):

select c.id         ,
       c.name       ,
       c.created_at ,
       c.updated_at
from cards c
where   not c.deleted
  and       c.type            = 'forms'
  and (     c.account_user_id = '9'
        OR exists ( select *
                    from card_process     cp
                    join processes        p   on     p.id           = cp.process_id
                    join category_process cpx on     cpx.process_id = p.id
                    join categories       c   on     c.id           = cpx.category_id
                                             and not c.deleted
                    join series           s   on     s.id           = categories.serid_id
                                             and not s.deleted
                                             and     series.id IN ( 124 , 235 , 126 , ... )
                    join serie_user       su  on     su.serid_id     = s.id
                    where cp.card_id = c.card_id
                  )
      )
ORDER BY updated_at
LIMIT 0, 25

There's often a simple query hidden inside a complex one.

Nicholas Carey
  • 71,308
  • 16
  • 93
  • 135
  • 1
    Interesting formatting style. Drives me crazy, personally, but I can definitely see why you would implore it. – beeks May 10 '14 at 01:32
  • 2
    @beeks: SQL syntax is bad enough. And when queries get complicated you need to be be able to grok it. The O.P. nested join layout is a code smell indicating procedural thinking (nested loops) rather than thanking in terms of sets, relational algebra and predicate calculus. Further, a lot of cognitive research shows that a regular, tabular layout enhances understanding. People grok tables. – Nicholas Carey May 10 '14 at 01:57
  • @Nicolas Carey, I do agree. The OP's code is a mess, I would avoid all T-SQL like that myself. But your formatting is interesting, but drives me up the wall. To each ones own though. – beeks May 10 '14 at 03:31
  • Wow that awesome! Normally I use an ORM for all database interactions, do you have a style guide for SQL? The query above was from the style of someone else on SO I just picked up. – Michael J. Calkins May 10 '14 at 16:47
  • From wikipedia `Transact-SQL (T-SQL) is Microsoft's and Sybase's proprietary`... oh god damn it... – Michael J. Calkins May 10 '14 at 16:50
  • There's nothing specific to T-SQL in what I wrote. With the exception of the `limit` clause, it's all vanilla, standard SQL that should work on any reasonably standards-compliant implementation of SQL. – Nicholas Carey May 12 '14 at 16:22
1

Yes, that's how UNION ALL works, it simply gives you all the rows of both your queries.

If you want to remove duplicates, use UNION , without the ALL part.

nos
  • 223,662
  • 58
  • 417
  • 506