Number duplicate records on the MySQL table

Question

Have a table with similar schema

id control code amount 
1   200     12  300
2   400     12  300
3   200     12  300
4   100     10  400
5   100     10  400
6   500     13  500

Trying to list the duplicates of records on a UI.

Using following query I can retrieve the duplicate records and show it on UI.

select * from mwt group by control,code,amount having count(id) > 1;  

id control code amount 
1   200     12  300
4   100     10  400

Here the records with id 1 and 4 are duplicates of 3 and 5 respectively.

On the UI, the user will click a check-box adjacent to the record and corresponding duplicate records should be populate to the UI. To make things easier trying to populate another column named dup_id. Using this dup_id it is possible to filter the results from UI , which is in the JSON format.

How to create a result set similar to the one shown below?

id control code amount dup_id
1   200     12  300     1
2   400     12  300
3   200     12  300     1
4   100     10  400     4
5   100     10  400     4
6   500     13  500

Note that in your first query, the returned id is arbitrary – Strawberry Jan 12 '16 at 13:15 — Strawberry, Jan 12 '16 at 13:15
...and I assume your result is wrong ?!?!?! – Strawberry Jan 12 '16 at 13:24 — Strawberry, Jan 12 '16 at 13:24

Kickstart · Answer 1 · 2016-01-12T13:27:18.377

Depending on how accurate the order has to be, you could do something like this.

This is getting all the unique control / code / amount with a count, to get a flag to know if that is a duplicate row, and ordered by control / code / amount so that they are in order. It does a cross join to initialise a few user variables.

Then it calculates a counter, only incrementing it if any of control / code / amount have changed AND it is a duplicate row. Then sets user variables to store the previous values of control / code / amount.

The outer query then orders the results back in to id order.

SELECT sub3.id, 
        sub3.control, 
        sub3.code, 
        sub3.amount, 
        sub3.dup_id
FROM
(
    SELECT sub2.id, 
            sub2.control, 
            sub2.code, 
            sub2.amount, 
            @cnt:=IF(@control=control AND @code=code AND @amount=amount AND sub2.id_count IS NOT NULL, @cnt, IF(sub2.id_count IS NULL, @cnt, @cnt + 1)),
            @control:=control,
            @code:=code,
            @amount:=amount,
            IF(sub2.id_count IS NULL, NULL, @cnt) AS dup_id
    FROM
    (
        SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub1.id_count 
        FROM mwt
        LEFT OUTER JOIN
        (
            SELECT control, code, amount, COUNT(id) AS id_count
            FROM mwt 
            GROUP BY control,code,amount 
            HAVING id_count > 1
        ) sub1
        ON mwt.control = sub1.control
        AND mwt.code = sub1.code
        AND mwt.amount = sub1.amount
        ORDER BY mwt.control, mwt.code, mwt.amount
    ) sub2
    CROSS JOIN
    (
        SELECT @cnt:=0, @control:=0, @code:=0, @amount:=0
    ) sub0
) sub3
ORDER BY id

Note that this is ordering by control, code and amount, so not an exact match for your required output (which would require getting the first duplicates ordered by id first).

EDIT - Simpler and better way to do it. This gets all the duplicate rows with the min id for those duplicates (ordered by the min id), and uses a user variable to add a sequence number for those. Then LEFT OUTER JOINs that back against the main table to put that sequence number in all the matching rows.

SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub2.dup_id 
FROM mwt
LEFT OUTER JOIN
(
    SELECT sub1.id, sub1.control, sub1.code, sub1.amount, @cnt:=@cnt+1 AS dup_id
    FROM 
    (
        SELECT MIN(id) AS id, control, code, amount
        FROM mwt 
        GROUP BY control,code,amount 
        HAVING COUNT(id) > 1
        ORDER BY id
    ) sub1
    CROSS JOIN
    (
        SELECT @cnt:=0
    ) sub0
) sub2
ON mwt.control = sub2.control
AND mwt.code = sub2.code
AND mwt.amount = sub2.amount
ORDER BY mwt.id

It is, and still isn't perfect! – Kickstart Jan 12 '16 at 13:19 — Kickstart, Jan 12 '16 at 13:19
@Strawberry - better solution now. – Kickstart Jan 12 '16 at 13:27 — Kickstart, Jan 12 '16 at 13:27
@Kickstart Thanks. The order is not important. – Pranav Jan 13 '16 at 06:54 — Pranav, Jan 13 '16 at 06:54

score 1 · Accepted Answer · answered Jan 12 '16 at 13:27

1

This seems like a simpler solution than that suggested by @kickstarter - but maybe I've misunderstood the requirement...

SELECT x.*
     , y.dup_id 
  FROM my_table x 
  LEFT 
  JOIN
     ( SELECT MIN(id) dup_id
            , control
            , code
            , amount 
         FROM my_table 
        GROUP 
           BY control
            , code
            , amount 
       HAVING COUNT(*) > 1
     ) y
    ON y.control = x.control 
   AND y.code = x.code 
   AND y.amount = x.amount;

answered Jan 12 '16 at 13:27

Strawberry

33,750
13
40
57

That seems to give the first id for each set of duplicates. In the sample data this is correct for the first set of duplicates but not the 2nd set of duplicates. – Kickstart Jan 12 '16 at 13:37
Fair enough. I have taken it as a sequence number of groups of duplicates. Depends what that field is to be used for. – Kickstart Jan 12 '16 at 13:54

score -1 · Answer 3 · edited Jan 12 '16 at 13:17

Would you need a dup_id column ?. I hope this can be achieved with a simple query like below

select id
     , control
     , code
     , amount 
  from table 
 where control = from selected Record 
   and code = from selected Record 
   and amount = from selected Record 
   and id not equals from selected Record

You can very well omit the last not equals if the requirement is to list down duplicates including the selected record.

Number duplicate records on the MySQL table

3 Answers3