2

I have a table:

+------------+------------------------------------------------------+------+-----+---------+-------+
| Field      | Type                                                 | Null | Key | Default | Extra |
+------------+------------------------------------------------------+------+-----+---------+-------+
| person_id1 | int(10)                                              | NO   | MUL | 0       |       |
| person_id2 | int(10)                                              | NO   | MUL | 0       |       |
| priority   | smallint(5)                                          | NO   |     | 0       |       |
| link_type  | enum('member_of_band','legal_name','performs_as','') | NO   |     |         |       |
+------------+------------------------------------------------------+------+-----+---------+-------+

There are no primary key on this table, but there are indexes on person_id1, and on person_id2.

The problem is - we have inconsistent data, for example, this query:

SELECT
    COUNT(*) as c, person_id1, person_id2
FROM person_person
WHERE link_type = "member_of_band"
GROUP BY person_id1, person_id2
HAVING c > 1
LIMIT 10;

Returns:

+---+------------+------------+
| c | person_id1 | person_id2 |
+---+------------+------------+
| 2 |   50674235 |   51048792 |
| 3 |   50674245 |   50715733 |
| 2 |   50674283 |   50712621 |
| 2 |   50674322 |   50714244 |
| 2 |   50674378 |   51048804 |
| 2 |   50674438 |   51048812 |
| 4 |   50674442 |   50715733 |
| 2 |   50674449 |   50716913 |
| 2 |   50674455 |   51048803 |
| 3 |   50674469 |   50715733 |
+---+------------+------------+

Is there a way to remove all redundant records and leave those that are ok?

All I have come up is:

DELETE person_person FROM person_person
WHERE (person_id1, person_id2) IN (

    SELECT
        person_id1, person_id2
    FROM person_person
    WHERE link_type = "member_of_band"
    GROUP BY person_id1, person_id2
    HAVING COUNT(*) > 1
    LIMIT 100

) AND link_type = "member_of_band";

But that would delete all the records with doubles, and I need to delete just doubles.

mysql> select * from person_person where person_id1 = 50674245 and person_id2 = 50715733;
+------------+------------+----------+----------------+
| person_id1 | person_id2 | priority | link_type      |
+------------+------------+----------+----------------+
|   50674245 |   50715733 |        0 | member_of_band |
|   50674245 |   50715733 |        0 | member_of_band |
|   50674245 |   50715733 |        0 | member_of_band |
+------------+------------+----------+----------------+
nikita2206
  • 1,129
  • 2
  • 10
  • 17

1 Answers1

4
ALTER IGNORE TABLE person_person ADD UNIQUE INDEX (person_id1, person_id2, link_type);
Barmar
  • 741,623
  • 53
  • 500
  • 612
  • *At very least*, you should include `link_type` in the index; however, it is not clear that even this will meet the OP's requirements, as it was only requested to remove "doubles" for `link_type='member_of_band'`. – eggyal Dec 27 '12 at 07:57
  • Added `link_type` to the index. I'm assuming he wants to remove all duplicates, he said that query was just an example. – Barmar Dec 27 '12 at 07:59
  • @SaharshShah that's because there is no table. – nikita2206 Dec 27 '12 at 08:02
  • thanks, @Barmar, I though we didn't have primary because of... hmm... some reason:) but we just forgot to add it. – nikita2206 Dec 27 '12 at 08:03
  • Oops. Added the table name :) – Barmar Dec 27 '12 at 08:03