2

I'm writing a python script that searches for duplicate entries in a table in a mysql database using python-mysql-connector. I want the function to output the duplicate entries in a table of customer information. I'm not sure how to store the duplicates and keep track of the index of the item in the table. Should they be stored in a list or set?

import mysql.connector

dbconnect = mysql.connector.connect(host='localhost', user='root', password='wordpass', db='contacts')


cur= dbconnect.cursor(buffered= True)


rows= cur.fetchall()


def find_duplicates(query):

    for row in rows:

       query= cur.execute ("SELECT id, name, address1, city, postal_code COUNT(*) FROM customer " 
    "GROUP BY name, address1, city, postal_code HAVING COUNT(*) > 1")

       if row in cur.fetchone():
         return row
       else:
         cur.fetchone()
evvdogg
  • 37
  • 1
  • 9

1 Answers1

1

I think you can change your query to return you the full duplicated result set. Something like that should work I think:

SELECT t.* FROM customer AS t 
    INNER JOIN (
        SELECT name, address1, city, postal_code 
        FROM customer GROUP BY name, address1, city, postal_code 
        HAVING COUNT(*) > 1) AS td 
    ON t.name = td.name AND t.address1 = td.address1 
        AND t.city = td.city AND t.postal_code = td.postal_code;

Once you have all the dupes in hand with IDs you can present group them in python easily I think.

user3012759
  • 1,977
  • 19
  • 22
  • Thanks for your input. Would that query be better than the one I currently have: SELECT id, name, address1, city, postal_code, COUNT(*) FROM customer GROUP BY name, address1, city, postal_code HAVING COUNT(*) > 1 For multiple SQL solutions in other posts, this query seems to work. – evvdogg Mar 23 '17 at 17:33
  • Your query will only return one result per duplicate set, so you will not display all duplicate entries from the table using it. – user3012759 Mar 24 '17 at 11:57
  • I see. I'll try incorporating this query into my code and see if it generates the result I'm looking for. So, it will still be necessary to iterate through the rows like: rows= cur.fetchall() for row in rows: cur.execute(query) #query goes here – evvdogg Mar 24 '17 at 16:50
  • Yes, you still need to iterate through the rows to show them to users. If you have too many results look into how to page the results with `ORDER BY` and `LIMIT` – user3012759 Mar 24 '17 at 17:35