1

I am not a databases guy,but I have been given the "fun" job of cleaning up someone else's database. We have many duplicate record in our databases and some of customers are getting double or triple billed every month.

Given the following Database example :

Table:  Customers

ID       Name        Phone          DoNotBill
1        Acme Inc    5125551212     No
2        ABC LLC     7138221661     No
3        Big Inc     4132229807     No
4        Acme        5125551212     No
5        Tree Top    2127657654     No

Is it possible to write a query that Identifies the all duplicate phone numbers (in this case records 1 and 4) and then marks and duplicate records yes by updating the DoNotBill column. But leaves the first record unmarked.

In this example case we would be left with:

ID       Name        Phone          DoNotBill
1        Acme Inc    5125551212     No
2        ABC LLC     7138221661     No
3        Big Inc     4132229807     No
4        Acme        5125551212     Yes
5        Tree Top    2127657654     No
JVMX
  • 1,016
  • 2
  • 12
  • 23
  • why not left join the table to itself on the phone number? – John Ruddell May 19 '14 at 18:03
  • possible duplicate of [Find duplicate records in MySQL](http://stackoverflow.com/questions/854128/find-duplicate-records-in-mysql) – Strawberry May 19 '14 at 18:08
  • 1
    its partly a duplicate but not the same as that post... the OP wants to update the additional records but not change the first after finding the duplicate record – John Ruddell May 19 '14 at 18:14

4 Answers4

1

To begin with I assume that the DoNotBill column only has two possible values; yes and no. In that case it should be bool instead of varchar, meaning it would be either true or false.

Furthermore I don't get the meaning of the DoNotBill column. Why wouldn't you just use something like this?

select distinct phone from customers

SQL SELECT DISTINCT

That would give you the phone numbers without duplicates and without the need for an extra column.

Tommy Ivarsson
  • 605
  • 4
  • 7
  • 1
    the reason for a do not bill is because there can be more than one organization name for a phone number.. but they would only want to bill once per customer (how their tables are set up). he wants an update query to change the table contents – John Ruddell May 19 '14 at 18:16
  • 1
    John is correct. I did not set up this database Im forced to work within the confines of the "Genius" that came before me. I would rather simply patch this mess than be given the project of rewriting the app and DB A task I am not qualified to do, nor have any desire to do. The Donotbill field is a relic of some long forgone project Its a hack I can leverage. – JVMX May 19 '14 at 18:22
  • which is what I figured :)... you can write a query to update the information multiple ways as well – John Ruddell May 19 '14 at 18:23
  • 1
    Thanks for clearing that up and I'm sorry you have to work with that mess :) – Tommy Ivarsson May 19 '14 at 18:27
1

something like this?

UPDATE 
    customers cust, 
    (SELECT 
        c1.ID, 
        c1.name, 
        c1.phone, 
        c1.DoNotBill
    FROM customers c
    LEFT JOIN 
        (SELECT 
            cc.ID 
        FROM customers cc
        ) as c1 on c1.phone = c.phone
    ) dup
SET cust.DoNotBill = 'Yes' WHERE cust.id=dup.id ;
John Ruddell
  • 25,283
  • 6
  • 57
  • 86
0

This depends on ur data amount You can do it in steps and make use some tools like excel...

This qrt

SELECT a.id,b.id,a.phone FROM clients a , clients b WHERE
A.phone =b.phone 
And a.id!=b.id

The result is all duplicated records. Add

Group by a.phone

And u will get 1 record for each 2 duplicates. if you like the records and they are whT u need. ChNge select to select a.id and Use this qry as subqry to an update sql statement

UPDATE clients SET billing='no' WHERE id IN (  sql goes here)
John Ruddell
  • 25,283
  • 6
  • 57
  • 86
Yazan
  • 6,074
  • 1
  • 19
  • 33
-1
UPDATE customers c SET c.DoNotBill="Yes";

UPDATE customers c
JOIN (
    SELECT MIN( ID ) ID, Phone
    FROM customers
    GROUP BY Phone
) u ON c.ID = u.ID AND c.Phone = u.Phone
SET c.DoNotBill="No";

That way not only duplicates are eliminated, but all multiple entries are dealt with.

downforme
  • 371
  • 2
  • 15
  • Why would you change every record in the database? seems to be a slower / more ineffective way than just finding the duplicate and change just those. – John Ruddell May 19 '14 at 18:28
  • I think this is a one time maintainance operation, so the performance doesn´t matter. With this aprroach you can eliminate not only duplicates but all multiples (updated answer) – downforme May 19 '14 at 18:38