0

I've almost finished a project involving customers and products and only identified at the end that we have duplicate records due to keying errors, where sales staff have added the same customer to the database more than once.

What I need to do is to identify the duplicate records by comparing Customer name and their Postcode and merge the Products so that the resulting updated products field is consistent with all of the products that are applicable to them, but only one customer record exists.

In order to explain this, I have put together a small example.

    DROP TABLE IF EXISTS `tblProducts`;
    CREATE TABLE `tblProducts` (
      `ID` int(10) DEFAULT NULL,
      `Customer` varchar(30) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
      `Postcode` varchar(30) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
      `Products` varchar(30) COLLATE utf8mb4_unicode_ci DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

    INSERT INTO `tblProducts` VALUES ('1', 'Bradford', 'BR1 2HJ', '111&222&444');
    INSERT INTO `tblProducts` VALUES ('2', 'Bradford', 'BR1 2HJ', '222');
    INSERT INTO `tblProducts` VALUES ('3', 'Tanner', 'TE4 9PO', '777&333');
    INSERT INTO `tblProducts` VALUES ('4', 'Smythe', 'SM3 8KO', '111&222');
    INSERT INTO `tblProducts` VALUES ('5', 'Francis', 'FL2 6HG', '444&333');
    INSERT INTO `tblProducts` VALUES ('6', 'Tanner', 'TE4 9PO', '555');
    INSERT INTO `tblProducts` VALUES ('7', 'Peters', 'PE4 4PE', '444');
    INSERT INTO `tblProducts` VALUES ('8', 'Jeffrey', 'JE9 4JK', '444&555&888');
    INSERT INTO `tblProducts` VALUES ('9', 'Barnes', 'BA5 5AB', '999');
    INSERT INTO `tblProducts` VALUES ('10', 'Smythe', 'SM1 4GE', '888&777&222');

If we run the following query, you will see that we have two duplicates, for Bradford and Tanner.

    SELECT Customer, Postcode, COUNT(*) FROM tblProducts group by Customer, Postcode having count(*) > 1

    Customer    Postcode    COUNT(*)
    Bradford    BR1 2HJ     2
    Tanner      TE4 9PO     2

The separate duplicate records are:

    Customer  Postcode  Products
    Bradford  BR1 2HJ   111&222&444
    Bradford  BR1 2HJ   222
    Tanner    TE4 9PO   777&333
    Tanner    TE4 9PO   555

I need to run a MySQL query to 'merge products where customer and postcode count > 1' as above, so the end result will be:

    Customer  Postcode  Products
    Bradford  BR1 2HJ   111&222&444
    Tanner    TE4 9PO   777&333&555

Note that there is only one instance of 222 in the first record as 222 already existed. The duplicate record will be removed from the MySQL table so that only one record exists.

I must admit, I had assumed this would be easy for MySQL to achieve and have spent ages researched merging rows, merging fields, removing duplicates and not found anything that seems to specifically to help.

Link to jsfiddle if it helps: http://sqlfiddle.com/#!9/966550/4/0

Can anyone help please as I am stuck.

Many thanks,

Rob

Rob Wassell
  • 81
  • 1
  • 10

1 Answers1

1
SELECT TP.Customer,TP.Postcode,TP.Products
FROM tblProducts TP
INNER JOIN
(
    SELECT MIN(ID) ID FROM tblProducts GROUP BY Customer, Postcode
)INNERTABLE  ON INNERTABLE.ID=TP.ID

You can try above query.

halfer
  • 19,824
  • 17
  • 99
  • 186
Sagar Gangwal
  • 7,544
  • 3
  • 24
  • 38
  • Wow, I am very impressed. My only additional question though is that is a SELECT and not an UPDATE so it isn't actually updating the table to remove the duplicate as far as I can see? – Rob Wassell Aug 01 '17 at 11:22
  • @RobWassell If you want to delete duplicate row then you can check here https://stackoverflow.com/questions/4685173/delete-all-duplicate-rows-except-for-one-in-mysql – Sagar Gangwal Aug 01 '17 at 12:02
  • I have been testing and this doesn't quite work as expected. It does de-dupe the records by taking the first record and ignoring the duplicate record but it doesn't merge the values in the Products field. Instead of expected: Bradford BR1 2HJ 111&222&444 Tanner TE4 9PO 777&333&555 Instead get this: Bradford BR1 2HJ 111&222&444 Tanner TE4 9PO 777&333 Note it hasn't merged the products for Tanner. Taking the first value and discarding the second was the relative easy part but trying to merge unique values is where I am really stuck. – Rob Wassell Aug 02 '17 at 08:07