Well, I can't say much here. Just follow the steps (three + an intermediary one) and read my comments carefully. I have chosen a convenient way for you: to run a simple query per step. It can be done also in other way, e.g. using a stored procedure, or many. But it wouldn't be better for you, since your task is a one-occasion process and a very sensible one. It is better to have the control over all operation results.
You asked me in the comments, what should you use as interface for your task. Well, MySQL Workbench is a good one for such operations, but it breaks/freezes a lot. phpmyadmin? Hm... I use for now SequelPRO and I must say, that I really like it. Can it manage your task? I don't know. But for sure I know one which can: the best MySQL software I ever used - and I'll certainly buy it for personal use too - was SQLyog. An extremely powerful, stable and robust application. Especially when you're dealing with duplicates/exports of databases: it never disappoints.
I saw that you have VARCHAR
as data type for the column productid
. Make it like this:
`productid` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
And, if you want to never have duplicates in manPartid
column, then create a UNIQUE
index on it.
I also recommend you to keep a uniform naming convention. Like:
productId
, or product_id
, instead of productid
manPartId
, or man_part_id
, instead of manPartid
And give the name products
to the products table.
Now, I structured my answer into two parts: "steps to follow" and "results". For each step I posted the corresponding step results.
Before you begin doing something:
MAKE A BACK-UP OF YOUR DATA!
I wish you good luck!
STEPS TO FOLLOW:
=================================================================
STEP 1:
=================================================================
Create a new table proTmp with the following columns:
- manPartid: definition identical with pro.manPartid
- maxLenSpec: maximum specification length of each pro.manPartid.
=================================================================
CREATE TABLE `proTmp` (
`manPartId` varchar(255) DEFAULT NULL,
`maxLenSpec` bigint(20) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
===============================================================
STEP 2:
===============================================================
- Truncate table proTmp;
- Get a dataset with all [pro.manPartid, pro.maxLenSpec] pairs;
- Store the dataset into table proTmp.
===============================================================
TRUNCATE proTmp;
INSERT INTO proTmp (
SELECT
pro.manPartid
, MAX(LENGTH(pro.specification)) AS maxLenSpec
FROM pro
GROUP BY pro.manPartid
);
=============================================================
INTERMEDIARY STEP - JUST FOR TEST.
IT ONLY DISPLAYS THE RECORDS WHICH WILL BE DELETED IN STEP 3:
=============================================================
Left join tables pro and proTmp and display only the
records with pro.lenSpec = proTmp.maxLenSpec.
- lenSpec: length of pro.specification
=============================================================
a) Get pro.*, pro.lenSpec and proTmp.* columns, ordered by pro.manPartid.
_________________________________________________________________________
SELECT
a.*
, LENGTH(a.specification) as lenSpec
, b.*
FROM pro AS a
LEFT JOIN proTmp AS b ON b.manPartid = a.manPartid
WHERE LENGTH(a.specification) = b.maxLenSpec
ORDER BY a.manPartid;
b) Get only pro.productid column, ordered by pro.productid.
___________________________________________________________
SELECT a.productid
FROM pro AS a
LEFT JOIN proTmp AS b ON b.manPartid = a.manPartid
WHERE LENGTH(a.specification) = b.maxLenSpec
ORDER BY a.productid;
====================================================================
STEP 3:
====================================================================
Delete all records from pro having pro.lenSpec != proTmp.maxLenSpec.
IMPORTANT: ordered by pro.productid !!!
====================================================================
DELETE FROM pro
WHERE
pro.productid NOT IN (
SELECT a.productid
FROM (SELECT * FROM pro AS tmp) AS a
LEFT JOIN proTmp AS b ON b.manPartid = a.manPartid
WHERE LENGTH(a.specification) = b.maxLenSpec
ORDER BY a.productid
);
THE RESULTS OF THE EXECUTED OPERATIONS:
------------------------------------------------------------------------------------------------------------------
NOTA BENE:
------------------------------------------------------------------------------------------------------------------
NOTICE THAT I ADDED A NEW RECORD INTO TABLE pro, WITH THE productid = 11 & manPartid = "ABC1". ITS specification
COLUMN HAS THE SAME MAXIMUM LENGTH AS THE RECORD WITH THE productid = 1 & manPartid = "ABC1" !!! IN THE END,
AFTER STEP 3, E:G: AFTER DELETION OF DUPLICATES, BOTH RECORDS SHOULD STILL EXIST IN TABLE pro, BECAUSE THEY BOTH
HAVE THE MAXIMUM LENGTH of specification COLUMN. THEREFORE, THERE WILL STILL EXIST SUCH DUPLICATES IN THE TABLE
pro AFTER DELETION. IN ORDER TO DECIDE WHICH ONLY ONE OF THESE DUPLICATES SHOULD REMAIN IN THE TABLE, YOU MUST
THINK ABOUT SOME OTHER CONDITIONS AS THE ONES WE KNOW FROM YOU IN THIS MOMENT. BUT, FIRST THINGS FIRST...
SEE ALSO THE RESULTS AFTER RUNNING STEP 3.
------------------------------------------------------------------------------------------------------------------
=================================================================
CREATION SYNTAX AND CONTENT OF TABLE pro, USED BY ME:
=================================================================
CREATE TABLE `pro` (
`productid` varchar(255) DEFAULT NULL,
`manPartId` varchar(255) DEFAULT NULL,
`specification` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--------------------------------------------------------
productid manPartId specification
--------------------------------------------------------
1 ABC1 5MP camera, 2500 MaH, Steel body
10 ABC5 2500 MaH, Steel body
2 ABC2 2MP camera, Steel body
3 ABC3 5MP, 6500 MaH, Red
4 ABC1 2500 MaH, Steel body
5 ABC2 5MP camera, plastic body
6 ABC4 5MP camera, 2500 MaH, Steel body
7 ABC5 15MP camera, 4500 MaH
8 ABC2 5MP camera
9 ABC3 15MP, 6500 MaH, Blue body
11 ABC1 12345678901234567890123456789012
===============================================================
STEP 1 - RESULTS: Creation of table proTmp
===============================================================
Just the table proTmp was created, without any content.
===============================================================
STEP 2 - RESULTS: Table proTmp content
===============================================================
----------------------
manPartId maxLenSpec
----------------------
ABC1 32
ABC2 24
ABC3 25
ABC4 32
ABC5 21
============================================================
INTERMEDIARY STEP RESULTS - JUST FOR TEST.
IT ONLY DISPLAYS THE RECORDS WHICH WILL BE DELETED IN STEP 3
============================================================
a) Get pro.*, pro.lenSpec and proTmp.* columns, ordered by pro.manPartid.
_________________________________________________________________________
----------------------------------------------------------------------------------------------
productid manPartId specification lenSpec manPartId maxLenSpec
----------------------------------------------------------------------------------------------
1 ABC1 5MP camera, 2500 MaH, Steel body 32 ABC1 32
11 ABC1 12345678901234567890123456789012 32 ABC1 32
5 ABC2 5MP camera, plastic body 24 ABC2 24
9 ABC3 15MP, 6500 MaH, Blue body 25 ABC3 25
6 ABC4 5MP camera, 2500 MaH, Steel body 32 ABC4 32
7 ABC5 15MP camera, 4500 MaH 21 ABC5 21
b) Get only pro.productid column, ordered by pro.productid.
___________________________________________________________
---------
productid
---------
1
11
5
6
7
9
===========================================================================================
STEP 3 - RESULTS: Table pro after deletion of all duplicates by the two conditions
===========================================================================================
From the log after running the DELETE query:
"No errors, 5 rows affected, taking 6.5 ms"
NOTA BENE: NOTICE THAT THERE ARE STILL TWO RECORDS WITH THE manPartid = "ABC1",
BECAUSE THEY BOTH HAD THE SAME MAXIMUM LENGTH OF THE specification COLUMN !!!
--------------------------------------------------------
productid manPartId specification
--------------------------------------------------------
1 ABC1 5MP camera, 2500 MaH, Steel body
11 ABC1 12345678901234567890123456789012
5 ABC2 5MP camera, plastic body
9 ABC3 15MP, 6500 MaH, Blue body
6 ABC4 5MP camera, 2500 MaH, Steel body
7 ABC5 15MP camera, 4500 MaH
I hope it all works.
EDIT 1:
DELETE ALL RECORDS EXCEPT THE ONES WITH MAXIMAL `productid`:
STEP 1: THIS STEP MUST BE APPLIED. OTHERWISE YOU ARE RECEIVING A FALSE LIST OF THE RECORDS TO DELETE!:
Convert column productid
from VARCHAR
to:
`productid` bigint(20) unsigned NOT NULL
STEP 2: Run the following DELETE
query.
DELETE FROM pro
WHERE pro.productid NOT IN (
SELECT max(b.productid) AS maxPartid
FROM (SELECT * FROM pro AS a) AS b
GROUP BY b.manPartid
);