Below is the structure of my table diary, which has some entries about trips. The problem is there are around 4,000 duplicate entries which I found after hashing the notes column into hash column. Total entries are around 47,000.
I want to delete all the duplicate rows leaving just 1 row based on the hash
column
I can find the duplicate rows using
SELECT `url`, COUNT(hash) c from diary GROUP BY `hash` HAVING c > 1 ORDER BY `c` DESC;
and then use this SQL to find the url id of the duplicate rows
SELECT `url`, places, days,`hash` FROM diary WHERE `hash` = (select `hash` from diary WHERE `url` = ?);
For this table I want to have any 1 row from the first 2 rows as they have same hash and 3rd, 4th row. Also to be noted in the diary table there may be duplicate hash
with count as high as 400.
Ans: Add hash as unique key then INSERT IGNORE INTO diary2 (c1, c2, c3) SELECT cl, c2 FROM diary;