I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?
-
2Since you mentioned find all records, I am assuming you need to know the KEYS as well as the duplicated VALUES in that varchar column. – TechTravelThink Mar 27 '09 at 04:34
-
I can find the keys easy enough after I get the values, I really just want a list of all the duplicate values. – Jon Tackabury Mar 27 '09 at 13:49
-
https://phoenixnap.com/kb/mysql-find-duplicates – Jackson Jul 02 '20 at 06:37
27 Answers
Do a SELECT
with a GROUP BY
clause. Let's say name is the column you want to find duplicates in:
SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;
This will return a result with the name value in the first column, and a count of how many times that value appears in the second.

- 158,662
- 42
- 215
- 303

- 114,835
- 27
- 73
- 90
-
38But how is this useful if you can't get the IDs of the rows with duplicate values? Yes, you can do a new query matching for each duplicate value, but is it possible to simply list the duplicates? – NobleUplift Jul 24 '14 at 14:41
-
39@NobleUplift You can do a ```GROUP_CONCAT(id)``` and it will list the IDs. See my answer for an example. – Matt R. Feb 19 '15 at 00:53
-
1Thanks @MattRardon. A co-worker showed me that command a month or so ago, but I'm glad it's listed here because otherwise a lot of these queries aren't very useful. – NobleUplift Feb 19 '15 at 16:23
-
Is it possible to do something like this, but in a subquery that returns only one column? – bheussler Jul 28 '15 at 14:16
-
6
-
1Multiple columns can be done as follows: `SELECT col1, col2, COUNT(*) c FROM table GROUP BY col1, col2 HAVING c > 1;` – mark Oct 07 '16 at 11:56
-
21I'm confused why this is the accepted answer and why it has so many upvotes. The OP asked, "I would like to find all the records that have duplicate values in this column." This answer returns a table of counts. -1 – Monica Heddneck Apr 03 '17 at 20:56
-
7For those that don't understand how HAVING works - it's simply a filter on the result set, so happens after the main query. – John Hunt Jun 22 '17 at 10:02
-
5
-
I have a similar situation but I want to fetch the row containing second duplicate value of a single column, can you help me out here – Ajax Mar 30 '21 at 12:40
-
yea ive been using this... it return the duplicated value and how many it has in that column. then i just copy the result and query separately for delete – Muhammad Asyraf May 30 '21 at 20:48
-
I added `COLLATE 'utf8_bin'` after the `GROUP BY name` to find only exact case matches in fields with utf8 collation. – masterguru Dec 13 '21 at 11:16
SELECT varchar_col
FROM table
GROUP BY varchar_col
HAVING COUNT(*) > 1;

- 31,877
- 16
- 137
- 115

- 11,167
- 7
- 37
- 46
-
13Superior to @levik's answer since it doesn't add an extra column. Makes it useful for use with `IN()`/`NOT IN()`. – wmassingham Nov 24 '15 at 20:42
-
3This answer is exactly the same as levik's answer, just written differently as IDs of duplicate values are still omitted from the result. levik's answer just uses an alias of the count and this one does not. Perhaps this one is a bit cleaner if you don't need the duplicate count. – RisingSun Dec 10 '21 at 20:11
SELECT *
FROM mytable mto
WHERE EXISTS
(
SELECT 1
FROM mytable mti
WHERE mti.varchar_column = mto.varchar_column
LIMIT 1, 1
)
ORDER BY varchar_column
This query returns complete records, not just distinct varchar_column
's.
This query doesn't use COUNT(*)
. If there are lots of duplicates, COUNT(*)
is expensive, and you don't need the whole COUNT(*)
, you just need to know if there are two rows with same value.
This is achieved by the LIMIT 1, 1
at the bottom of the correlated query (essentially meaning "return the second row"). EXISTS would only return true if the aforementioned second row exists (i. e. there are at least two rows with the same value of varchar_column) .
Having an index on varchar_column
will, of course, speed up this query greatly.

- 5,678
- 7
- 31
- 51

- 413,100
- 91
- 616
- 614
-
4
-
15This should be the accepted answer, as `GROUP BY` and `HAVING` returns only one of the possible duplicates. Also, performance with indexed field instead of `COUNT(*)`, and the possibility to `ORDER BY` to group duplicate records. – Rémi Breton Sep 22 '15 at 20:08
-
3As stated in the comments above, this query allows you to list all duplicated rows. Very useful. – TryHarder Aug 26 '16 at 01:36
-
6Looking at this I don't understand how it would work at all. Wont the inner condition always be true since any row in the outer table will also be available in the inner table and so every row will always at least match itself? I tried the query and got the result i suspected - every row returned. But with so many upvotes I'm doubting myself. Isn't the inner query missing something like "AND mto.id<>mti.id"? It does work for me when I add that. – Clox May 09 '17 at 12:35
-
1@clox: note the `LIMIT 1, 1` at the bottom of the correlated query (essentially meaning "return the second row"). `EXISTS` would only return true if the aforementioned second row exists (i. e. there are at least two rows with the same value of `varchar_column`) – Quassnoi May 09 '17 at 15:27
-
@Quassnoi Ah, right. I understand the reasoning then, didn't pay attention to that part. However like I said, it doesn't work for me. It actually returns all rows. Even if I do LIMIT 1554554,1 it still does. but if I use the inner query by itself it works as expected and only returns rows with duplicate values. It does however work as you describe if I add ORDER BY to the inner query, and makes it a tad slower than with the condition. Are you sure it works for you without that order or extra inner condition? I'm using MariaDB by the way, can that be why? 10.0.26-MariaDB-3+deb.sury.org~xenial+1 – Clox May 09 '17 at 16:14
-
@Clox: it definitely did work back in 2009 on whatever was the most recent MySQL version back then, otherwise I would not have posted it. Could you please put an sqlfiddle together? – Quassnoi May 09 '17 at 17:09
-
2@Quassnoi Alright. I've tried putting it on sqlfiddle but I've given up since every query I try to run, apart from creating the schema gets timed out. I did figure out that just removing "EXISTS" also makes the query work correctly for me. – Clox May 16 '17 at 09:34
-
Without analysis, this query took really long for a quite small table of a few thousand indexed rows. It was half a second to a second delay. The count query from the main answer was solved without noticeable delay (and it was done first, so no query cache) – John Oct 12 '17 at 01:36
-
1This subquery looks equivalent, but easier to understand: https://stackoverflow.com/a/11694832/209139. Not sure which is faster. – TRiG May 14 '19 at 15:31
-
1In mysql this query returned all results for me. I changed the WHERE statement to `EXISTS ( ... ) > 0 ` to return only the duplicates. – Jay Dee Oct 23 '20 at 12:11
Building off of levik's answer to get the IDs of the duplicate rows you can do a GROUP_CONCAT
if your server supports it (this will return a comma separated list of ids).
SELECT GROUP_CONCAT(id), name, COUNT(*) c
FROM documents
GROUP BY name
HAVING c > 1;
-
Really appreciated Matt. This is truly helpful! For those trying to update in phpmyadmin if you leave the id together with the function like this: `SELECT id, GROUP_CONCAT(id), name, COUNT(*) c [...]` it enables inline editing and it should update all the rows involved (or at least the first one matched), but unfortunately the edit generates a Javascript error... – Armfoot Sep 14 '15 at 11:25
-
How would you then calculate how many ids are subject to duplication? – CMCDragonkai Jan 17 '17 at 13:33
-
2How do I not get all the ID's grouped, but instead listed from first to last; with all their respective values in the columns next to them? So instead of grouping it, it just shows ID 1 and its value, ID 2 and its value. EVEN if the values for the ID is the same. – MailBlade Feb 15 '18 at 09:29
-
to get all the data that contains duplication i used this:
SELECT * FROM TableName INNER JOIN(
SELECT DupliactedData FROM TableName GROUP BY DupliactedData HAVING COUNT(DupliactedData) > 1 order by DupliactedData)
temp ON TableName.DupliactedData = temp.DupliactedData;
TableName = the table you are working with.
DupliactedData = the duplicated data you are looking for.
-
This one shows each duplicate in it's own row. That's what I need. Thanks. – warmwhisky Dec 05 '19 at 10:07
-
Yes, exactly what I was looking for also. Displays all data and columns in rows which have a duplicate in a specific column. – RationalRabbit May 14 '22 at 21:25
Taking @maxyfc's answer further, I needed to find all of the rows that were returned with the duplicate values, so I could edit them in MySQL Workbench:
SELECT * FROM table
WHERE field IN (
SELECT field FROM table GROUP BY field HAVING count(*) > 1
) ORDER BY field

- 7,816
- 2
- 24
- 35
Assuming your table is named TableABC and the column which you want is Col and the primary key to T1 is Key.
SELECT a.Key, b.Key, a.Col
FROM TableABC a, TableABC b
WHERE a.Col = b.Col
AND a.Key <> b.Key
The advantage of this approach over the above answer is it gives the Key.

- 3,014
- 3
- 20
- 13
-
6+1 Because it's handy. Though, ironically, the result itself contains duplicates (it lists a and b, then b and a.) – Fabien Snauwaert May 19 '16 at 10:25
-
3@FabienSnauwaert You can get rid of some of the duplicates by comparing less than (or greater than) – Michael Oct 25 '17 at 18:45
-
@TechTravelThink your answer is very clear, thanks for that but on large table it takes some time (about 2mn on more 20'000 entries table) and after show 25 first results, if I click to show next one, phpmyadmin show error "#1052 - Column 'id' in order clause is ambiguous" – bcag2 Apr 24 '20 at 08:44
SELECT *
FROM `dps`
WHERE pid IN (SELECT pid FROM `dps` GROUP BY pid HAVING COUNT(pid)>1)

- 9,474
- 36
- 90
- 105

- 121
- 1
- 2
-
2No, because this is quite possibly the slowest of the lot. Subselects are notoriously slow, as they're executed for every row returned. – Oddman Nov 21 '16 at 06:33
To find how many records are duplicates in name column in Employee, the query below is helpful;
Select name from employee group by name having count(*)>1;

- 6,009
- 10
- 39
- 82

- 119
- 1
- 2
My final query incorporated a few of the answers here that helped - combining group by, count & GROUP_CONCAT.
SELECT GROUP_CONCAT(id), `magento_simple`, COUNT(*) c
FROM product_variant
GROUP BY `magento_simple` HAVING c > 1;
This provides the id of both examples (comma separated), the barcode I needed, and how many duplicates.
Change table and columns accordingly.

- 313
- 5
- 19
I am not seeing any JOIN approaches, which have many uses in terms of duplicates.
This approach gives you actual doubled results.
SELECT t1.* FROM my_table as t1
LEFT JOIN my_table as t2
ON t1.name=t2.name and t1.id!=t2.id
WHERE t2.id IS NOT NULL
ORDER BY t1.name

- 4,812
- 1
- 31
- 34

- 1,075
- 11
- 23
-
3FYI - You'll want to 'select distinct somecol ..' if there is a potential for more than 1 duplicate record to exist otherwise the results will contain duplicates of the duplicated rows that were found. – Drew Dec 04 '18 at 21:19
I saw the above result and query will work fine if you need to check single column value which are duplicate. For example email.
But if you need to check with more columns and would like to check the combination of the result so this query will work fine:
SELECT COUNT(CONCAT(name,email)) AS tot,
name,
email
FROM users
GROUP BY CONCAT(name,email)
HAVING tot>1 (This query will SHOW the USER list which ARE greater THAN 1
AND also COUNT)

- 6,009
- 10
- 39
- 82
-
Exactly what was needed! Here my query, checking 3 fields for duplicates: `SELECT COUNT(CONCAT(userid,event,datetime)) AS total, userid, event, datetime FROM mytable GROUP BY CONCAT(userid, event, datetime ) HAVING total>1` – Avatar Nov 14 '19 at 09:48
I prefer to use windowed functions(MySQL 8.0+) to find duplicates because I could see entire row:
WITH cte AS (
SELECT *
,COUNT(*) OVER(PARTITION BY col_name) AS num_of_duplicates_group
,ROW_NUMBER() OVER(PARTITION BY col_name ORDER BY col_name2) AS pos_in_group
FROM table
)
SELECT *
FROM cte
WHERE num_of_duplicates_group > 1;

- 162,964
- 23
- 234
- 275
SELECT t.*,(select count(*) from city as tt
where tt.name=t.name) as count
FROM `city` as t
where (
select count(*) from city as tt
where tt.name=t.name
) > 1 order by count desc
Replace city with your Table. Replace name with your field name

- 7,816
- 2
- 24
- 35

- 109
- 1
- 5
SELECT ColumnA, COUNT( * )
FROM Table
GROUP BY ColumnA
HAVING COUNT( * ) > 1

- 2,201
- 1
- 20
- 32

- 7,690
- 7
- 41
- 64
-
3
-
No idea why this is so low, the simplest answer that worked for me by far. I still find it crazy that something as basic as identifying duplicates is a four-line command in SQL, but that's better than some of the convoluted 10-line answers that were somehow voted higher than this one. – Hashim Aziz Jun 26 '23 at 18:46
I improved from this:
SELECT
col,
COUNT(col)
FROM
table_name
GROUP BY col
HAVING COUNT(col) > 1;

- 81
- 3
As a variation on Levik's answer that allows you to find also the ids of the duplicate results, I used the following:
SELECT * FROM table1 WHERE column1 IN (SELECT column1 AS duplicate_value FROM table1 GROUP BY column1 HAVING COUNT(*) > 1)

- 479
- 7
- 17
SELECT
t.*,
(SELECT COUNT(*) FROM city AS tt WHERE tt.name=t.name) AS count
FROM `city` AS t
WHERE
(SELECT count(*) FROM city AS tt WHERE tt.name=t.name) > 1 ORDER BY count DESC
CREATE TABLE tbl_master
(`id` int, `email` varchar(15));
INSERT INTO tbl_master
(`id`, `email`) VALUES
(1, 'test1@gmail.com'),
(2, 'test2@gmail.com'),
(3, 'test1@gmail.com'),
(4, 'test2@gmail.com'),
(5, 'test5@gmail.com');
QUERY : SELECT id, email FROM tbl_master
WHERE email IN (SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1)

- 340
- 1
- 14

- 79
- 4
The following will find all product_id that are used more than once. You only get a single record for each product_id.
SELECT product_id FROM oc_product_reward GROUP BY product_id HAVING count( product_id ) >1
Code taken from : http://chandreshrana.blogspot.in/2014/12/find-duplicate-records-based-on-any.html

- 361
- 2
- 7
SELECT DISTINCT a.email FROM `users` a LEFT JOIN `users` b ON a.email = b.email WHERE a.id != b.id;

- 7,296
- 6
- 30
- 50

- 4,648
- 3
- 29
- 33
-
1Worth noting that this is unbearably slow or might not even finish if the column being queried for is not indexed. Otherwise, I was able to change `a.email` to `a.*` and get all the IDs of the rows with duplicates. – NobleUplift Jul 24 '14 at 14:53
-
-
@Michael Well since this is three years old I can't test on whatever version of MySQL I was using, but I tried this same query on a database where the column I selected did not have an index on it, so it took quite a few seconds to finish. Changing it to `SELECT DISTINCT a.*` resolved almost instantly. – NobleUplift Oct 26 '17 at 19:54
-
@NobleUplift Ah ok. I can understand it being slow... the part that I am concerned about is "might not even finish". – Michael Oct 27 '17 at 18:58
-
@Michael I don't remember which table in our system I had to run this query on, but for the ones with a few million records they probably would have finished, but in a time that took so long that I gave up on seeing when it actually would finish. – NobleUplift Oct 27 '17 at 20:19
Select column_name, column_name1,column_name2, count(1) as temp from table_name group by column_name having temp > 1

- 3,686
- 16
- 35
For removing duplicate rows with multiple fields , first cancate them to the new unique key which is specified for the only distinct rows, then use "group by" command to removing duplicate rows with the same new unique key:
Create TEMPORARY table tmp select concat(f1,f2) as cfs,t1.* from mytable as t1;
Create index x_tmp_cfs on tmp(cfs);
Create table unduptable select f1,f2,... from tmp group by cfs;
-
-
Why not use `CREATE TEMPORARY TABLE ...`? A little explanation of your solution would be great. – maxhb Feb 04 '16 at 10:24
One very late contribution... in case it helps anyone waaaaaay down the line... I had a task to find matching pairs of transactions (actually both sides of account-to-account transfers) in a banking app, to identify which ones were the 'from' and 'to' for each inter-account-transfer transaction, so we ended up with this:
SELECT
LEAST(primaryid, secondaryid) AS transactionid1,
GREATEST(primaryid, secondaryid) AS transactionid2
FROM (
SELECT table1.transactionid AS primaryid,
table2.transactionid AS secondaryid
FROM financial_transactions table1
INNER JOIN financial_transactions table2
ON table1.accountid = table2.accountid
AND table1.transactionid <> table2.transactionid
AND table1.transactiondate = table2.transactiondate
AND table1.sourceref = table2.destinationref
AND table1.amount = (0 - table2.amount)
) AS DuplicateResultsTable
GROUP BY transactionid1
ORDER BY transactionid1;
The result is that the DuplicateResultsTable
provides rows containing matching (i.e. duplicate) transactions, but it also provides the same transaction id's in reverse the second time it matches the same pair, so the outer SELECT
is there to group by the first transaction ID, which is done by using LEAST
and GREATEST
to make sure the two transactionid's are always in the same order in the results, which makes it safe to GROUP
by the first one, thus eliminating all the duplicate matches. Ran through nearly a million records and identified 12,000+ matches in just under 2 seconds. Of course the transactionid is the primary index, which really helped.

- 3,373
- 4
- 32
- 50

- 11
- 1
If you want to remove duplicate use DISTINCT
Otherwise use this query:
SELECT users.*,COUNT(user_ID) as user
FROM users
GROUP BY user_name
HAVING user > 1;

- 110
- 3
- 4

- 11
- 1
Thanks to @novocaine for his great answer and his solution worked for me. I altered it slightly to include a percentage of the recurring values, which was needed in my case. Below is the altered version. It reduces the percentage to two decimal places. If you change the ,2 to 0, it will display no decimals, and to 1, then it will display one decimal place, and so on.
SELECT GROUP_CONCAT(id), name, COUNT(*) c,
COUNT(*) OVER() AS totalRecords,
CONCAT(FORMAT(COUNT(*)/COUNT(*) OVER()*100,2),'%') as recurringPecentage
FROM table
GROUP BY name
HAVING c > 1

- 196
- 2
- 10
Try using this query:
SELECT name, COUNT(*) value_count FROM company_master GROUP BY name HAVING value_count > 1;

- 3,039
- 8
- 36
- 56

- 93
- 4