Count duplicates and update table with a single query

Question

I have a table which has several thousand records. I want to update all the records which have a duplicate firstname How can I achieve this with a single query? Sample table structure:

Fname varchar(100)
Lname varchar(100)
Duplicates int

This duplicate column must be updated with the total number of duplicates with a single query. Is this possible without running in a loop?

Can you show some sample data <--> desired output? – eggyal May 01 '12 at 14:54 — eggyal, May 01 '12 at 14:54

score 2 · Answer 1 · answered May 01 '12 at 14:57

2

update table as t1
inner join ( 
select
fname,
count(fname) as total
from table
group by fname) as t2
on t1.fname = t2.fname
set t1.duplicates = t2.total

answered May 01 '12 at 14:57

Nicola Cossu

54,599
15
92
98

Apologies! It just appeared to violate that constraint. Weird. – eggyal May 01 '12 at 15:05
I tried the above query, but I was unable to view the results. I have close to 200,000 records, after a while it was hanging so I had to kill the process. Is there way to limit the records? I just want to test the above query for say 1000 records. – Krishna Iyer May 01 '12 at 20:56
Add a limit clause after group by clause and retry. You can even try do add an index on fname field in order to speed up your query. – Nicola Cossu May 01 '12 at 21:12
I added a limit clause but even then just to process 1000 records its taking several minutes and its not completing I had to kill the process. Any ways to quicken the process? – Krishna Iyer May 02 '12 at 13:20
I am unable to process 200,000 records. Any ways to make this work? – Krishna Iyer May 07 '12 at 13:16

score 2 · Answer 2 · answered May 01 '12 at 15:01

2

I have a table which has several thousand records. I want to update all the records which have a duplicate firstname How can I achieve this with a single query?

Are you absolutely sure you want to store the number of the so called duplicates? If not, it's a rather simple query:

SELECT fname, COUNT(1) AS number FROM yourtable GROUP BY fname;

I don't see why you would want to store that number though. What if there's another record inserted? What if there are records deleted? The "number of duplicates" will remain the same, and therefore will become incorrect at the first mutation.

answered May 01 '12 at 15:01

Berry Langerak

18,561
4
45
58

True, but I want to view the list of duplicates and then delete the ones I don't need. So if I process all the duplicates, it makes it easier. I might be running this "check for duplicates" process once in two weeks. – Krishna Iyer May 01 '12 at 19:41
Also your query doesn't list by duplicates, it simply lists all records. – Krishna Iyer May 01 '12 at 20:54
@KrishnaIyer - Try adding a having clause ie `HAVING COUNT(1) > 1` But if the ultimate goal is deleting duplicates, you need additional logic because that would not tell you which record to "keep". – Leigh May 02 '12 at 00:16
1

@KrishnaIyer How about, instead of executing a query to remove duplicates once in a while, you opt for a UNIQUE constraint in your database instead? My query indeed doesn't list duplicates, but like @Leigh mentioned, that's easily solved by a `HAVING COUNT(1) > 1`. – Berry Langerak May 02 '12 at 07:01
Good point. "An ounce of prevention is worth a pound of cure" :) – Leigh May 02 '12 at 09:55
I want to allow duplicates, its easy to prevent duplicates though. – Krishna Iyer May 02 '12 at 16:59

score 0 · Answer 3 · edited May 23 '17 at 12:19

0

Create the column first, then write a query like:

UPDATE table SET table.duplicates = (SELECT COUNT(*) FROM table r GROUP BY Fname/Lname/some_id)

Maybe this other SO will help?

How do I UPDATE from a SELECT in SQL Server?

edited May 23 '17 at 12:19

Community

1
1

answered May 01 '12 at 14:57

dweiss

832
4
9

[Currently, you cannot update a table and select from the same table in a subquery.](http://dev.mysql.com/doc/refman/5.5/en/update.html) – eggyal May 01 '12 at 15:03
In MSSQL, and using aliases, you can. – dweiss May 01 '12 at 15:12
1

Lovely. :) But the question is tagged [tag:mysql]. – eggyal May 01 '12 at 15:15
Didn't you just agree that nick rulez's answer was correct? And doesn't he have a select in a subquery that hits the same table as his update? And doesn't it use a similar method to what I linked to? And thusly couldn't you put this all together to say that in MySQL you can update a table and select from the same table in a subquery using an inner join as explained by nick? – dweiss May 01 '12 at 15:19
[You can't specify target table 'table' for update in FROM clause: UPDATE `table` SET `table`.duplicates = (SELECT COUNT(*) FROM `table` r GROUP BY `some_id`)](http://sqlfiddle.com/#!2/75ad7/1) – eggyal May 01 '12 at 15:22
True :) my code was pseudo and not something I tested using his tables from his MySQL Server :). No need to argue, it's really just semantics. What I was implying does work, what I wrote does not -- you are right. – dweiss May 01 '12 at 15:24

score -1 · Answer 4 · answered May 01 '12 at 14:56

-1

You might not be able to do this. You can't update the same table that you are selecting from in the same query.

answered May 01 '12 at 14:56

ethrbunny

10,379
9
69
131

Yes you can. Use the multiple-table update syntax with a self-join (like I just did in [this answer](http://stackoverflow.com/a/10399092/623041)). – eggyal May 01 '12 at 14:58
From http://dev.mysql.com/doc/refman/5.5/en/update.html "Currently, you cannot update a table and select from the same table in a subquery." – ethrbunny May 01 '12 at 15:00
1

@ethrbunny Not if you don't use the self-join hack. If you use that, it is possible. – Berry Langerak May 01 '12 at 15:02

Count duplicates and update table with a single query

4 Answers4