
I have a MySQL table. The headers are structured like so:

field   field   field   field   vendors
data    data    data    data    data
data    data    data    data    data
data    data    data    data    data2
data    data    data    data    data2
data    data    data    data    data2

Now, I want to remove duplicate rows among rows sharing the same vendor value. Two rows that are otherwise identical but have different values in the vendors column should not be considered duplicates. But two or more rows that are equal and share the same vendors value are duplicates, and all but one copy should be removed. (In the sample above, the three identical rows with vendor data2 should collapse into one, as should the two identical rows with vendor data.)

How can this be done using MySQL?

I tried something like this, but it deletes everything instead of just the duplicates. I want at least one copy to remain.

DELETE FROM auto
WHERE auto.email = 'carolzvalentine@yahoo.com'
  AND auto.vendors = '5552'

  • Have you tried anything? Show us your effort... – Akash KC Jul 31 '12 at 15:39
  • "CREATE TABLE new_table as SELECT * FROM auto WHERE CHAR_LENGTH(vendor) > 1 GROUP BY vendor"; –  Jul 31 '12 at 15:42
  • You have to show your effort in your question, so it would be better if you added the above query to your question as your effort... – Akash KC Jul 31 '12 at 15:44

2 Answers


You can add a UNIQUE index over your table's columns using ALTER IGNORE:

ALTER IGNORE TABLE mytbl ADD UNIQUE INDEX (field1, field2, field3, field4, vendors)

As explained in the manual:

IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only the first row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

This will also prevent duplicates from being added in the future (if you want to permit such, you can DROP the index once it has been created).
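
If you only want a one-off dedupe rather than a standing constraint, the order of operations is simply: create the index (letting IGNORE delete the conflicting rows), then drop it again. A minimal sketch, assuming the index is named dedupe (the name is arbitrary):

ALTER IGNORE TABLE mytbl ADD UNIQUE INDEX dedupe (field1, field2, field3, field4, vendors);

ALTER TABLE mytbl DROP INDEX dedupe;

After the first statement, only one row survives for each distinct (field1, field2, field3, field4, vendors) combination; the second statement removes the constraint so duplicates can be inserted again.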

eggyal
  • Thanks, but I don't want to prevent duplicates from occurring. I just want the ability to flush out duplicates when I decide to. Is there a way to do that with my criteria? –  Jul 31 '12 at 15:52
  • @SofianeMerah: As I said, create the index and then drop it. – eggyal Jul 31 '12 at 15:53
  • Create an index. Then drop it. What will this do? I don't understand the operation here. Could you give me an example, please? Thanks. –  Jul 31 '12 at 16:04
  • @SofianeMerah: Did you read the paragraph explaining the `IGNORE` keyword that I quoted from the MySQL manual? – eggyal Jul 31 '12 at 16:08
  • Yes I have, and I'm reading the manual while I write this. I'm sure that if I knew more about these functions I would have done this myself, but I don't, so I don't have a clue right now as to what is going on. Can you please provide context, an example, and an order of operations? Thanks. –  Jul 31 '12 at 16:12
  • @SofianeMerah: I'm not sure what you mean by providing "context", other than to confirm that I am using the context given in your question; if further explanation is required, I'd point again to the quote above "*The other conflicting rows are deleted*". I gave an "example" in my answer above, using the maximum information available from your question. The order of operation is to first create the index then to drop it, as stated in my answer above. I'm sorry that I can't be more helpful: if things are still unclear, perhaps someone else will be able to articulate this better than I have. – eggyal Jul 31 '12 at 16:18
  • Yeah, sorry; as much as I appreciate your answer, it's not very clear to me. Thanks anyway, I hope someone else can provide alternative examples to explain this. –  Jul 31 '12 at 16:31
  • Well, I just have to confirm that your code does not work, eggyal. I get this error when running the command "ALTER IGNORE TABLE auto2 ADD UNIQUE INDEX a (email, vendors)": "#1062 - Duplicate entry 'carolzvalentine@yahoo.com-5552' for key 'a'". Someone also told me that this is used to passively prevent duplicates in a table, not to actively remove them whenever I want, and that apparently it won't work if there are already dupes, which is the whole point. –  Aug 03 '12 at 04:27
  • @SofianeMerah: As explained in the manual extract quoted above, the `IGNORE` keyword is *supposed* to suppress this error and cause duplicate records to be deleted from the table. I have used `IGNORE` in this fashion many times before; and have suggested it as a [solution](http://stackoverflow.com/a/11622593/623041) to similar questions on here with positive feedback of success. I am sorry to hear that it doesn't work for you: it very much sounds like a [bug](http://bugs.mysql.com/bug.php?id=40344) in your installation of MySQL; what version are you using? – eggyal Aug 03 '12 at 05:02
  • The version I am using is MySQL client version: mysqlnd 5.0.8-dev - 20102224 - $Revision: 310735 $. I tried a solution found here: http://stackoverflow.com/questions/8053447/mysql-alter-ignore-table-gives-integrity-constraint-violation But now I get "#1071 - Specified key was too long; max key length is 1000 bytes". –  Aug 03 '12 at 05:26
  • @SofianeMerah: I'm not sure to which solution you're referring. As [noted by @Peter](http://stackoverflow.com/questions/8053447/mysql-alter-ignore-table-gives-integrity-constraint-violation#comment13425869_8053812) in that question, the bug report to which I linked above suggests a workaround of `set session old_alter_table=1;`. Is this what you tried? – eggyal Aug 03 '12 at 10:36
  • I tried the accepted answer. I did not try the comment suggested by Peter. I'm not familiar with that command. Is that SQL or PHP? Where does it go? –  Aug 03 '12 at 14:44
  • @SofianeMerah: It is SQL (neither the other question nor the MySQL bug report from which that workaround came -- nor indeed this question/my answer -- are necessarily relevant to PHP); it should be executed on the same MySQL connection as the `ALTER TABLE` command, before that command is executed. – eggyal Aug 03 '12 at 14:49
  • I went into the table through phpMyAdmin, went to the SQL commands as usual, and entered and executed "set session old_alter_table=1; ALTER TABLE auto2 ENGINE MyISAM; ALTER IGNORE TABLE auto2 ADD UNIQUE INDEX a (email, vendors); ALTER TABLE auto2 ENGINE InnoDB;". I also tried it without changing the engine on the table, just using "set session old_alter_table=1; ALTER IGNORE TABLE auto2 ADD UNIQUE INDEX a (email, vendors);". But in each case I get: "Specified key was too long; max key length is 1000 bytes" :( –  Aug 03 '12 at 15:04
  • @SofianeMerah: What are the data types of the `email` and `vendors` columns? – eggyal Aug 03 '12 at 15:08
  • All the columns have this type and collation: varchar(255) utf8_general_ci. There are about 30 columns. –  Aug 03 '12 at 15:27
  • @SofianeMerah: In that case, you could consider `CREATE TABLE auto2_new LIKE auto2; INSERT IGNORE INTO auto2_new SELECT DISTINCT * FROM auto2; DROP TABLE auto2; RENAME TABLE auto2_new TO auto2`. Note that you will have to manually redefine on the new table any foreign key constraints, triggers, or table/column-specific permissions that you previously had defined on the old table. – eggyal Aug 03 '12 at 15:41
  • But won't the DISTINCT query remove all duplicates and only leave unique rows? I actually want only duplicate emails that also have the value '5552' in the vendors field. Later I might want to change that value. –  Aug 03 '12 at 15:45
"DELETE x
 FROM $table x
 LEFT
 JOIN
 ( SELECT vendors
        , email
        , MIN(lead_id) min_lead_id
     FROM $table
    GROUP
       BY vendors
        , email
 ) y
ON y.vendors = x.vendors
AND y.email = x.email
AND y.min_lead_id = x.lead_id
WHERE x.vendors = $vendor
AND y.min_lead_id IS NULL;";

This will do it. Simply replace the PHP variables in that code with the correct values:

$vendor = the vendors value whose duplicates you want to remove (e.g. '5552'; the query compares it against the vendors column, so it is a value, not a column name)
$table = name of the table
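
For example, with the values from the question (this assumes the table is named auto, has email and vendors columns, and carries an auto-increment primary key named lead_id, as the query above expects), it would look something like:

DELETE x
  FROM auto x
  LEFT
  JOIN
     ( SELECT vendors
            , email
            , MIN(lead_id) AS min_lead_id
         FROM auto
        GROUP
           BY vendors
            , email
     ) y
    ON y.vendors = x.vendors
   AND y.email = x.email
   AND y.min_lead_id = x.lead_id
 WHERE x.vendors = '5552'
   AND y.min_lead_id IS NULL;

The MIN(lead_id) decides which copy survives; using MAX(lead_id) instead would keep the newest copy rather than the oldest.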

The only problem with this code is that it's not fully optimal: against a 25,000-record data set, running on a virtual dedicated server, it completes the task in about two minutes on average.

Cheers