
I am working on a project built in Laravel 5.3. I have a table named `names` with two columns, `id` and `u_names`; the table contains 4,550,000+ rows. I have not dealt with data this large before. The problem is that this table contains duplicate records. I have been asked to copy the table, delete the original, recreate it, and then import the copied data back into the newly created table using a `SELECT DISTINCT` query from PHP, so that every record in the original table ends up unique. I know how to do this with a foreach loop, but I don't know what will happen with more than 4.5 million records, or what configuration will be required, such as `memory_limit` and `max_execution_time`.

  • Have you looked at `mysqldump`; do you have access to run queries in command line? – Manav Sep 28 '17 at 11:32
  • I am not sure I understand your question, but why do you need to involve PHP and the part about dropping the table and create a new table? Would https://stackoverflow.com/questions/3311903/remove-duplicate-rows-in-mysql or http://www.mysqltutorial.org/mysql-delete-duplicate-rows/ work? – rypskar Sep 28 '17 at 11:38
  • I would use `INSERT INTO ... SELECT` and let the database handle it completely... read https://dev.mysql.com/doc/refman/5.7/en/insert.html – Raymond Nijland Sep 28 '17 at 11:54
  • @Mjh your answer is quite long, but the line `We'll use INSERT INTO ... SELECT` was enough to solve my problem. Thanks, your answer definitely solved it. – Muhammad Ibrar Oct 09 '19 at 13:46

1 Answer


The problem

Create a table that contains unique records, based on the contents of another table.

The solution

We're going to use a few queries to migrate the data.

We won't use PHP or Laravel to do anything with the data (because we don't have to). There's no reason for any data to even reach PHP.

Querying the database directly is enough. You can still use Laravel's migrations to define the queries if you want to.

The attack plan

  1. Create a new, empty table
  2. Add a unique index so we can prevent duplicate records
  3. Since names can be long, we'll slightly alter the table and add a SHA-1 hash of the name column; this keeps the unique index at 20 bytes per record.
  4. To avoid computing the unique key by hand on every insert, we'll help ourselves with a trigger
  5. We'll use the INSERT INTO ... SELECT syntax to move data from one table to the other, with the IGNORE clause added to skip errors caused by duplicate records.

The code

Create the table

CREATE TABLE new_names LIKE `names`;

Alter the table to add the unique key

Adding the unique key prevents duplicates from getting in, so you never have to run a DISTINCT query.

ALTER TABLE new_names ADD unique_id BINARY(20) DEFAULT NULL AFTER id;
ALTER TABLE new_names ADD unique(unique_id);

Create the trigger which updates the unique_id for us

DELIMITER $$

CREATE

    TRIGGER `new_names_before_insert` BEFORE INSERT 
    ON `new_names`
    FOR EACH ROW BEGIN
        SET NEW.unique_id = UNHEX(SHA1(NEW.u_names)); -- 20-byte SHA-1 hash of the name column (u_names in the question's schema)
    END$$

DELIMITER ;

Migrate the data

-- Columns are listed explicitly because new_names has the extra unique_id column (the trigger fills it in)
INSERT IGNORE INTO new_names (id, u_names) SELECT id, u_names FROM `names`;

IGNORE means that if a duplicate is encountered, MySQL ignores it and moves on to the next record instead of aborting the whole insert with an error.
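
As a quick sanity check once the copy finishes, the number of rows in new_names should equal the number of distinct u_names values in the source table. Here is a minimal sketch, run from php artisan tinker and assuming the `names` / `u_names` schema from the question (NULL values in u_names are the one exception, since a UNIQUE index does not treat NULLs as duplicates):

// Both sub-queries run on the database server; only two numbers reach PHP.
$check = DB::selectOne('
    SELECT
        (SELECT COUNT(DISTINCT u_names) FROM `names`) AS distinct_in_source,
        (SELECT COUNT(*) FROM new_names)              AS rows_migrated
');

var_dump($check->distinct_in_source, $check->rows_migrated); // these should match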

Conclusion

No data reached PHP, therefore PHP won't be the bottleneck and you don't have to tamper with its memory settings.

Uniqueness is guaranteed because of the unique key. You will never have duplicate records in that table.

This migration will be as fast as possible because everything happens on the database server, with no round trips to PHP for individual rows. The insert is atomic (either all records get in or none do), and it is committed as a single statement rather than millions of separate writes.

Your task

  1. Adjust the table and column names above if your actual schema differs
  2. Run this from the MySQL terminal, OR create migrations and execute the queries via DB::raw() (a sketch of such a migration follows below)
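
For option 2, here is a minimal sketch of what such a migration could look like in Laravel 5.3. It only wraps the queries above; the class name and the use of DB::statement() / DB::unprepared() (rather than DB::raw()) are my own choices, not part of the original answer. Note that DELIMITER is a command of the mysql client, not SQL, so the trigger is sent as a single statement here:

<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Support\Facades\DB;

class CreateDeduplicatedNamesTable extends Migration
{
    public function up()
    {
        // Empty copy of the original table, plus the hashed unique key
        DB::statement('CREATE TABLE new_names LIKE `names`');
        DB::statement('ALTER TABLE new_names ADD unique_id BINARY(20) DEFAULT NULL AFTER id');
        DB::statement('ALTER TABLE new_names ADD UNIQUE (unique_id)');

        // Trigger that fills unique_id on every insert (no DELIMITER needed through PDO)
        DB::unprepared('
            CREATE TRIGGER `new_names_before_insert` BEFORE INSERT ON `new_names`
            FOR EACH ROW BEGIN
                SET NEW.unique_id = UNHEX(SHA1(NEW.u_names));
            END
        ');

        // Copy the data; IGNORE silently skips the duplicates
        DB::statement('INSERT IGNORE INTO new_names (id, u_names) SELECT id, u_names FROM `names`');
    }

    public function down()
    {
        DB::unprepared('DROP TRIGGER IF EXISTS `new_names_before_insert`');
        DB::statement('DROP TABLE IF EXISTS new_names');
    }
}

Run it with php artisan migrate as usual; only the SQL strings pass through PHP, never the data itself.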

Good luck!

Mjh