0

I need to remove duplicate records when importing my current CSV files into the database. My files has been successfully updated, but when I tried to upload the same files again, it straight away inserted into it again. is there any way of I can remove duplicate records if i am importing the files?

<?
if(isset($_POST["submit"])){
    $file = $_FILES['file']['tmp_name'];
    //echo 'upload file name: '.$file.' ';
    $handle = fopen($file, "r");
    $c = 0;
    $count =0;

    while(($filesop = fgetcsv($handle, 1000, ",")) !== false)
    {
        $count ++;
        $ID = $filesop[0];
        $Name = $filesop[1];
        $Contact = $filesop[2];
        $Email =$filesop[3];

        if($count>1){
        $sql = "INSERT INTO clients(id,name,contact,email,)VALUES($ID,'$Name',$Contact,'$Email',')";
        $resultsql = mysqli_query($link, $sql);
        //echo $resultsql; //how am i going to remove duplicates when if there is a duplicate record ?
George Stocker
  • 57,289
  • 29
  • 176
  • 237
lalaland
  • 9
  • 1
  • 9
  • 1
    For each row in the CSV you query the database to see if that data exists. If it does, it is a duplicate and you can skip inserting. – Jay Blanchard Jan 19 '16 at 14:08
  • I have usual tasks like this and I truncate the table before importing csv files but that only works if the data is always the same and only gets updated. – sinaza Jan 19 '16 at 14:10
  • I believe the best way will be to add an unique key on name,contact and email. That way if the record already exists it will not be inserted and you can use the mysql return value and error codes to count the duplicate records that were skipped – Dobromir Velev Jan 19 '16 at 14:12

2 Answers2

1

1) Before inserting, check existing data.

<?php

if(isset($_POST["submit"])){
    $file = $_FILES['file']['tmp_name'];
    //echo 'upload file name: '.$file.' ';
    $handle = fopen($file, "r");
    $c = 0;
    $count =0;

    while(($filesop = fgetcsv($handle, 1000, ",")) !== false)
    {
        $count ++;
        $ID = $filesop[0];
        $Name = $filesop[1];
        $Contact = $filesop[2];
        $Email =$filesop[3];

        $checkExistingData = "SELECT * FROM clients WHERE name='$Name' AND contact='$Contact' AND email='$Email'";
        $resultcheckExistingData = mysqli_query($link, $checkExistingData);
        $countExistingData = mysqli_num_rows($resultcheckExistingData);     

        if($countExistingData == 0)
        {
            if($count>1) {
                $sql = "INSERT INTO clients(id,name,contact,email,)VALUES($ID,'$Name',$Contact,'$Email',')";
                $resultsql = mysqli_query($link, $sql);
                //echo $resultsql; //how am i going to remove duplicates when if there is a duplicate record ?
        .
        .
        }
.
.
}?>

2) If data got inserted and you want to delete duplicate rows from table. You can try this too.

DELETE c1 FROM clients c1, clients c2 WHERE c1.name = c2.name AND c1.contact = c2.contact AND c1.email = c2.email AND c1.id > c2.id;
Nana Partykar
  • 10,556
  • 10
  • 48
  • 77
  • 1
    thank you , it is able to help me to check for the existing data, but what if u need to insert the files into the data and remove duplicates in the entire whole system ? – lalaland Jan 19 '16 at 15:14
  • Please see my 2nd point @lalaland . It will delete duplicate values. – Nana Partykar Jan 19 '16 at 17:10
  • do i put this query just right below this line ? if($count>1) { $sql = "INSERT INTO clients(id,name,contact,email,)VALUES($ID,'$Name',$Contact,'$Email',')"; $resultsql = mysqli_query($link, $sql); And what does the c1 and c2 stands for ? – lalaland Jan 20 '16 at 01:15
  • No. First one is for not allowing duplicate data to enter. Second one is for deleting duplicate data. You can keep 2nd query in any where and run. It will delete duplicate record from that particular table. I found this query for you. c1, c2 are alias name. Don't worry. It will work @lalaland – Nana Partykar Jan 20 '16 at 09:31
  • @lalaland : You didn't responded back. – Nana Partykar Jan 22 '16 at 17:45
  • hello, i tried the above SQL for delete , but it seems not to be working , and i kinda not fully understand the code . clients c1 , clients c2 , what does c1 and c2 means , if i have 100 records , do u have to do it for 100s times ??? sorry was the late reply :( – lalaland Jan 22 '16 at 23:40
  • I don't understand how ur query didn't work. It works. Just use this query once in mysql_query. Run it. All will get deleted. Simple @lalaland – Nana Partykar Jan 23 '16 at 00:25
  • http://stackoverflow.com/questions/34905102/how-to-delete-some-duplicated-rows/34905972#34905972 its working here. How it is possible that this codd will not work. U don't have to do any modifications. Just run it @lalaland – Nana Partykar Jan 23 '16 at 00:26
0

Add unique key to one or more of your columns, column with unique key will store unique values only, duplicate values will not be inserted.

carol
  • 311
  • 2
  • 4
  • 14