1

I have a CSV file with n records that I want to import into a table. I also want to ignore duplicate records (a record is considered a duplicate if two specific columns both have the same values). I am also passing the CSV headers into the query so that each record is inserted into the appropriate column of the database table.

My query is below, where $header is the CSV header row and $basePath is the base path of the CSV file:

  $query = sprintf("LOAD DATA LOCAL INFILE '%s' INTO TABLE table_name FIELDS TERMINATED BY '`' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"' LINES TERMINATED BY '\\n' IGNORE 0 LINES (" . implode(',', $header) . ")", addslashes($basePath));
Akhilesh Jha
  • **WARNING**: When using `mysqli` you should be using [parameterized queries](http://php.net/manual/en/mysqli.quickstart.prepared-statements.php) and [`bind_param`](http://php.net/manual/en/mysqli-stmt.bind-param.php) to add user data to your query. **DO NOT** use string interpolation or concatenation to accomplish this because you have created a severe [SQL injection bug](http://bobby-tables.com/). **NEVER** put `$_POST`, `$_GET` or **any** user data directly into a query, it can be very harmful if someone seeks to exploit your mistake. `addslashes` is **NOT** a SQL escaping method. – tadman Oct 29 '18 at 16:56
  • @tadman I am using it inside Laravel. – Akhilesh Jha Oct 29 '18 at 17:00
  • Laravel has a [number of methods for safely introducing data using placeholder values](https://laravel.com/docs/5.7/queries#raw-expressions). Use that approach whenever possible and avoid using quick hacks like `sprintf` which aren't adequate protection against injection issues. – tadman Oct 29 '18 at 17:04
  • @tadman I will definitely use your input in the query. But could you please help me achieve my output from the MySQL query? – Akhilesh Jha Oct 29 '18 at 17:07
  • Probably a duplicate: https://stackoverflow.com/q/12891337/1531971 (If not, show and tell us why not.) –  Oct 29 '18 at 18:27
  • @jdv My question is not the same, because I cannot create a unique index on our columns. One of the column values may be empty. So if an existing record has the same email but an empty mobile field, and my CSV has that same email, then the record should still be inserted. Only when both columns have values, and I am importing the same values from the CSV, should the record be ignored. – Akhilesh Jha Oct 30 '18 at 06:43
  • @AkhileshJha you should tell us your research _in the text of the question_. You can [edit] the question at any time to add clarification. –  Oct 30 '18 at 13:24
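Following tadman's comments, a minimal Laravel sketch might look like this (the `imports` table, the path check, and the escaping are all assumptions, not the asker's code; note that MySQL's prepared-statement protocol does not support LOAD DATA, so the file path cannot be bound as a placeholder and has to be vetted in PHP instead):

```php
// Minimal sketch, not a drop-in fix. Validate the path up front,
// because LOAD DATA cannot be server-side prepared.
$path = realpath($basePath);
if ($path === false || pathinfo($path, PATHINFO_EXTENSION) !== 'csv') {
    throw new InvalidArgumentException('Bad CSV path');
}

// Column names taken from the CSV header must be whitelisted against
// the real table schema before being interpolated into the query.
$columns = implode(',', $header);

DB::connection()->getPdo()->exec(sprintf(
    "LOAD DATA LOCAL INFILE '%s' INTO TABLE imports
     FIELDS TERMINATED BY '`' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"'
     LINES TERMINATED BY '\\n'
     IGNORE 0 LINES (%s)",
    str_replace(["\\", "'"], ["\\\\", "\\'"], $path),
    $columns
));
```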

1 Answer

0

If you have a UNIQUE index on the column(s) that define duplication, you can always just use the INSERT IGNORE INTO ... approach, where insertion errors of that sort are ignored.

The IGNORE flag should be usable even within a LOAD DATA query.
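As a sketch of that approach (the `imports` table and column names here are hypothetical, chosen to match the comments): create the UNIQUE index first, then let IGNORE downgrade duplicate-key errors to warnings so the offending rows are skipped.

```sql
-- Hypothetical schema: a duplicate is a row matching an existing
-- (email, mobile) pair.
ALTER TABLE imports ADD UNIQUE KEY uniq_email_mobile (email, mobile);

-- IGNORE before INTO TABLE turns duplicate-key errors into warnings,
-- so duplicate rows are skipped instead of aborting the load.
LOAD DATA LOCAL INFILE '/path/to/file.csv'
IGNORE INTO TABLE imports
FIELDS TERMINATED BY '`' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"'
LINES TERMINATED BY '\n'
(email, mobile, other_col);
```

One caveat relevant to the comments below: MySQL allows multiple NULLs in a UNIQUE index, so rows whose mobile is NULL would never be treated as duplicates — but only if the empty value is stored as NULL rather than as an empty string.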

tadman
  • Actually a record will be considered a duplicate only if both columns have the same values. For example, if the email and the mobile fields both match an existing record, the record is a duplicate. If only one of them matches, the record is considered unique. – Akhilesh Jha Oct 29 '18 at 17:10
  • That's something you'll need to import and then delete later in a second pass, like `DELETE FROM table_name WHERE email=mobile` or something like that. Technically that's not a duplicate record, but a duplicated field. – tadman Oct 29 '18 at 17:11
  • In my case the user is importing millions of records at a time, so if I run the insert and the duplicate cleanup as separate queries my database performance will suffer. I want to achieve it in a single query. – Akhilesh Jha Oct 29 '18 at 17:16
  • 1
    Remember that what you want and what's possible are two different things. SQL is a fairly basic command set and it's not intended to do anything heavy like this. You can automate this with a [`BEFORE INSERT` trigger](https://dev.mysql.com/doc/refman/8.0/en/trigger-syntax.html) but you can't do it as one command. Loading data in from a CSV file is usually fast even for large numbers of rows, though, so do benchmark this with real-world conditions before panicking. – tadman Oct 29 '18 at 18:00
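Building on tadman's two-pass suggestion, one way to keep the duplicate check entirely in SQL (table and column names here are hypothetical) is to load into an unconstrained staging table, then copy across only the rows that don't collide on both columns:

```sql
-- Load everything into a staging table first; no uniqueness rules here.
LOAD DATA LOCAL INFILE '/path/to/file.csv'
INTO TABLE staging
FIELDS TERMINATED BY '`' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '"'
LINES TERMINATED BY '\n'
(email, mobile, other_col);

-- Copy only rows that don't match an existing (email, mobile) pair.
-- An empty mobile never counts toward a duplicate, per the comments.
INSERT INTO imports (email, mobile, other_col)
SELECT s.email, s.mobile, s.other_col
FROM staging s
LEFT JOIN imports i
       ON i.email = s.email
      AND i.mobile = s.mobile
      AND s.mobile <> ''
WHERE i.email IS NULL;
```

This is still two statements rather than the single query asked for, but both run server-side, which is usually what matters for performance at that scale.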