Does the performance of LOAD DATA LOCAL INFILE drop when a UNIQUE key column is used to avoid duplicates?
When I test a .csv file import with and without a UNIQUE key column, I observe the following:
- Without the UNIQUE key constraint, 1 million records insert into the table without any issue, but duplicates are accepted.
- With a UNIQUE key constraint on a column to avoid duplicates, 70k records imported in about a minute, whereas an import of half a million records never finishes: it runs for hours, whether started from the web application or MySQL Workbench. Sometimes I see the following error:
Error Code: 1205. Lock wait timeout exceeded; try restarting transaction
Query used:
LOAD DATA LOCAL INFILE 'file.csv' IGNORE
INTO TABLE table1
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
Table structure:
CREATE TABLE `table1` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `cUrl` varchar(255) DEFAULT NULL,
  `SUrl` varchar(255) DEFAULT NULL,
  `FirstName` varchar(100) DEFAULT NULL,
  `LastName` varchar(100) DEFAULT NULL,
  `EmailId` varchar(150) NOT NULL,
  `Filter` varchar(55) DEFAULT NULL,
  `Designation` varchar(255) DEFAULT NULL,
  `Company` varchar(255) DEFAULT NULL,
  `WebSite` varchar(255) DEFAULT NULL,
  `size` varchar(25) DEFAULT NULL,
  `Industry` varchar(255) DEFAULT NULL,
  `Type` varchar(25) DEFAULT NULL,
  `Founded` varchar(5) DEFAULT NULL,
  `date` varchar(55) NOT NULL,
  PRIMARY KEY (`Id`),
  UNIQUE KEY `EmailId` (`EmailId`)
) ENGINE=InnoDB AUTO_INCREMENT=16812 DEFAULT CHARSET=utf8mb4;
I also applied the following session settings:
set names utf8mb4;
set unique_checks = 0;
set foreign_key_checks = 0;
set sql_log_bin=0;
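These four settings are session-scoped, so they only take effect if they run on the same connection that executes the LOAD DATA statement. A sketch of doing both in one invocation of the mysql client ("user" and "dbname" are placeholders, not values from my setup):

```shell
# Sketch: apply the session settings and run the import over a single
# connection, so the SET statements actually affect the LOAD DATA.
# "user" and "dbname" are placeholders for the real credentials/schema.
mysql --local-infile=1 -u user -p dbname <<'SQL'
SET NAMES utf8mb4;
SET unique_checks = 0;
SET foreign_key_checks = 0;
SET sql_log_bin = 0;
LOAD DATA LOCAL INFILE 'file.csv' IGNORE
INTO TABLE table1
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n';
SQL
```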
mysql> SELECT VERSION();
+------------+
| version() |
+------------+
| 5.7.18-log |
+------------+
1 row in set (0.00 sec)
SELECT @@innodb_buffer_pool_size;
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
| 8388608 |
+---------------------------+
1 row in set (0.00 sec)
I fixed the utf8mb4 issue, but with the UNIQUE key constraint I am failing to import the half-million-record .csv file in my test. Ultimately I have to load 40 million records, although of course I will do that in chunks.
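For the chunking, my plan is along these lines (a small demonstration with a generated file; for the real import I would point `split` at the big CSV and use a much larger chunk size such as `-l 500000` — the file names here are illustrative):

```shell
# Demonstration: split a file into fixed-size pieces, each of which
# would then be loaded with its own LOAD DATA LOCAL INFILE statement.
seq 1 10 > sample.csv                              # stand-in for the big CSV
split -l 4 -d --additional-suffix=.csv sample.csv chunk_
wc -l chunk_*.csv                                  # shows 4, 4, and 2 lines
```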
Is LOAD DATA LOCAL INFILE inefficient for importing a large .csv file into a table with a UNIQUE key constraint?
Suggestions and guidance would be appreciated.