I have a CSV file around 20 GB in size with about 60 million rows that I would like to load into a table in MySQL.
I have defined my table in MySQL with a composite primary key of (col_a, col_b) prior to starting any load.
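For reference, the table looks roughly like this (the column types below are placeholders, and the real table has more columns than shown):
CREATE TABLE mytable (
    col_a     VARCHAR(50) NOT NULL,    -- placeholder type
    col_b     VARCHAR(50) NOT NULL,    -- placeholder type
    col_c     VARCHAR(255),
    col_d     VARCHAR(255),
    col_e     VARCHAR(255),
    -- etc...
    load_dttm DATETIME,
    PRIMARY KEY (col_a, col_b)
);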
I have initiated my load as below:
LOAD DATA LOCAL INFILE '/mypath/mycsv.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 0 LINES
(@v1v, @v2v, @v3v, @v4v, @v5v, etc...)
SET
col_a = NULLIF(@v1v,''),
col_b = NULLIF(@v2v,''),
col_c = NULLIF(@v3v,''),
col_d = NULLIF(@v4v,''),
col_e = NULLIF(@v5v,''),
etc...,
load_dttm = NOW();
This seemed to work fine until the dataset got to around 10 GB in size, at which point the load slowed dramatically: what looked like it might take an hour has now been running all night and the table has not grown much.
Are there more efficient ways (depending on your definition of that word) of loading "large" CSVs into MySQL?
My immediate thoughts are:
1) Should I remove my composite primary key and only add it back after the load has finished? (roughly what I mean is sketched below)
2) Should I break the CSV down into smaller chunks and load them one at a time?
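For (1), this is roughly the sequence I have in mind (just a sketch - I don't know whether rebuilding the key over 60 million rows afterwards is itself expensive):
ALTER TABLE mytable DROP PRIMARY KEY;                  -- or create the table without the key in the first place
-- ... run the LOAD DATA LOCAL INFILE statement above ...
ALTER TABLE mytable ADD PRIMARY KEY (col_a, col_b);    -- re-create the composite key once everything is loaded
For (2), I would split the file outside MySQL and run the same LOAD DATA statement once per chunk.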
As I understand it, MySQL is mainly limited by system constraints, which should not be an issue in my case - I am using a Red Hat Linux server with "MemTotal: 396779348 kB" (roughly 378 GB of RAM!) and terabytes of disk space.
This is my first time using MySQL, so please bear this in mind in any answers.