Do I need to pre-process the text file before using LOAD DATA INFILE? And if so, what would be the best way to pre-process?
You don't need to pre-process the file; you can do the processing during the import itself. Something like:
LOAD DATA INFILE 'c:/path to file/file.txt'
INTO TABLE table_name
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
-- columns whose values are set directly,
-- and intermediate user variables for values which must be processed,
-- positioned according to the CSV structure
(field1, field2, @variable3, field4, ...)
-- process the values in the variables and set fields values
SET field3 = CASE WHEN @variable3 = '' THEN 0 ELSE @variable3 END;
(field1, field2, @variable3, field4, ...)
is the destination of the data fields parsed from each line of the source CSV file. That is, the first value parsed from the line currently being processed is assigned directly to the column field1 of the destination table; the same goes for the second value and field2. The third parsed value is assigned to the user-defined variable @variable3. The fourth parsed value is again assigned directly to a table column, and so on if more data and columns are present.
After the whole line has been parsed according to the specification above, the next processing directive is executed: SET field3 = CASE WHEN @variable3 = '' THEN 0 ELSE @variable3 END.
It is simple: if @variable3 was assigned an empty string, then the value 0 is assigned to the column field3 of the record currently being built; otherwise the value parsed from the current line is assigned to that column without modification.
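The effect of that CASE expression can be shown with a tiny standalone sketch (Python here purely for illustration; MySQL evaluates the expression itself during the load):

```python
def set_field3(variable3):
    """Mimics: SET field3 = CASE WHEN @variable3 = '' THEN 0 ELSE @variable3 END."""
    return 0 if variable3 == '' else variable3

print(set_field3(''))    # empty field in the CSV becomes 0
print(set_field3('42'))  # any non-empty value passes through unchanged
```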
After both steps are processed, the whole record (all columns which were assigned some value) is stored as one new row in the destination table in the usual way (defaults assigned to non-listed columns, checks, triggers, ...).
After the record is stored, the next line of the CSV is read, parsed, processed, and stored, then the next line, and so on, until the end of the file or some error.
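The whole per-line pipeline described above (read a line, split on tabs, assign columns and variables, apply the SET transform, store the row) can be sketched as follows. This is only a model of what the server does, assuming the hypothetical column names from the example; the destination table is represented as a plain list of dicts:

```python
import csv
import io

# Hypothetical source data: a header line (skipped, like IGNORE 1 ROWS)
# plus two tab-separated records; the third field is empty in the second.
source = "h1\th2\th3\th4\nA\tB\t5\tD\nE\tF\t\tH\n"

table = []  # stands in for the destination table

reader = csv.reader(io.StringIO(source), delimiter='\t')
next(reader)  # IGNORE 1 ROWS: skip the header line

for field1, field2, variable3, field4 in reader:
    # SET field3 = CASE WHEN @variable3 = '' THEN 0 ELSE @variable3 END
    field3 = 0 if variable3 == '' else variable3
    table.append({'field1': field1, 'field2': field2,
                  'field3': field3, 'field4': field4})

print(table)
```

Running this stores two rows; in the second one, the empty third field has been replaced by 0, exactly as the SET clause does during the import.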