
I have the following database relationship: [schema diagram: `bmt_transcripts.gene_id` references `bmt_genes.id`; `bmt_genes` also has an `ensembl_id` column]

I also have this large CSV file that I want to insert into bmt_transcripts:

Ensembl Gene ID    Ensembl Transcript ID
ENSG00000261657    ENST00000566782
ENSG00000261657    ENST00000562780
ENSG00000261657    ENST00000569579
ENSG00000261657    ENST00000568242

The problem is that I can't insert the Ensembl Gene ID as a string; I need to look up its numeric ID in the `bmt_genes` table first, so I came up with this code:

LOAD DATA INFILE 'filename.csv'
INTO TABLE `bmt_transcripts`
FIELDS TERMINATED BY '\t'  -- assuming the tab-separated format shown above
IGNORE 1 LINES             -- skip the header row
(@gene_ensembl, ensembl_id)
SET gene_id = (SELECT id FROM bmt_genes WHERE ensembl_id = @gene_ensembl);

However, this takes over 30 minutes to load a 7 MB CSV, which is far too long. I assume it's running a table-wide query against bmt_genes for every row it inserts, which is obviously horribly inefficient. I know I could load the data into a temporary table and SELECT from that (which, yes, runs in some 5 seconds; see the sketch below), but this CSV may grow to have some 20 columns, and writing a SELECT that lists every column would become unwieldy.
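
For reference, here is the staging-table version that does run quickly. It is only a sketch: the staging table name and the VARCHAR types are placeholders, not my real schema:

-- Staging table mirroring the CSV columns (names and types are illustrative)
CREATE TEMPORARY TABLE bmt_transcripts_staging (
    gene_ensembl VARCHAR(32),
    ensembl_id   VARCHAR(32)
);

LOAD DATA INFILE 'filename.csv'
INTO TABLE bmt_transcripts_staging
FIELDS TERMINATED BY '\t'
IGNORE 1 LINES
(gene_ensembl, ensembl_id);

-- A single set-based join replaces the per-row subquery
INSERT INTO bmt_transcripts (gene_id, ensembl_id)
SELECT g.id, s.ensembl_id
FROM bmt_transcripts_staging s
JOIN bmt_genes g ON g.ensembl_id = s.gene_ensembl;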

How can I fix my LOAD DATA INFILE query (which runs a SELECT on another table) to run in a reasonable length of time?

Migwell
  • You can [import the CSV file](http://stackoverflow.com/questions/3635166/how-to-import-csv-file-to-mysql-table) to a temporary table, and then run an efficient SQL query to get the data in place. – sampathsris Jul 14 '14 at 05:58
  • I mentioned that in the question. The CSV might grow to more than 20 columns, and I'm not sure an INSERT INTO that writes out all 20 columns twice is a good idea. – Migwell Jul 14 '14 at 06:00
  • I have the same issue; to me it looks like a MySQL bug, because a `PHP` or `Python` script that does the same thing (parses the file, runs the subquery as a separate query, then runs the `INSERT`) works a thousand times faster. – Vitalii Oct 13 '17 at 12:17

0 Answers