I have the following database relationship:
I also have this large CSV file that I want to insert into bmt_transcripts
:
Ensembl Gene ID Ensembl Transcript ID
ENSG00000261657 ENST00000566782
ENSG00000261657 ENST00000562780
ENSG00000261657 ENST00000569579
ENSG00000261657 ENST00000568242
The problem is that can't insert the Ensemble Gene ID as a string, I need to find its ID from the bmt_genes
table, so I came up with this code:
LOAD DATA INFILE 'filename.csv'
INTO TABLE `bmt_transcripts`
(@gene_ensembl, ensembl_id)
SET gene_id = (SELECT id FROM bmt_genes WHERE ensembl_id = @gene_ensembl);
However this takes over 30 minutes to load a 7mb CSV, which is far too long. I assume it's running a table-wide query for every row it inserts, which is obviously horribly inefficient. I know I could load the data into a temporary table and SELECT from that (which, yes, runs in some 5 seconds), but this CSV may grow to have some 20 columns, which will become unwieldy to write a select statement for.
How can I fix my LOAD DATA INFILE query (which runs a SELECT on another table) to run in a reasonable length of time?