I need to transfer data (the columns I need are n.inchi,n.info,n.accessions,n.chebiid
from one table to another, both tables have over 20,000 rows (heroku_chemical_chemical
has 50,000, thats the table I'm transfering data to).
I tried this query:
UPDATE heroku_chemical_chemical AS h, new_compounds_filtered AS n
SET
h.inchi_identifier=n.inchi,
h.info=n.info,
h.accessions=n.accessions,
h.chebi_id=n.chebiid
WHERE h.name = n.name
AND (h.inchi_identifier = '' OR h.inchi_identifier IS NULL);
So basically, the chemical name that appears in table 1 (heroku_chemical_chemical
) might be present in table 2, and if it is I need to get the data from table 2 for that chemical. If h.inchi_identifier
has data in it, then I know that chemical is done. Problem is that query takes incredibly long to execute, I left it running overnight, but next morning it still hadn't completed so I had to cancel it. Since its a transaction, nothing was transferred. If it instead was to execute the transfer bit by bit then it would work.
I can add this to the query:
AND n.id BETWEEN 1 AND 500
to limit the query to 500 rows (from the second table where I'm transfering the data from) do it in small pieces, but then I manually have to keep rerunning the query with different values for the BETWEEN
clause. Its also really slow. I'd prefer to do this in pure SQL rather than setting up a PHP script.
Is there a way to do this where it inserts data as it goes, rather than waiting until the whole transaction is complete? Also, is there a way to do this more rapidly?
Heres the table definitions:
heroku_chemical_chemical
CREATE TABLE `heroku_chemical_chemical` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` text COLLATE utf8mb4_unicode_ci,
`synonyms` text COLLATE utf8mb4_unicode_ci,
`associated_from` text COLLATE utf8mb4_unicode_ci,
`category_associated_from` text COLLATE utf8mb4_unicode_ci,
`chemical_number` text COLLATE utf8mb4_unicode_ci,
`parent_chemical_numbers` text COLLATE utf8mb4_unicode_ci,
`category_id` text COLLATE utf8mb4_unicode_ci,
`slug` text COLLATE utf8mb4_unicode_ci,
`cas_rn` text COLLATE utf8mb4_unicode_ci,
`definition` text COLLATE utf8mb4_unicode_ci,
`drug_bank_ids` text COLLATE utf8mb4_unicode_ci,
`foodb_id` text COLLATE utf8mb4_unicode_ci,
`itis_id` text COLLATE utf8mb4_unicode_ci,
`name_scientific` text COLLATE utf8mb4_unicode_ci,
`picture_content_type` text COLLATE utf8mb4_unicode_ci,
`picture_file_name` text COLLATE utf8mb4_unicode_ci,
`picture_file_size` text COLLATE utf8mb4_unicode_ci,
`wikipedia_id` text COLLATE utf8mb4_unicode_ci,
`actor_id` text COLLATE utf8mb4_unicode_ci,
`bio_cyc_id` text COLLATE utf8mb4_unicode_ci,
`chebi_id` text COLLATE utf8mb4_unicode_ci,
`chem_spider_id` text COLLATE utf8mb4_unicode_ci,
`chembl_id` text COLLATE utf8mb4_unicode_ci,
`ctd_id` text COLLATE utf8mb4_unicode_ci,
`hmdb_id` text COLLATE utf8mb4_unicode_ci,
`inchi_identifier` text COLLATE utf8mb4_unicode_ci,
`inchi_key` text COLLATE utf8mb4_unicode_ci,
`kegg_compound_id` text COLLATE utf8mb4_unicode_ci,
`omim_id` text COLLATE utf8mb4_unicode_ci,
`pdb_id` text COLLATE utf8mb4_unicode_ci,
`pubchem_compound_id` text COLLATE utf8mb4_unicode_ci,
`stitch_di` text COLLATE utf8mb4_unicode_ci,
`t3db_id` text COLLATE utf8mb4_unicode_ci,
`uni_prot_id` text COLLATE utf8mb4_unicode_ci,
`iupac_name` text COLLATE utf8mb4_unicode_ci,
`formula` text COLLATE utf8mb4_unicode_ci,
`smiles` text COLLATE utf8mb4_unicode_ci,
`chemspider_id` text COLLATE utf8mb4_unicode_ci,
`molecular_weight` text COLLATE utf8mb4_unicode_ci,
`accessions` text COLLATE utf8mb4_unicode_ci,
`chebi_info` text COLLATE utf8mb4_unicode_ci,
`chebi_name` text COLLATE utf8mb4_unicode_ci,
`compound_type` text COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (`id`)
)
ENGINE=InnoDB
AUTO_INCREMENT=379336
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_unicode_ci;
new_compounds_filtered
CREATE TABLE `new_compounds_filtered` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` text COLLATE utf8mb4_unicode_ci,
`chebiid` text COLLATE utf8mb4_unicode_ci,
`info` text COLLATE utf8mb4_unicode_ci,
`smiles` text COLLATE utf8mb4_unicode_ci,
`inchi` text COLLATE utf8mb4_unicode_ci,
`inchikey` text COLLATE utf8mb4_unicode_ci,
`parent_id` text COLLATE utf8mb4_unicode_ci,
`accessions` text COLLATE utf8mb4_unicode_ci,
`synonyms` text COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (`id`)
)
ENGINE=InnoDB
AUTO_INCREMENT=85432
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_unicode_ci ;
UPDATE:
So I'm running a new query which needs to load the ID (which is indexed) into a column in a relationship table.
UPDATE chemical_organism_relations AS O2, heroku_chemical_chemical AS H1
SET O2.compound_id = H1.id, O2.substance_type = 'compound'
WHERE O2.foodb_compound_id = H1.foodb_id;
Again the same problem, the query doesn't seem to complete. I can't put a unique index on H1.foodb_id because only about 5% of the chemicals in the table have a FooDB ID. So I'm having the same problem. O2.compound_id is unindexed because all of its fields are blank right now, I can't index them unless I insert a temporary unique number into each of them.
Heres whats in the performance_schema.events_statements_current
table:
I've been looking for a way to track the queries progress as it runs. One thing I'm wondering is what these columns mean:
Is there a way to see whats going on behind the scenes? If I could see what the server is doing and how much progress it has made in running the query, then I would know if the query will ever complete, but right now I don't know whether its crashed or whats happening.
I just ran a really simple query:
mysql> UPDATE chemical_organism_relations SET substance_type = 'compound'; Query OK, 740672 rows affected (1 min 2.95 sec) Rows matched: 740672 Changed: 740672 Warnings: 0
74000, but a minute to process a simple update query involving no other tables.
Big Problem With My Query Someone commented to an answer before I merged it with the main thread, they said that because:
O2.compound_id is unindexed because all of its fields are blank
That I have a big problem with my query. I accidentally posted the wrong query, but I posted the right one now, is there still a big problem with the query?