Update: After looking more closely at my Python script, I found that preparing and calculating the data before inserting the records is where most of the processing time goes, not the DB insert. The inserts themselves are relatively quick (about 30 seconds for 10,000 records against a table of 1,000,000 records). I do think the feedback I got on this question will help me optimize the inserts as well. Thanks to all who replied. Now I'm off to optimize my Python script to speed it up.
I currently have a MariaDB table with the following structure:
CREATE TABLE IF NOT EXISTS `adeck_errors` (
`StormID` varchar(8) NOT NULL DEFAULT '1',
`ModelBaseTime` datetime NOT NULL,
`Model` varchar(4) NOT NULL,
`Tau` smallint(4) NOT NULL,
`LatCARQ` float DEFAULT NULL,
`LonCARQ` float DEFAULT NULL,
`LatModel` float DEFAULT NULL,
`LonModel` float DEFAULT NULL,
`DistError` smallint(6) DEFAULT NULL,
`WindCARQ` int(11) DEFAULT NULL,
`WindModel` int(11) DEFAULT NULL,
`WindError` smallint(6) DEFAULT NULL,
`PresCARQ` int(11) DEFAULT NULL,
`PresModel` int(11) DEFAULT NULL,
`PresError` smallint(6) DEFAULT NULL,
UNIQUE KEY `StormID` (`StormID`,`ModelBaseTime`,`Model`,`Tau`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
As you can see, my unique key spans 4 fields to ensure I don't get any duplicate entries. When I'm inserting 10,000+ records into a table that already contains 1 million records, it takes about 15-20 minutes, and I would like to speed this process up.
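For concreteness, here is a minimal sketch of the kind of batched insert I mean (pymysql and INSERT IGNORE are assumptions for the example, not necessarily how my real script does it; the rows list stands in for the records my script prepares):

import pymysql  # any DB-API driver with executemany works similarly

def insert_batch(conn, rows):
    # rows: list of tuples in the same order as the column list below,
    # one tuple per record prepared by the Python script
    sql = """
        INSERT IGNORE INTO adeck_errors
            (StormID, ModelBaseTime, Model, Tau,
             LatCARQ, LonCARQ, LatModel, LonModel, DistError,
             WindCARQ, WindModel, WindError,
             PresCARQ, PresModel, PresError)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)
    """
    with conn.cursor() as cur:
        # executemany sends the whole batch instead of one INSERT per record;
        # INSERT IGNORE skips rows that collide with the unique key above
        cur.executemany(sql, rows)
    conn.commit()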
My question is: would it be better to create a hash of the data in those 4 fields and use that hash as the primary key so I don't get duplicates? When I'm processing the data in Python before inserting, I can quickly compute a hash of the 4 values before I move into my DB insert function.
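To illustrate what I mean, a sketch of the per-record hash I could compute (md5 here is just a stand-in for whatever hash function makes sense; I'm assuming ModelBaseTime is a Python datetime at this point):

import hashlib

def row_key(storm_id, model_base_time, model, tau):
    # Deterministic digest of the 4 natural-key fields; the separator
    # prevents e.g. ('AL1', '23') and ('AL12', '3') from colliding.
    raw = '|'.join([storm_id, model_base_time.isoformat(), model, str(tau)])
    return hashlib.md5(raw.encode('utf-8')).hexdigest()  # 32-char hex string

The idea would be to store that digest in something like a CHAR(32) (or BINARY(16)) primary key column in place of the 4-column unique key.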
Thanks for your suggestions. Bryan