
I am trying to update a MySQL database using Perl. I have tried several approaches, and all of them are extremely slow: it takes 10-15 minutes to update 14k records. I thought the best approach would be to create a stored procedure and call it from the Perl script, but the response time is the same.

elsif($update eq "incrementalLoad") {
    $filename = param("customPricing");
    $handle = upload("customPricing");
    while (<$handle>)  {                
        $currentRecord++;
        @currentLine = split( /,/, $_ );
        s/\r//g for @currentLine;    # strip carriage returns (\r, not the literal two characters "\\r")
        $query = "CALL upsertIncremental('$currentLine[0]', '$currentLine[1]', '$currentLine[2]', '$currentLine[3]', '$currentLine[4]', '$currentLine[5]', '$currentLine[6]', '$currentLine[7]', '$currentLine[8]', '$currentLine[9]', '$currentLine[10]', '$currentLine[11]', '$currentLine[12]', '$currentLine[13]', '$currentLine[14]', '$currentLine[15]', '$currentLine[16]', '$currentLine[17]', '$currentLine[18]', '$currentLine[19]', '$currentLine[20]', '$currentLine[21]', '$currentLine[22]', '$currentLine[23]', '$currentLine[24]', '$currentLine[25]')";
        $dbh->do($query) or die "Can't execute upsertIncremental: " . $dbh->errstr;
    }   
    print $currentRecord . " Record(s) uploaded.<br/>";
    $dbh->disconnect;           
}

What can I do to improve performance?

Is this better?

elsif($update eq "incrementalLoad") {
    $filename = param("customPricing");
    $handle = upload("customPricing");

    my $update_handle = $dbh->prepare_cached("UPDATE custompricingtest SET partNumberSKU= ?, customerClass= ?, customerName= ?, customerId= ?, customerNumber= ?, custPartNumber=?, svcType= ?, sppl= ? , svcDuration= ?, durationPeriod= ?, priceMSRP= ?, partnerPriceDistiDvarOEM= ?, msrpSvcPrice=?, partnerSvcPrice=?, msrpBundlePrice=?, partnerBundlePrice=?, startDate=?, endDate=?, currency=?, countryCode=?, inventoryItemId=?, flexField1=?, flexField2=?, flexField3=?, flexField4=?, flexField5=? WHERE partNumberSKU=? and ifnull(customerClass,0)=ifnull(?,0) and ifnull(customerName,0)=ifnull(?,0) and ifnull(svcType,0)=ifnull(?,0) and ifnull(svcDuration,0)=ifnull(?,0) and ifnull(durationPeriod,0)=ifnull(?,0)") or $error = 1;

    while (<$handle>)  {                
        $currentRecord++;
        @currentLine = split( /,/, $_ );
        s/\r//g for @currentLine;    # strip carriage returns
        $update_handle->execute($currentLine[0],$currentLine[1],$currentLine[2],$currentLine[3],$currentLine[4],$currentLine[5],$currentLine[6],$currentLine[7],$currentLine[8],$currentLine[9],$currentLine[10],$currentLine[11],$currentLine[12],$currentLine[13],$currentLine[14],$currentLine[15],$currentLine[16],$currentLine[17],$currentLine[18],$currentLine[19],$currentLine[20],$currentLine[21],$currentLine[22],$currentLine[23],$currentLine[24],$currentLine[25],$currentLine[0],$currentLine[1],$currentLine[2],$currentLine[6],$currentLine[8],$currentLine[9]) or die "Can't execute UPDATE query: " . $update_handle->errstr;
        print $currentRecord . "<br/>";
    }   
    print $currentRecord . " Record(s) uploaded.<br/>";
    $dbh->disconnect;           
}

Table format

CREATE TABLE `custompricingtest` (
  `partNumberSKU` varchar(255) DEFAULT NULL,
  `customerClass` varchar(255) DEFAULT NULL,
  `customerName` varchar(255) DEFAULT NULL,
  `customerId` varchar(255) DEFAULT NULL,
  `customerNumber` varchar(255) DEFAULT NULL,
  `custPartNumber` varchar(255) DEFAULT NULL,
  `svcType` varchar(255) DEFAULT NULL,
  `sppl` varchar(255) DEFAULT NULL,
  `svcDuration` varchar(255) DEFAULT NULL,
  `durationPeriod` varchar(255) DEFAULT NULL,
  `priceMSRP` varchar(255) DEFAULT NULL,
  `partnerPriceDistiDvarOEM` varchar(255) DEFAULT NULL,
  `msrpSvcPrice` varchar(255) DEFAULT NULL,
  `partnerSvcPrice` varchar(255) DEFAULT NULL,
  `msrpBundlePrice` varchar(255) DEFAULT NULL,
  `partnerBundlePrice` varchar(255) DEFAULT NULL,
  `startDate` varchar(255) DEFAULT NULL,
  `endDate` varchar(255) DEFAULT NULL,
  `currency` varchar(255) DEFAULT NULL,
  `countryCode` varchar(255) DEFAULT NULL,
  `inventoryItemId` varchar(255) DEFAULT NULL,
  `flexField1` varchar(255) DEFAULT NULL,
  `flexField2` varchar(255) DEFAULT NULL,
  `flexField3` varchar(255) DEFAULT NULL,
  `flexField4` varchar(255) DEFAULT NULL,
  `flexField5` varchar(255) DEFAULT NULL,
  KEY `part_num_sku` (`partNumberSKU`),
  KEY `svcType` (`svcType`),
  KEY `svcDuration` (`svcDuration`),
  KEY `durationPeriod` (`durationPeriod`),
  KEY `customerClass` (`customerClass`)
) 

I ended up saving the file into a temp table (which takes a couple of seconds) and then doing an update join between the two tables, but it is still really slow. Below is the update query:

UPDATE custompricingtest t1, custompricingtesttemp t2 
SET t1.customerId         = t2.customerId, 
    t1.customerNumber     = t2.customerNumber, 
    t1.custPartNumber     = t2.custPartNumber, 
    t1.sppl               = t2.sppl , 
    t1.priceMSRP          = t2.priceMSRP, 
    t1.partnerPriceDistiDvarOEM = t2.partnerPriceDistiDvarOEM,
    t1.msrpSvcPrice       = t2.msrpSvcPrice, 
    t1.partnerSvcPrice    = t2.partnerSvcPrice, 
    t1.msrpBundlePrice    = t2.msrpBundlePrice,
    t1.partnerBundlePrice = t2.partnerBundlePrice, 
    t1.startDate          = t2.startDate, 
    t1.endDate            = t2.endDate, 
    t1.currency           = t2.currency, 
    t1.countryCode        = t2.countryCode, 
    t1.inventoryItemId    = t2.inventoryItemId, 
    t1.flexField1         = t2.flexField1, 
    t1.flexField2         = t2.flexField2, 
    t1.flexField3         = t2.flexField3, 
    t1.flexField4         = t2.flexField4, 
    t1.flexField5         = t2.flexField5
WHERE t1.partNumberSKU  = t2.partNumberSKU 
  and t1.customerClass  = t2.customerClass 
  and t1.customerName   = t2.customerName 
  and t1.svcType        = t2.svcType 
  and t1.svcDuration    = t2.svcDuration 
  and t1.durationPeriod = t2.durationPeriod

EXPLAIN EXTENDED result

id  select_type     table   type    possible_keys                                                               key               key_len   ref                            rows     Extra
1   SIMPLE           t2     ALL     part_num_sku,customerName,customerClass,durationPeriod,svcDuration,svcType  NULL                NULL    NULL                          28758      
1   SIMPLE           t1     ref     part_num_sku,svcDuration,customerName,customerClass,durationPeriod,svcType  part_num_sku         13     testbrocade.t2.partNumberSKU    394     Using where

Nitesh

  • Disable indexes for the duration of the import, then re-enable/update the indexes afterwards. – Marc B Aug 04 '11 at 16:27
  • I tried that... still no use. Is the code I posted at the end better than the others? – Nitesh Aug 04 '11 at 17:00
  • the prepared statement will cut down on SQL parsing/compiling overhead, but most likely the bottleneck is somewhere else. – Marc B Aug 04 '11 at 17:09
  • that's what I have been trying to figure out for 3 days... Is it because of my complicated update statement? I am setting around 25 columns and checking against 5 columns with ifnull() conditions – Nitesh Aug 04 '11 at 17:13
  • possibly - since you're comparing derived values in the WHERE clause, use of indexes could be precluded. Try running one of those queries by hand in the mysql monitor and EXPLAIN it, to see which indexes are being used. If indexes on those ifnull()'d fields can't be used, then you'd be doing a bunch of whole-table scans for each row. – Marc B Aug 04 '11 at 17:15
  • Long indexes in MySQL are never that good. It is often a good plan to use a prefix index if the first few characters are enough to distinguish values. This reduces the amount of comparison. I'd be surprised if you need more than 10 characters of index to distinguish an SKU almost uniquely, so try `KEY part_num_sku(partNumberSKU(10))`. We do this using the temporary table and update, and can process a million records in less time than you're handling 14k. I usually debug this by rewriting the update as a select and doing EXPLAIN EXTENDED to check the indexes, as @Marc B suggests – Stuart Watt Aug 04 '11 at 19:53
  • I tried using a temp table and a prefix index, but it still takes around 5-10 mins to update 10k records. So I upload the file into a temp table and run the code mentioned above to update my table from the temp table. – Nitesh Aug 04 '11 at 21:15
  • one more thing... the index is defined to accept the complete value (varchar(255)) but the values are pretty much always around 1-15 characters. Does it still matter to define it only for the first 10? – Nitesh Aug 04 '11 at 21:30
  • I just added the EXPLAIN EXTENDED details above... I am not sure how to read them – Nitesh Aug 05 '11 at 21:03
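Stuart Watt's debugging approach, rewriting the UPDATE as a SELECT so its index usage can be inspected, might look roughly like this (a sketch; the join condition is copied from the question's update query, and the ifnull() wrappers would be added back if you need to match NULLs):

```sql
-- Hypothetical: the UPDATE's join condition expressed as a SELECT for EXPLAIN
EXPLAIN EXTENDED
SELECT t1.partNumberSKU
FROM custompricingtest t1
JOIN custompricingtesttemp t2
  ON  t1.partNumberSKU  = t2.partNumberSKU
  AND t1.customerClass  = t2.customerClass
  AND t1.customerName   = t2.customerName
  AND t1.svcType        = t2.svcType
  AND t1.svcDuration    = t2.svcDuration
  AND t1.durationPeriod = t2.durationPeriod;
```

If the `key` column of the EXPLAIN output shows NULL for either table, the join is doing full scans.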

5 Answers


Don't use a stored procedure; do multiple rows in the same statement. Depending on your max_allowed_packet, you should be able to do all your rows in one or just a few INSERT statements.

Oops, sorry, I somehow misread "update" as "insert". Updating is harder; I'll try to show an example if you show your CREATE TABLE.
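A sketch of the multi-row technique, here loading the staging table from the question (assumes the 26-column layout from the question; untested, since it needs a live connection):

```perl
# Sketch: collect the rows, then send one multi-row INSERT instead of 14k singles.
my @batch;
while ( my $line = <$handle> ) {
    $line =~ s/\r?\n\z//;                      # strip CR/LF line endings
    push @batch, [ split /,/, $line, -1 ];     # -1 keeps trailing empty fields
}

# One (?,?,...,?) group per row; 26 columns, as in the question.
my $group        = '(' . join( ',', ('?') x 26 ) . ')';
my $placeholders = join ',', ($group) x @batch;

$dbh->do(
    "INSERT INTO custompricingtesttemp VALUES $placeholders",
    undef,
    map { @$_ } @batch,
);
```

In practice you would chunk @batch (say, 500 rows per statement) to stay under max_allowed_packet rather than binding all 14k rows at once.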

ysth
  • thanks for the reply... can you give me a bit more detail on how to do multiple rows in the same statement? – Nitesh Aug 04 '11 at 16:32
  • How about using LOAD DATA INFILE with REPLACE? Can't I achieve the same functionality with super speed? – Nitesh Aug 04 '11 at 16:35
  • I added improved code at the end... is that what you meant? Or is there room for further optimization? – Nitesh Aug 04 '11 at 17:14
  • for 14k rows, I don't know that I'd bother with LOAD DATA INFILE. Can you add the output of `show create table customerpricingtest` to your question? – ysth Aug 04 '11 at 17:14
  • I just did... later I realized that I can't use LOAD DATA INFILE because the condition for the update is based on non-unique columns. 14k records take forever to get updated, maybe because of my complicated update query – Nitesh Aug 04 '11 at 17:19
  • Sorry, I'm going to be busy for several hours; the basic technique is to insert your criteria and updated values into a temporary table in as few statements as possible, then do a single update joining your table to the temporary table; will try to show the SQL as soon as I have time – ysth Aug 04 '11 at 17:28
  • Thanks for your help... I will look into that... whenever you have time, please share the SQL... appreciate your help – Nitesh Aug 04 '11 at 17:32
  • I added the code above, can you take a look and let me know if there is anything wrong? – Nitesh Aug 04 '11 at 18:21
  • It doesn't do what your original update did; not at all what I had in mind. I will try to find time but am still pretty busy. – ysth Aug 04 '11 at 19:22
  • @ysth let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/2164/discussion-between-nitesh-and-ysth) – Nitesh Aug 05 '11 at 15:20

Start a transaction, do all the inserts, and commit at the end. This frees MySQL from managing a separate transaction for each insert.
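With DBI, that looks roughly like this (a sketch; it assumes the `$update_handle` prepared statement and bind values from the question, abbreviated here):

```perl
# Sketch: one transaction around the whole load instead of autocommit per row.
$dbh->{AutoCommit} = 0;                 # or: $dbh->begin_work;
eval {
    while ( my $line = <$handle> ) {
        $line =~ s/\r?\n\z//;
        my @fields = split /,/, $line;
        $update_handle->execute(@fields);   # full bind list as in the question
    }
    $dbh->commit;
};
if ($@) {
    $dbh->rollback;                     # undo the partial load on any failure
    die "Load failed, rolled back: $@";
}
```

Note this only helps on a transactional engine such as InnoDB; with autocommit on, every execute pays for its own commit and flush.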

Saic Siquot
  • did you try this approach? Even in the bounty answer, I recommend doing all inserts into the temp table in a single transaction, which boosts performance – Saic Siquot Jun 19 '12 at 13:28
0

I've done a script like this before. You could do it with bookends.

  1. The initialize step creates a temp table: guaranteed to be free of contention, and no indexes are needed.
  2. The per-record action is either batched inserts or a proc,
  3. and the final step/proc commits the changes from the temp table to the main table in a transaction (and disposes of the temp table).

For this last step, you might even need to drop and create indexes to speed that up.
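The three bookend steps might be sketched like this (the staging-table name is illustrative, the UPDATE join is abbreviated to the question's first column, and `$tmpfile` is assumed to hold the uploaded CSV on disk):

```perl
# 1. Initialize: a staging table, free of contention.
#    (LIKE also copies the indexes; list the columns explicitly if you want it index-free.)
$dbh->do("CREATE TEMPORARY TABLE custompricingtest_stage LIKE custompricingtest");

# 2. Per-record: bulk-load the uploaded file into the staging table.
my $qfile = $dbh->quote($tmpfile);      # LOAD DATA can't take a placeholder for the filename
$dbh->do("LOAD DATA LOCAL INFILE $qfile INTO TABLE custompricingtest_stage
          FIELDS TERMINATED BY ','");

# 3. Finalize: apply the changes in one transaction, then clean up.
$dbh->begin_work;
$dbh->do("UPDATE custompricingtest t1 JOIN custompricingtest_stage t2
            ON t1.partNumberSKU = t2.partNumberSKU  /* ...remaining join columns... */
           SET t1.priceMSRP = t2.priceMSRP          /* ...remaining SET columns... */");
$dbh->commit;
$dbh->do("DROP TEMPORARY TABLE custompricingtest_stage");
```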

Axeman
  • thanks for the reply... would you mind sharing your script? I am kind of a newbie at this – Nitesh Aug 04 '11 at 17:31
  • I added an update join of the two tables like you said... it's still taking forever. I added the code above in the description area... am I doing anything wrong? – Nitesh Aug 04 '11 at 18:21
  • You might want to "chunk" it by setting the rowcount: update and delete, update and delete. There was once or twice on a big system with a lot of data where we couldn't run an update script except on a Saturday or something. Sometimes the runtime is just the runtime. – Axeman Aug 04 '11 at 18:58

I don't know what your data looks like, but if you have an indexed column that is purely numeric (partNumberSKU?) you might consider changing the column type to int. That should make key comparisons faster. If you're using your temp table approach, you'll want to make the same change in each table (and make sure that both are indexed).

If you're playing around with which columns to index, aim for those that have a large number of unique values.
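If partNumberSKU really is always numeric, the change would be something like the following (verify the data first; non-numeric values would be silently mangled by the conversion):

```sql
-- Hypothetical: apply the same type change to both tables so the join compares ints.
ALTER TABLE custompricingtest     MODIFY partNumberSKU INT;
ALTER TABLE custompricingtesttemp MODIFY partNumberSKU INT;
```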

  • If you want to try a more dramatic solution, you could convert all your varchar(255) columns to char(255) (or whatever size is actually necessary). You'd use much more storage space, and might lose some performance due to the increased size of the database, but my understanding is that having fixed-length rows could speed updates significantly. This probably isn't an ideal solution, and it would only help if you're using MyISAM tables. See [this post](http://stackoverflow.com/questions/147315/mysql-row-format-difference-between-fixed-and-dynamic) for more details. – Colin Anderson Aug 09 '11 at 18:03

The main problem, I think, is that MySQL can use only one index per table per query. You have indexed your columns separately:

KEY `part_num_sku` (`partNumberSKU`),
KEY `svcType` (`svcType`),
KEY `svcDuration` (`svcDuration`),
KEY `durationPeriod` (`durationPeriod`),
KEY `customerClass` (`customerClass`)

But it will only be able to use one of these keys (the optimiser tries to work out which one). Maybe you need a composite index comprising the columns that you query on? The index will be large, but it may help. Or it may be so large that it doesn't.

Remember: in a MySQL composite index, the columns have to be used in the query in left-to-right (leftmost-prefix) order or the index will not be used. This may help: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
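Such a composite index over the update's join columns might look like the following sketch. With six varchar(255) columns you would exceed MySQL's key-length limit, so the prefix lengths shown are illustrative guesses that would need tuning against the actual data:

```sql
-- Hypothetical composite index on the six join columns, with prefix lengths
ALTER TABLE custompricingtest
  ADD KEY `upd_lookup` (partNumberSKU(15), customerClass(10), customerName(10),
                        svcType(10), svcDuration(5), durationPeriod(5));
```

The same index on the temp table would let the optimiser pick whichever side of the join is cheaper to scan.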

cubabit