
I currently have Perl code that looks like this:

@valid = grep { defined($column_mapping{ $headers[$_] }) } 0 .. $#headers;

...

my $sql = sprintf 'INSERT INTO tablename ( %s ) VALUES ( %s )',
  join( ',', map { $column_mapping{$_} } @headers[@valid] ),
  join( ',', ('?') x scalar @valid);
my $sth = $dbh->prepare($sql);

...

my @row = split /,/, <INPUT>; 
$sth->execute( @row[@valid] );

(Taken from mob's answer to a previous question.)

That is basically building a SQL INSERT statement dynamically from CSV data, only picking the CSV columns whose headers appear in my column mapping.

I have been looking for examples of how to do an INSERT statement with multiple rows of data at once.

My Perl script needs to run a few hundred million INSERT statements, and doing them one at a time seems really slow, especially since the server I am running it on only has 6 GB of RAM and a slowish internet connection.

Is there a way I can upload more than one row of data at a time, so that one INSERT statement uploads maybe 50 or 100 rows at once? I can't figure out how to do this with Perl DBI.

Lain
  • possible duplicate of [Perl DBI insert multiple rows using mysql native multiple insert ability](http://stackoverflow.com/questions/8421711/perl-dbi-insert-multiple-rows-using-mysql-native-multiple-insert-ability) – ThisSuitIsBlackNot Aug 19 '15 at 17:29
  • @ThisSuitIsBlackNot That was the only question I found on it, and the answerer there suggests doing it one row at a time, with most of the other comments agreeing, which is NOT what I want. I was wondering if there was an answer that took a different approach – Lain Aug 19 '15 at 17:31
  • Read the first approach in the accepted answer, beginning with "You can insert `(?, ?, ?)` a number of times based on the size of the array." That shows how to generate a single insert statement for multiple rows. – ThisSuitIsBlackNot Aug 19 '15 at 17:33
  • Unrelated to the duplicate: a common performance issue in DBI scripts is calling `prepare` inside a loop; you haven't shown all of your code, so I don't know if this is the case for you, but you should make sure you're only calling `prepare` once for a given query. – ThisSuitIsBlackNot Aug 19 '15 at 17:40
  • Also, if you're inserting a very large number of rows, [`LOAD DATA INFILE`](https://dev.mysql.com/doc/refman/5.7/en/load-data.html) is likely to be faster than a series of compound inserts. To do this, parse your raw data as you're doing now and write it out to a CSV file, then load it with `LOAD DATA INFILE`. At which point this approach becomes faster than compound inserts depends on your application and your database setup, but you can get a significant performance boost this way. – ThisSuitIsBlackNot Aug 19 '15 at 17:46
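
A rough sketch of that LOAD DATA INFILE route, reusing the question's %column_mapping, @headers, @valid, <INPUT> filehandle, and tablename; the database name, file path, credentials, and the use of Text::CSV and DBD::mysql's mysql_local_infile option are assumptions for illustration, not part of the original:

use DBI;
use Text::CSV;

# assumption: a MySQL connection with LOCAL INFILE enabled on the client side
my $dbh = DBI->connect(
    'dbi:mysql:database=mydb;mysql_local_infile=1',
    $user, $password, { RaiseError => 1 },
);

# write only the columns that survived the header filter to an intermediate CSV
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
open my $out, '>', '/tmp/filtered.csv' or die "open: $!";
while ( my $line = <INPUT> ) {
    chomp $line;
    my @row = split /,/, $line;
    $csv->print( $out, [ @row[@valid] ] );
}
close $out;

# hand the whole file to the server in a single statement
my $cols = join ',', map { $column_mapping{$_} } @headers[@valid];
$dbh->do( sprintf(
    q{LOAD DATA LOCAL INFILE %s INTO TABLE tablename
      FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ( %s )},
    $dbh->quote('/tmp/filtered.csv'), $cols,
) );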

2 Answers

my $sql_values = join( ',', ('(?, ?, ?)') x scalar(@array) );

As mentioned in the linked question, you can then just flatten your rows into the bind list you pass to execute.
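
For completeness, a minimal sketch of how that fits together, assuming @array holds references to three-column rows and that tablename and col1..col3 stand in for the real table and column names:

my @array = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );   # each element is one row of bind values

my $sql_values = join( ',', ('(?, ?, ?)') x scalar(@array) );
my $sth = $dbh->prepare("INSERT INTO tablename ( col1, col2, col3 ) VALUES $sql_values");

# flatten the array of row references into one long list of bind values
$sth->execute( map { @$_ } @array );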

Alex Smith

You can insert multiple rows at once with the same syntax as in regular SQL, but you need to build your INSERT statement properly with Perl. Perl's splice() may help you:

Suppose you have 7 rows of data and want to insert them in chunks of 3 rows. "Regular" SQL would be like so:

insert into T (col1, col2) values ( 1, 2), ( 3, 4), ( 5, 6);
insert into T (col1, col2) values ( 7, 8), ( 9,10), (11,12);
insert into T (col1, col2) values (13,14);

Let's suppose your Perl structure is like this:

my $values = [ [1,2], [3,4], ..., [13,14] ];

If it's not, bring it into this shape. Now:

use constant CHUNKSIZE => 3;

my $stmt = sprintf( 'insert into T (col1, col2) values %s',
                    join(',', ('(?,?)') x CHUNKSIZE) );
# $stmt is now 'insert into T (col1, col2) values (?,?),(?,?),(?,?)'

my $sth = $dbh->prepare($stmt);

while( my @chunk = splice( @{$values}, 0, CHUNKSIZE ) ) {
    # @chunk has 3 elements (rows), or less for the last chunk
    if (scalar @chunk == CHUNKSIZE) {
        $sth->execute( map { @$_ } @chunk ); # flatten the row refs and insert 3 rows at once
    } else {
        # build and prepare a new statement for the remaining rows.
        # in our sample there is only 1 remaining row.
        $stmt = sprintf( 'insert into T (col1, col2) values %s',
                         join(',', ('(?,?)') x scalar @chunk) );
        $sth = $dbh->prepare($stmt);
        $sth->execute( map { @$_ } @chunk ); # inserts the remaining row(s)
    }
}
PerlDuck