I need to test out some queries on some large datasets (1M - 50M rows, for example), and I'm having trouble writing a process that will generate that data quickly.
I have the code below, which is slightly modified from a version I found in another SO post:
set_time_limit(0);

$db_host = 'localhost';
$db_name = 'test';
$db_user = '';
$db_pass = '';

$entries = 1000000;
$entry_words_min = 250;
$entry_words_max = 1000;
/*
End configuration
*/

// Build a random uppercase "word" between $len_min and $len_max characters long
function get_rand_word( $len_min, $len_max ) {
    $word = '';
    $len = rand( $len_min, $len_max );
    for ( $i = 0; $i < $len; $i++ ) {
        $word .= chr( rand( 65, 90 ) );
    }
    return $word;
}

// 4-10 random words for the title
function get_title() {
    $title = '';
    $words = rand( 4, 10 );
    for ( $i = 0; $i < $words; $i++ ) {
        $title .= get_rand_word( 2, 9 ) . ' ';
    }
    return $title;
}

// 250-500 random words for the body text
function get_fulltext() {
    $fulltext = '';
    $words = rand( 250, 500 );
    for ( $i = 0; $i < $words; $i++ ) {
        $fulltext .= get_rand_word( 2, 9 ) . ' ';
    }
    return $fulltext;
}

$dsn = 'mysql:dbname=' . $db_name . ';host=' . $db_host;

try {
    $dbh = new PDO( $dsn, $db_user, $db_pass );
} catch ( PDOException $e ) {
    echo 'Connection failed: ' . $e->getMessage();
    die();
}

$sth = $dbh->prepare('INSERT INTO `sphinx` (`some_id`,`title`,`content`) VALUES (:some_id, :title, :content)');

$counter = 0;
for ( $i = 0; $i < $entries; $i++ ) {
    $sth->execute(array(
        ':some_id' => mt_rand( 1, 65000 ),
        ':title'   => get_title(),
        ':content' => get_fulltext()
    ));
    $counter++;
}

echo $counter . ' rows inserted';
However, this is rather slow at inserting; it has taken several hours just to insert 500k rows. I know I could write a multi-row batch INSERT, but a single statement holding a million rows doesn't seem like a good idea; I could break it into batches of 10k rows or so (a rough sketch of what I mean is below), but I wanted to check whether there are any better/cleaner methods available.
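For reference, this is roughly the batching approach I have in mind. It's only a sketch, reusing the $dbh connection, $entries count and generator functions from the script above; the batch size of 10,000 is an arbitrary guess, not something I've benchmarked:

$batch_size = 10000; // rows per multi-row INSERT (arbitrary guess)
$rows = array();

for ( $i = 0; $i < $entries; $i++ ) {
    // Collect one generated row
    $rows[] = array( mt_rand( 1, 65000 ), get_title(), get_fulltext() );

    // Flush the batch when it's full, or on the final iteration
    if ( count( $rows ) === $batch_size || $i === $entries - 1 ) {
        // Build one INSERT ... VALUES (?,?,?),(?,?,?),... statement for the whole batch
        $placeholders = implode( ',', array_fill( 0, count( $rows ), '(?,?,?)' ) );
        $params = array();
        foreach ( $rows as $row ) {
            $params = array_merge( $params, $row );
        }
        $sth = $dbh->prepare( "INSERT INTO `sphinx` (`some_id`,`title`,`content`) VALUES $placeholders" );
        $sth->execute( $params );
        $rows = array();
    }
}

That would at least cut the number of round trips from one per row to one per 10k rows, but I'm not sure it's the cleanest way to do this, hence the question.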