
I am testing performance on a MySQL server by filling a table with more than 200 million records. The stored procedure is very slow at generating the big SQL string. Any help or comment is really welcome.

System Info:

  • Database: MySQL 5.6.10, InnoDB (test database).
  • Processor: AMD Phenom II 1090T X6, 3910 MHz per core.
  • RAM: 16 GB DDR3 1600 MHz CL8.
  • HD: Windows 7 64-bit SP1 on an SSD, MySQL installed on the SSD, logs written to a mechanical hard disk.

The stored procedure builds one INSERT SQL statement containing all the values to be inserted into the table.

DELIMITER $$
USE `test`$$

DROP PROCEDURE IF EXISTS `inputRowsNoRandom`$$

CREATE DEFINER=`root`@`localhost` PROCEDURE `inputRowsNoRandom`(IN NumRows BIGINT)
BEGIN
    /* BUILD INSERT STATEMENT WITH A LOT OF ROWS TO INSERT */
    DECLARE i BIGINT;
    DECLARE nMax BIGINT;
    DECLARE squery LONGTEXT;
    DECLARE svalues LONGTEXT;

    SET i = 1;
    SET nMax = NumRows + 1;
    SET squery = 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE) VALUES ';
    SET svalues = '("1", "a1", 100, 1, 500000, "2013-06-14 12:40:45"),';

    WHILE i < nMax DO
        SET squery = CONCAT(squery, svalues);
        SET i = i + 1;
    END WHILE;

    /*SELECT squery;*/
    SET squery = LEFT(squery, CHAR_LENGTH(squery) - 1);
    SET squery = CONCAT(squery, ";");
    SELECT squery;

    /* EXECUTE INSERT SENTENCE */
    /*START TRANSACTION;*/
    /*PREPARE stmt FROM squery;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    */

    /*COMMIT;*/
END$$
DELIMITER ;


Results:

  1. Concatenating 20000 strings takes about 45 seconds to process:

CALL test.inputRowsNoRandom(20000);

  2. Concatenating 100000 strings takes roughly 5 to 12 minutes O_O:

CALL test.inputRowsNoRandom(100000);

Result (ordered by duration):

state          duration (summed) in sec    percentage
freeing items  0.00005                     50.00000
starting       0.00002                     20.00000
executing      0.00001                     10.00000
init           0.00001                     10.00000
cleaning up    0.00001                     10.00000
Total          0.00010                     100.00000

Change of STATUS VARIABLES due to execution of query:

variable        value    description
Bytes_received  21       Bytes sent from the client to the server
Bytes_sent      97       Bytes sent from the server to the client
Com_select      1        Number of SELECT statements that have been executed
Questions       1        Number of statements executed by the server

Tests:
I have already tested with different MySQL configurations, from 12 to 64 threads, with the cache on and off, and with the logs moved to another hard disk; the sketch below shows the kind of settings involved.
I have also tested using TEXT, INT, etc.
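
For reference, a hedged sketch of the kind of server settings usually involved in this sort of bulk-insert tuning (the variables and values below are illustrative only, not the exact ones from my tests):

-- Illustrative only: common bulk-insert knobs on MySQL 5.6.
-- innodb_buffer_pool_size is not dynamic in 5.6, so it belongs in my.ini/my.cnf,
-- e.g. innodb_buffer_pool_size = 8G
SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- relax per-commit log flushing
SET GLOBAL max_allowed_packet = 134217728;      -- allow very large single statements (128 MB)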

Questions:

  • Is something wrong in the code? If I send 100000 strings to build the final SQL string, the result of SELECT squery; is a NULL string. What is happening? (The error must be there, but I don't see it.)
  • Can I improve the code in any way to speed it up?
  • I have read that some operations in stored procedures can be really slow; should I generate the file in C/Java/PHP... and send it to MySQL?

    mysql -u mysqluser -p databasename < numbers.sql

  • MySQL seems to use only one core for a single SQL query. Would nginx or another database system (multithreaded DBs, Cassandra, Redis, MongoDB...) achieve better performance with stored procedures and use more than one CPU for one query? (My single query is using only about 20% of total CPU, with about 150 threads.)

Comments:

  • A comment! Why in Cthulhu's name are you using a DBMS to loop millions of times to concatenate a string? I'm surprised it's as fast as it is; you are checking the gift horse's teeth here. – Tony Hopkinson Jun 16 '13 at 18:57
  • This is possibly the most informative question on here I have ever come across, bravo – hd1 Jun 16 '13 at 18:57
  • Hi Tony Hopkinson, I was trying to help here http://stackoverflow.com/questions/17042760/storing-changes-on-entities-is-mysql-the-proper-solution?noredirect=1#comment24768877_17042760 and was trying to quickly fill a table with a lot of random data to check performance and run some tests with millions of rows, when I found this problem. – xtrm Jun 16 '13 at 19:00

2 Answers


Don't use loops, especially on that scale, in an RDBMS.

Try to quickly fill your table with 1M rows with a single query:

INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
SELECT 1, 'a1', 100, 1, 500000, '2013-06-14 12:40:45'
  FROM
(
select a.N + b.N * 10 + c.N * 100 + d.N * 1000 + e.N * 10000 + f.N * 100000 + 1 N
from (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) b
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) c
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) d
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) e
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) f
) t

On my box (MacBook Pro, 16 GB RAM, 2.6 GHz Intel Core i7) it took ~8 sec to complete:

Query OK, 1000000 rows affected (7.63 sec)
Records: 1000000  Duplicates: 0  Warnings: 0
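
The question mentions loading 200 million rows; one hedged way to scale past the 1M rows generated above, assuming duplicate rows are fine for load testing (the values are constant anyway) and that any primary key is auto-generated rather than part of the insert list, is to re-insert the table into itself, roughly doubling the row count each time:

-- Each run roughly doubles the row count by copying the table's existing rows back into it.
INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
SELECT fk_entity, str1, str2, bool1, double1, date
  FROM `entity_versionable`;

Running that statement about eight times after the initial 1M-row insert gives on the order of 256M rows.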

UPDATE 1: Now a version of the stored procedure that uses a prepared statement:

DELIMITER $$
CREATE PROCEDURE `inputRowsNoRandom`(IN NumRows INT)
BEGIN
    DECLARE i INT DEFAULT 0;

    PREPARE stmt 
       FROM 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
             VALUES(?, ?, ?, ?, ?, ?)';
    SET @v1 = 1, @v2 = 'a1', @v3 = 100, @v4 = 1, @v5 = 500000, @v6 = '2013-06-14 12:40:45';

    WHILE i < NumRows DO
        EXECUTE stmt USING @v1, @v2, @v3, @v4, @v5, @v6;
        SET i = i + 1;
    END WHILE;

    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

Completed in ~3 min:

mysql> CALL inputRowsNoRandom(1000000);
Query OK, 0 rows affected (2 min 51.57 sec)

Feel the difference: 8 sec vs 3 min.

UPDATE 2: To speed things up we can explicitly use transactions and commit the insertions in batches. Here is an improved version of the SP.

DELIMITER $$
CREATE PROCEDURE inputRowsNoRandom1(IN NumRows BIGINT, IN BatchSize INT)
BEGIN
    DECLARE i INT DEFAULT 0;

    PREPARE stmt 
       FROM 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
             VALUES(?, ?, ?, ?, ?, ?)';
    SET @v1 = 1, @v2 = 'a1', @v3 = 100, @v4 = 1, @v5 = 500000, @v6 = '2013-06-14 12:40:45';

    START TRANSACTION;
    WHILE i < NumRows DO
        EXECUTE stmt USING @v1, @v2, @v3, @v4, @v5, @v6;
        SET i = i + 1;
        IF i % BatchSize = 0 THEN 
            COMMIT;
            START TRANSACTION;
        END IF;
    END WHILE;
    COMMIT;
    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

Results with different batch sizes:

mysql> CALL inputRowsNoRandom1(1000000,1000);
Query OK, 0 rows affected (27.25 sec)

mysql> CALL inputRowsNoRandom1(1000000,10000);
Query OK, 0 rows affected (26.76 sec)

mysql> CALL inputRowsNoRandom1(1000000,100000);
Query OK, 0 rows affected (26.43 sec)

You can see the difference for yourself. Still more than 3 times slower than the cross join.

peterm
  • Really nice peterm, thank you so much. I am going to take a close look at the code and change the title to "MySQL: Fill a table within a Stored Procedure efficiently". This topic also adds related information: http://stackoverflow.com/questions/1415602/performance-in-pdo-php-mysql-transaction-versus-direct-execution – xtrm Jun 17 '13 at 07:32
  • Tested the first method (the fastest one, with SELECTs) and inserted 100 million rows. It took 16 min 37 sec to complete. I'm going to test inserting random data with the same method; it will probably take a lot longer. – xtrm Jun 18 '13 at 09:10

I had a task much like yours. The answer above explains things well; I just want to describe my solution. In my task I have a queue: first I generate random data, then enqueue it, then dequeue it and write it to the DB. In the beginning it took 70 hours to write 242M rows. Then I changed the engine: MyISAM is quite a bit faster than InnoDB for this kind of load (see the engine-switch sketch below). After that it took 30 hours to write. Still too much... so I had to change something else.
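
For reference, a minimal sketch of the engine switch, assuming a table named like the one in the question (MyISAM gives up transactions and foreign keys in exchange for faster bulk writes):

ALTER TABLE `entity_versionable` ENGINE = MyISAM;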

Rather than doing "INSERT ... VALUES (1,2,3)" one row at a time,

I did "INSERT ... VALUES (1,2,3), (4,5,6), (7,8,9)".

It worked super fast. My point is: cut the data into smaller pieces and write each piece to the DB as a multi-row bulk insert, as in the sketch below.
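
A minimal sketch of what such a multi-row INSERT could look like against the question's table (the values are just the constants from the question, and the batch of three rows is purely illustrative; real batches would be much larger):

-- One statement inserts several rows at once instead of one row per statement.
INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
VALUES
    (1, 'a1', 100, 1, 500000, '2013-06-14 12:40:45'),
    (2, 'a1', 100, 1, 500000, '2013-06-14 12:40:45'),
    (3, 'a1', 100, 1, 500000, '2013-06-14 12:40:45');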

EDIT: I wrote this because I was in so much trouble before I figured it out, and now I want to share it. lol