
I was doing bulk inserts into a Sphinx real-time index using PHP, with AUTOCOMMIT disabled, e.g.

// sphinx connection
$sphinxql = mysqli_connect($sphinxql_host.':'.$sphinxql_port,'',''); 

//do some other time consuming work

//sphinx start transaction
mysqli_begin_transaction($sphinxql);

//do 50k updates or inserts

// Commit transaction
mysqli_commit($sphinxql);

I kept the script running overnight, and in the morning I saw:

PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate
212334 bytes) in

When I checked the nohup.out file closely, I noticed these lines:

PHP Warning: mysqli_query(): MySQL server has gone away in /home/script.php on line 502
Warning: mysqli_query(): MySQL server has gone away in /home/script.php on line 502

Memory usage before these lines was normal, but after these lines it started to increase until it hit the PHP memory_limit, produced the fatal error, and the script died.

In script.php, line 502 is:

mysqli_query($sphinxql,$update_query_sphinx);

So my guess is that the Sphinx server closed the connection after a few minutes/hours of inactivity.

I have tried setting this in sphinx.conf:

client_timeout = 3600

and restarted searchd with:

systemctl restart searchd

but I am still facing the same issue.

So how can I keep the Sphinx server from dying on me when there is no activity for a longer time?


More info added:

I am fetching data from MySQL in 50k-row chunks and looping over each row to update it in the Sphinx RT index, like this:

//6mil rows update in mysql, so it takes around 18-20 minutes to complete this then comes this following part.

$subset_count = 50000 ;

$total_count_query = "SELECT COUNT(*) as total_count FROM content WHERE enabled = '1'" ;
$total_count = mysqli_query ($conn,$total_count_query);
$total_count = mysqli_fetch_assoc($total_count);
$total_count = $total_count['total_count'];

$current_count = 0;

while ($current_count <= $total_count){

    $get_mysql_data_query = "SELECT record_num, views, comments, votes FROM content WHERE enabled = 1 ORDER BY record_num ASC LIMIT $current_count , $subset_count ";

    //sphinx start transaction
    mysqli_begin_transaction($sphinxql);

    if ($result = mysqli_query($conn, $get_mysql_data_query)) {

        /* fetch associative array */
        while ($row = mysqli_fetch_assoc($result)) {

            //sphinx escape whole array
            //(mysqli_real_escape_array() is a custom helper here, not a built-in mysqli function)
            $escaped_sphinx = mysqli_real_escape_array($sphinxql, $row);

            //update data in sphinx index
            $update_query_sphinx = "UPDATE $sphinx_index
            SET
                views    = ".$escaped_sphinx['views']." ,
                comments = ".$escaped_sphinx['comments']." ,
                votes    = ".$escaped_sphinx['votes']."
            WHERE
                id       = ".$escaped_sphinx['record_num']." ";

            mysqli_query($sphinxql, $update_query_sphinx);
        }

        /* free result set */
        mysqli_free_result($result);
    }

    // Commit transaction
    mysqli_commit($sphinxql);

    $current_count = $current_count + $subset_count;
}
  • You can either update/insert in smaller chunks, say 1k, or update/insert in a single query (if that's possible in your case). – Andrei Sep 14 '15 at 14:45
  • @Andrew I have around 6mil records to update daily, so 1k records per cycle is too time consuming; I am looking for a solid solution, even something like a custom reconnect function which will keep the connection active on each loop. –  Sep 14 '15 at 14:54
  • Can you post the queries? I have a feeling it may be more related to that than anything else. – Andrei Sep 14 '15 at 14:57
  • @Andrew I have another cronjob where I have time-consuming work before mysqli_begin_transaction, so it throws the error at the initial mysqli_begin_transaction; I think there must be something I am missing. I am updating the question with the queries. –  Sep 14 '15 at 15:07
  • May wanna look at [this](http://stackoverflow.com/questions/3479537/mysqli-persistent-connection) too. – Andrei Sep 14 '15 at 15:12
  • @Andrew please check added more info. thanks –  Sep 14 '15 at 15:16
  • I assume `record_num` is numeric. If that's the case, I'd drop the `ORDER BY` completely and order them using PHP and some array functions. You're also doing updates in a loop; that's your main worry at this point (as far as I'm concerned anyway). You should form a single query using [`CASE`](https://dev.mysql.com/doc/refman/5.0/en/case.html). – Andrei Sep 14 '15 at 15:22
  • Yes, record_num is numeric. ORDER BY is there because there are lots of inserts going on in the table, and I don't want some rows to get missed from updating in the Sphinx RT index; I am not sure how the default order works when there are lots of inserts in the table. I am looking into CASE. –  Sep 14 '15 at 16:00
  • Have you tried [mysqli_ping](http://php.net/manual/en/mysqli.ping.php) just above line 502? Note that you need to set `mysqli.reconnect` to 1 – tillz Sep 19 '15 at 14:10
  • Insert records with a limit, when done use .result to start a new insert, until you have got everything. –  Sep 22 '15 at 13:14
  • Could you use prepared queries for this? It would go a lot faster. Also the batched inserts of commits every 1000 would have no significant impact on performance. You prepare the 'update' query outside the loop and use the `bind_param` inside the loop. You do not need to `escape` the the values. The update will be a lot quicker. – Ryan Vincent Sep 23 '15 at 01:29
  • @RyanVincent i did some benchmarking and if i use chunks lower than 100k then it takes way too much time , for 50k chunks it took 52 minutes, for 100k chunks it takes 26 minutes, but if i increase the 100k to all at once then i get out of memory error. –  Sep 23 '15 at 13:16
  • Thanks for testing it. – Ryan Vincent Sep 23 '15 at 13:53
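Ryan Vincent's prepared-statement suggestion could look roughly like the sketch below. Note this is an untested illustration: it assumes the `$sphinxql` connection, `$sphinx_index`, and the MySQL `$result` loop from the question, and whether searchd accepts server-side prepared statements at all depends on your Sphinx version.

```php
<?php
// Hypothetical sketch of the bind_param approach from the comments.
// Assumes $sphinxql, $sphinx_index and $result already exist (see the question).
$stmt = mysqli_prepare($sphinxql,
    "UPDATE $sphinx_index SET views = ?, comments = ?, votes = ? WHERE id = ?");

while ($row = mysqli_fetch_assoc($result)) {
    // Bind this row's values as integers; no manual escaping needed.
    mysqli_stmt_bind_param($stmt, 'iiii',
        $row['views'], $row['comments'], $row['votes'], $row['record_num']);
    mysqli_stmt_execute($stmt);
}

mysqli_stmt_close($stmt);
```

Preparing once outside the loop avoids re-parsing the query text 50k times, which is where part of the speedup the comment mentions would come from.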

2 Answers


So there are a couple of issues here, both related to running big processes.

  1. MySQL server has gone away - This usually means that MySQL has timed out, but it could also mean that the MySQL process crashed due to running out of memory. In short, it means that MySQL has stopped responding, and didn't tell the client why (i.e. no direct query error). Seeing as you said that you're running 50k updates in a single transaction, it's likely that MySQL just ran out of memory.
  2. Allowed memory size of 134217728 bytes exhausted - means that PHP ran out of memory. This also lends credence to the idea that MySQL ran out of memory.

So what to do about this?

The initial stop-gap solution is to increase the memory limits for PHP and MySQL. That's not really solving the root cause, and depending on the amount of control you have (and knowledge you have) of your deployment stack, it may not be possible.

As a few people mentioned, batching the process may help. It's hard to say the best way to do this without knowing the actual problem that you're working on solving. If you can process, say, 10,000 or 20,000 records instead of 50,000 in a batch, that may solve your problems. If that's going to take too long in a single process, you could also look into using a message queue (RabbitMQ is a good one that I've used on a number of projects), so that you can run multiple processes at the same time, each handling smaller batches.

If you're doing something that requires knowledge of all 6 million+ records to perform the calculation, you could potentially split the process up into a number of smaller steps, cache the work done "to date" (as such), and then pick up the next step in the next process. How to do this cleanly is difficult (again, something like RabbitMQ could simplify that by firing an event when each process is finished, so that the next one can start up).

So, in short, your two best options are:

  1. Throw more resources/memory at the problem everywhere you can.
  2. Break the problem down into smaller, self-contained chunks.
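As a rough illustration of breaking the work into self-contained chunks, the per-row update loop from the question could commit every few thousand rows instead of holding one 50k-row transaction open. This is only a sketch: it assumes the `$sphinxql` connection and MySQL `$result` loop from the question, and `run_update_for()` is a hypothetical stand-in for building and running the per-row UPDATE.

```php
<?php
// Sketch: commit every $batch_size rows instead of all 50k at once.
// run_update_for() is a hypothetical stand-in for the per-row UPDATE.
$batch_size = 1000;
$i = 0;

mysqli_begin_transaction($sphinxql);
while ($row = mysqli_fetch_assoc($result)) {
    run_update_for($sphinxql, $row);

    if (++$i % $batch_size === 0) {
        mysqli_commit($sphinxql);            // flush this chunk to the RT index
        mysqli_begin_transaction($sphinxql); // start the next chunk
    }
}
mysqli_commit($sphinxql); // commit any remaining partial chunk
```

Smaller transactions keep the server-side memory footprint bounded and give you natural resume points if the script dies partway through.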
samlev

You need to reconnect or restart the DB session just before mysqli_begin_transaction($sphinxql).

Something like this:

<?php

//reconnect to spinx if it is disconnected due to timeout or whatever , or force reconnect
function sphinxReconnect($force = false) {
    global $sphinxql_host;
    global $sphinxql_port;
    global $sphinxql;
    if($force){
        mysqli_close($sphinxql);
        $sphinxql = @mysqli_connect($sphinxql_host.':'.$sphinxql_port,'','') or die('ERROR'); 
    }else{
        if(!mysqli_ping($sphinxql)){
            mysqli_close($sphinxql);
            $sphinxql = @mysqli_connect($sphinxql_host.':'.$sphinxql_port,'','') or die('ERROR'); 
        }
    }
}



//10mil+ rows update in mysql, so it takes around 18-20 minutes to complete this then comes this following part.

//reconnect to sphinx
sphinxReconnect(true);

//sphinx start transaction
mysqli_begin_transaction($sphinxql);

//do your otherstuff

// Commit transaction
mysqli_commit($sphinxql);
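In the chunked loop from the question, the non-forced variant can be called at the top of every chunk, so a connection that timed out during the long MySQL work is re-established transparently. A sketch, using the same variables as the question:

```php
<?php
// Sketch: ping/reconnect at the top of every chunk.
// Assumes sphinxReconnect() above and the chunking variables from the question.
while ($current_count <= $total_count) {
    sphinxReconnect();                  // mysqli_ping() + reconnect if the link died

    mysqli_begin_transaction($sphinxql);
    // ... run this chunk's updates against the RT index ...
    mysqli_commit($sphinxql);

    $current_count += $subset_count;
}
```

Note that mysqli_ping()'s automatic reconnect depends on the mysqli.reconnect ini setting, which is why the function above falls back to closing and reopening the connection itself.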