0

My PHP 7.4 script select a large InnoDB table from a local MySQL 5.7 database and iterates every line using PDO. I'm using unbuffered queries by purpose as the whole table won't fit server memory.

The query is a simple SELECT * FROM table WHERE field_a = ... ORDER BY field_b, field_c.

The code iterating the result is:

$rows = $db->prepare('redacted');
$rows->bindParam(':redacted', $redacted);
$rows->execute();

foreach($rows as $r) {
    if($r->fieldname != redacted)
        doWork1();
    
    doWork2();
}

doWork2() and it always does the following job, using a separate, buffered PDO connection:

  1. updates the row on the db;
  2. in a transaction, insert few new rows in another table (one INSERT query per row).

Every iteraction doesn't take more than 1-2 seconds, but the whole loop could take hours to complete.

The script works fine, but exactly every 12:00 minutes, PDO throws the following exception:

[03-Feb-2023 08:34:29 Europe/Rome] PHP Fatal error:  Uncaught PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away in redacted:176
Stack trace:
redacted
[03-Feb-2023 08:46:50 Europe/Rome] PHP Fatal error:  Uncaught PDOException: SQLSTATE[HY000]: General error: 2006 MySQL server has gone away in redacted:176
Stack trace:
redacted

Line number is always the foreach or the if.

mysqld.log says:

2023-02-03T08:36:33.480661+01:00 781575 [Note] Aborted connection 781575 to db: 'redacted' user: 'redacted' host: 'localhost' (Got timeout writing communication packets)
2023-02-03T08:48:56.534865+01:00 781768 [Note] Aborted connection 781768 to db: 'redacted' user: 'redacted' host: 'localhost' (Got timeout writing communication packets)

Upon restarting the script, it works fine on the same data, so it isn't a problem with values or datatypes. mysqladmin version shows the server never crashed.

I suppose it is due to a sort of timeout.

I noticed the timestamp in mysql log is about 2 minutes later the PHP fatal error, so the disconnection due to writing timeout is the consequence, not the cause. For this reason, increasing net_read_timeout and net_write_timeout (120 seconds), wait_timeout (8 hours), max_allowed_packet (256 MB) didn't help. I reviewed carefully all server variables and I didn't have any value near 12 minutes or 720 seconds.

The query is fast (less than 3 seconds) and optimized; no big fields; it uses indexes; the query EXPLAIN is (relevant part): 594802 | 5.67 | Using index condition; Using where; Using filesort. I'm not in cases mentioned at https://dev.mysql.com/doc/refman/8.0/en/gone-away.html

The machine is a AWS Lightsail with Linux and cPanel.

#  mysqladmin version
mysqladmin  Ver 8.42 Distrib 5.7.40, for Linux on x86_64
redacted
Uptime:                 10 days 10 hours 34 min 13 sec
redacted
Threads: 6  Questions: 15310531  Slow queries: 0  Opens: 9989  Flush tables: 1  Open tables: 1854  Queries per second avg: 16.972

I enabled query log and inspected carefull around time of the error and all queries are correct.

Marco Marsala
  • 2,332
  • 5
  • 25
  • 39
  • Have you tried issuing `SHOW SESSION VARIABLES WHERE Value = 720` from your PDO connection? It's not likely but may highlight a different timeout value being set for the session. – user1191247 Feb 03 '23 at 09:14
  • Yes, and I checked carefully all values near 12 or 720, found nothing relevant. – Marco Marsala Feb 03 '23 at 10:03
  • Additional DB information request, please. RAM size, # cores, any SSD or NVME devices on MySQL Host server? Post TEXT data on justpaste.it and share the links. From your SSH login root, Text results of: A) SELECT COUNT(*), sum(data_length), sum(index_length), sum(data_free) FROM information_schema.tables; B) SHOW GLOBAL STATUS; after minimum 24 hours UPTIME C) SHOW GLOBAL VARIABLES; D) SHOW FULL PROCESSLIST; E) STATUS; not SHOW STATUS, just STATUS; G) SHOW ENGINE INNODB STATUS; for server workload tuning analysis to provide suggestions. – Wilson Hauck Feb 03 '23 at 21:18
  • Post TEXT data on justpaste.it and share the links. Additional very helpful OS information includes - please, htop 1st page, if available, TERMINATE, top -b -n 1 for most active apps, top -b -n 1 -H for details on your mysql threads memory and cpu usage, ulimit -a for list of limits, iostat -xm 5 3 for IOPS by device & core/cpu count, df -h for Used - Free space by device, df -i for inode info by device, free -h for Used - Free Mem: and Swap:, cat /proc/meminfo includes VMallocUused, for server workload tuning analysis to provide suggestions. – Wilson Hauck Feb 03 '23 at 21:18
  • I doubt if the `SELECT` is the slow part. But, just in case, let's see the entire `SELECT` and the `CREATE TABLE`. If may be that you need an composite index. – Rick James Feb 06 '23 at 19:24
  • What is the value of `autocommit` and do you use "transactions"? Also, how many times is that `SELECT` called? – Rick James Feb 06 '23 at 19:26
  • I run the SELECT query just one time, it completes almost immediately, then I iterate the result set once. I don’t use transactions on the unbuffered query. Autocommit is enabled. I have a composite index for every field used in WHERE clause; checked performances with EXPLAIN. – Marco Marsala Feb 06 '23 at 22:11
  • It was net_write_timeout, still didn’t understand why increasing it to the duration of every iteration won’t suffice, but increasing to 3000 fixed the issue and the script is iterating that result set fine since days. – Marco Marsala Feb 06 '23 at 22:13

1 Answers1

1

I was surprised when I was able to replicate this so easily. I created three separate PDO connections to my db server -

  1. for the unbuffered query
  2. run regular calls to check on the status of the thread handling the unbuffered query
  3. run updates based on rows being fetched from the unbuffered query

Without making any config changes, the thread handling the unbuffered query was being closed by the server after 16 minutes. I was unable to find anything in the session vars that would suggest a timeout at 16 minutes.

The solution in my case was to increase the net_write_timeout to be long enough for the entire unbuffered result to be fetched, row-by-row. I increased it to 30mins, 60mins and then 2 hours. Each time the connection for the unbuffered query result remained open until the specified timeout elapsed (1 - 6 minutes after).

I then set it to 36000 (10 hrs) and it has now been running for just over four hours -

SET SESSION net_write_timeout = 36000

I suspect using smaller buffered queries (num of rows dependent on your average row size) to select the data from the source table would be a better solution than holding this connection open for the unbuffered query for hours on end.

The only recent article I came across that discusses unbuffered queries and the effect of net_write_timeout is this one by Hayden James. It only touched on net_write_timeout in passing.

user1191247
  • 10,808
  • 2
  • 22
  • 32
  • Increasing net_write_timeout to 3000 fixed the issue. Initially I increased net_write_timeout=120, the double of the maximum time needed for every iteration to consume a row from the query, but it wasn’t enough. Setting to 3000 was ok, but my queries lasts much more than 50 minutes. I read the article but I still don’t understand how this param works. – Marco Marsala Feb 06 '23 at 22:27