1

I'm running into an issue with a long running transaction that is blocking all writes to the database as a result. I'm wondering, what is the best way to diagnose this issue and is it possible to force the transactions to commit and save the data that is currently hung up?

Background info: The app is using a custom DAO/ORM written in python using pymysql. I don't have much confidence in this code, but it has worked for awhile and this issue starting popping up 2 days ago.

My attempt: I've never seen anything like this before so everything I tried is based mostly off of googlefu. I started by looking at the transactions given from SHOW ENGINE INNODB STATUS; (I tried to omit non relevant info).

------------
TRANSACTIONS
------------
Trx id counter 279345410
Purge done for trx's n:o < 277758364 undo n:o < 0 state: running but idle
History list length 271773
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 277760341, ACTIVE 153800 sec
28 lock struct(s), heap size 376, 49 row lock(s), undo log entries 90
MySQL thread id 2310, OS thread handle 0x2b3044788700, query id 3783662 172.30.1.223 backers cleaned up
---TRANSACTION 277758227, ACTIVE 153933 sec
82 lock struct(s), heap size 376, 1452 row lock(s), undo log entries 4233
MySQL thread id 1972, OS thread handle 0x2b3061a43700, query id 3783650 54.0.0.1 syncmachine cleaned up

Those 2 transactions are from 2 different apps on 2 different databaseson the same server. They have obviously been running longer than expected.

Next I though there may be a deadlock or something similar so I tried select trx_id,trx_state from information_schema.innodb_trx;

mysql> select trx_id,trx_state from information_schema.innodb_trx;
+-----------+-----------+
| trx_id    | trx_state |
+-----------+-----------+
| 279387152 | RUNNING   |
| 279387149 | RUNNING   |
| 279387114 | RUNNING   |
| 279384295 | RUNNING   |
| 279381054 | RUNNING   |
| 279347599 | RUNNING   |
| 278841669 | RUNNING   |
| 277760341 | RUNNING   |*
| 277758227 | RUNNING   |*
| 277758147 | RUNNING   |
+-----------+-----------+
10 rows in set (0.02 sec)

so no LOCK WAITs which I thought could have been the issue.

Lastly I tried SELECT * FROM information_schema.innodb_trx\G

mysql> SELECT * FROM information_schema.innodb_trx\G
*************************** 6. row ***************************
                    trx_id: 277760341
                 trx_state: RUNNING
               trx_started: 2017-08-22 19:13:06
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 119
       trx_mysql_thread_id: 2310
                 trx_query: NULL
       trx_operation_state: NULL
         trx_tables_in_use: 0
         trx_tables_locked: 0
          trx_lock_structs: 28
     trx_lock_memory_bytes: 376
           trx_rows_locked: 49
         trx_rows_modified: 91
   trx_concurrency_tickets: 0
       trx_isolation_level: READ COMMITTED
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
*************************** 7. row ***************************
                    trx_id: 277758227
                 trx_state: RUNNING
               trx_started: 2017-08-22 19:10:53
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 4315
       trx_mysql_thread_id: 1972
                 trx_query: NULL
       trx_operation_state: NULL
         trx_tables_in_use: 0
         trx_tables_locked: 0
          trx_lock_structs: 82
     trx_lock_memory_bytes: 376
           trx_rows_locked: 1452
         trx_rows_modified: 4233
   trx_concurrency_tickets: 0
       trx_isolation_level: READ COMMITTED
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
 trx_adaptive_hash_latched: 0
 trx_adaptive_hash_timeout: 0
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
8 rows in set (0.01 sec)

but trx_query is NULL for both. So how is that the case with so many locks involved?

Does anyone have any suggestions on diagnosing this? Also, is there some way to force the transactions through? Will FLUSH TABLES accomplish this?

Shatnerz
  • 2,353
  • 3
  • 27
  • 43

1 Answers1

1

You should be able to review the transaction history with the following: https://www.psce.com/en/blog/2015/01/22/tracking-mysql-query-history-in-long-running-transactions/ (You should save the resulting log as it will be a way to recover data lost in the transaction)

I don't know of anyway to recover a session other than break-pointing into the code that has the context with the current session and attempting a debug from there.

Your best bet is to unlock the tables and perform a backup: https://dev.mysql.com/doc/mysql-backup-excerpt/5.7/en/backup-methods.html

Review any new modifications to the code, perhaps someone forgot a commit() somewhere and there's some reason the session isn't being disconnected. A background worker task could be the culprit as it would be able to hold on to the same context and put your tables in lock. This can be fixed by setting timeouts: MySQL rollback on transaction with lost/disconnected connection

Mark Clark
  • 471
  • 4
  • 15
  • I did go through that psce link previously. Unfortunately performance_schema was not enabled for some reason – Shatnerz Aug 24 '17 at 18:23