1

Today, we tried adding an index to a table on a production DB (a RDS Aurora cluster running on MySQL 5.6), and ended up deadlocking the DB. We killed the process when it was pending, trying to acquire the MDL.

I'd like some help understanding why this happened. My understanding is that MySQL 5.6 supports online DDL, in which reads and writes can happen during the operation. I also understand that even with online DLL, the operation still needs to acquire the MDL. However, if the process was still waiting to acquire the MDL, why did the DB deadlock?

Is it possible that because we didn't explicitly set LOCK=NONE, the operation still ended up acquiring some locks? Would setting LOCK=NONE prevent a deadlock for happening in the future? If not, is there a way that we can perform the index creation in safe manner, preferably with no downtime?

(I've taken a look at this post, which seems to imply that such operations are safe in 5.6+ due to online DDL).

thevises
  • 613
  • 2
  • 5
  • 19
  • Did you check if some other process was holding locks on data? – Shubham Srivastava Sep 10 '20 at 04:21
  • Other processes were holding the MDL lock, but I'm still not understanding why that would cause a deadlock? If the DDL didn't run, then the write queries that were waiting on the DDL would instead be waiting on the process holding the lock? – thevises Sep 10 '20 at 04:36
  • Aurora is not MySQL. They say they rewrote large portions of the storage engine to implement the distributed storage of Aurora, but it is not open-source and Amazon does not share implementation details. I have no idea how locking works in Aurora. – Bill Karwin Sep 10 '20 at 13:47

0 Answers0