How to properly configure clustered Quartz with MariaDB

Question

We have a clustered quartz configuration with 8 geographically distributed nodes all backed by a single instance of MariaDB. We’ve repeatedly observed the following error :

2018-04-26 00:14:01,186 [censored_QuartzSchedulerThread] ERROR org.quartz.core.ErrorLogger - An error occurred while firing triggers '[Trigger 'DEFAULT.a563a68e-b30d-4dab-b24f-540c5fa0cef8':  triggerClass: 'org.quartz.impl.triggers.CronTriggerImpl calendar: 'null' misfireInstruction: 0 nextFireTime: Thu Apr 26 00:11:00 EDT 2018]'
org.quartz.impl.jdbcjobstore.LockException: Failure obtaining db row lock: (conn:83) Lock wait timeout exceeded; try restarting transaction
Query is: SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'censored' AND LOCK_NAME = ? FOR UPDATE, parameters ['TRIGGER_ACCESS']
        at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:157)
        at org.quartz.impl.jdbcjobstore.DBSemaphore.obtainLock(DBSemaphore.java:113)
        at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3792)
        at org.quartz.impl.jdbcjobstore.JobStoreSupport.triggersFired(JobStoreSupport.java:2912)
        at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:336)
Caused by: java.sql.SQLException: (conn:83) Lock wait timeout exceeded; try restarting transaction
Query is: SELECT * FROM QRTZ_LOCKS WHERE SCHED_NAME = 'censored' AND LOCK_NAME = ? FOR UPDATE, parameters ['TRIGGER_ACCESS']
        at org.mariadb.jdbc.internal.util.ExceptionMapper.get(ExceptionMapper.java:150)
        at org.mariadb.jdbc.internal.util.ExceptionMapper.getException(ExceptionMapper.java:101)
        at org.mariadb.jdbc.internal.util.ExceptionMapper.throwAndLogException(ExceptionMapper.java:77)
        at org.mariadb.jdbc.MariaDbStatement.executeQueryEpilog(MariaDbStatement.java:226)
        at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:413)
        at org.mariadb.jdbc.MariaDbServerPreparedStatement.execute(MariaDbServerPreparedStatement.java:362)
        at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeQuery(MariaDbServerPreparedStatement.java:343)
        at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
        at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
        at org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:96)
        at org.quartz.impl.jdbcjobstore.StdRowLockSemaphore.executeSQL(StdRowLockSemaphore.java:96)
        ... 4 common frames omitted

Other nodes also have similar errors:

2018-04-26 00:01:05,106 [censored_QuartzSchedulerThread] ERROR org.quartz.core.ErrorLogger - An error occurred while scanning for the next triggers to fire.
org.quartz.JobPersistenceException: Couldn't acquire next trigger: (conn:75) Lock wait timeout exceeded; try restarting transaction
Query is: UPDATE QRTZ_TRIGGERS SET TRIGGER_STATE = ? WHERE SCHED_NAME = 'censored' AND TRIGGER_NAME = ? AND TRIGGER_GROUP = ? AND TRIGGER_STATE = ?, parameters ['ACQUIRED','12d649da-56fb-47bf-874e-96dca451291f','DEFAULT','WAITING']
      at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTrigger(JobStoreSupport.java:2860)
      at org.quartz.impl.jdbcjobstore.JobStoreSupport$40.execute(JobStoreSupport.java:2759)
      at org.quartz.impl.jdbcjobstore.JobStoreSupport$40.execute(JobStoreSupport.java:2757)
      at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3799)
      at org.quartz.impl.jdbcjobstore.JobStoreSupport.acquireNextTriggers(JobStoreSupport.java:2756)
      at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:272)
Caused by: java.sql.SQLException: (conn:75) Lock wait timeout exceeded; try restarting transaction

And

2018-05-06 06:27:10,438 [QuartzScheduler_censored-schedulerName_ClusterManager] ERROR o.q.impl.jdbcjobstore.JobStoreTX - ClusterManager: Error managing cluster: Failure updating scheduler state when checking-in: (conn:52) Deadlock found when trying to get lock; try restarting transaction
Query is: INSERT INTO QRTZ_SCHEDULER_STATE (SCHED_NAME, INSTANCE_NAME, LAST_CHECKIN_TIME, CHECKIN_INTERVAL) VALUES('censored', ?, ?, ?), parameters ['schedulerName',1525602430435,15000]
org.quartz.JobPersistenceException: Failure updating scheduler state when checking-in: (conn:52) Deadlock found when trying to get lock; try restarting transaction
Query is: INSERT INTO QRTZ_SCHEDULER_STATE (SCHED_NAME, INSTANCE_NAME, LAST_CHECKIN_TIME, CHECKIN_INTERVAL) VALUES('censored', ?, ?, ?), parameters ['schedulerName',1525602430435,15000]
      at org.quartz.impl.jdbcjobstore.JobStoreSupport.clusterCheckIn(JobStoreSupport.java:3418)
      at org.quartz.impl.jdbcjobstore.JobStoreSupport.doCheckin(JobStoreSupport.java:3265)
      at org.quartz.impl.jdbcjobstore.JobStoreSupport$ClusterManager.manage(JobStoreSupport.java:3870)
      at org.quartz.impl.jdbcjobstore.JobStoreSupport$ClusterManager.run(JobStoreSupport.java:3907)
Caused by: java.sql.SQLTransactionRollbackException: (conn:52) Deadlock found when trying to get lock; try restarting transaction
Query is: INSERT INTO QRTZ_SCHEDULER_STATE (SCHED_NAME, INSTANCE_NAME, LAST_CHECKIN_TIME, CHECKIN_INTERVAL) VALUES('censored', ?, ?, ?), parameters ['schedulerName',1525602430435,15000]
      at org.mariadb.jdbc.internal.util.ExceptionMapper.get(ExceptionMapper.java:141)
      at org.mariadb.jdbc.internal.util.ExceptionMapper.getException(ExceptionMapper.java:101)
      at org.mariadb.jdbc.internal.util.ExceptionMapper.throwAndLogException(ExceptionMapper.java:77)
      at org.mariadb.jdbc.MariaDbStatement.executeQueryEpilog(MariaDbStatement.java:226)
      at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:413)
      at org.mariadb.jdbc.MariaDbServerPreparedStatement.execute(MariaDbServerPreparedStatement.java:362)
      at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeUpdate(MariaDbServerPreparedStatement.java:351)
      at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
      at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
      at org.apache.commons.dbcp.DelegatingPreparedStatement.executeUpdate(DelegatingPreparedStatement.java:105)
      at org.quartz.impl.jdbcjobstore.StdJDBCDelegate.insertSchedulerState(StdJDBCDelegate.java:2948)
      at org.quartz.impl.jdbcjobstore.JobStoreSupport.clusterCheckIn(JobStoreSupport.java:3413)
      ... 3 common frames omitted
Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Deadlock found when trying to get lock; try restarting transaction
Query is: INSERT INTO QRTZ_SCHEDULER_STATE (SCHED_NAME, INSTANCE_NAME, LAST_CHECKIN_TIME, CHECKIN_INTERVAL) VALUES('censored', ?, ?, ?), parameters ['schedulerName',1525602430435,15000]
      at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readErrorPacket(AbstractQueryProtocol.java:1144)
      at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.readPacket(AbstractQueryProtocol.java:1076)
      at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.getResult(AbstractQueryProtocol.java:1031)
      at org.mariadb.jdbc.internal.protocol.AbstractQueryProtocol.executePreparedQuery(AbstractQueryProtocol.java:617)
      at org.mariadb.jdbc.MariaDbServerPreparedStatement.executeInternal(MariaDbServerPreparedStatement.java:401)
      ... 10 common frames omitted

Our Quartz configuration is:

org.quartz.scheduler.instanceName=censored
org.quartz.scheduler.instanceId=AUTO

org.quartz.threadPool.class=org.quartz.simpl.SimpleThreadPool
org.quartz.threadPool.threadCount=10

org.quartz.jobListener.NAME.class=our.JobListenerImplementation
org.quartz.jobListener.NAME.jobListenerName=JobListener

org.quartz.triggerListener.NAME.class=our.TriggerListenerImplementation
org.quartz.triggerListener.NAME.triggerName=SchedulerTriggerListener


org.quartz.jobStore.class=org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass=org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.tablePrefix=QRTZ_
org.quartz.jobStore.dataSource=quartzDataSource
org.quartz.jobStore.isClustered = true
org.quartz.jobStore.clusterCheckinInterval=15000
org.quartz.jobStore.maxMisfiresToHandleAtATime=20
org.quartz.jobStore.acquireTriggersWithinLock=true


org.quartz.dataSource.quartzDataSource.connectionProvider.class=our.ConnectionProviderImplementation

org.quartz.dataSource.quartzDataSource.driver=org.mariadb.jdbc.Driver
org.quartz.dataSource.quartzDataSource.url=
org.quartz.dataSource.quartzDataSource.user=
org.quartz.dataSource.quartzDataSource.password=

org.quartz.dataSource.quartzDataSource.initialPoolSize=3
org.quartz.dataSource.quartzDataSource.maxActive=50
org.quartz.dataSource.quartzDataSource.maxIdle=20
org.quartz.dataSource.quartzDataSource.minIdle=2
org.quartz.dataSource.quartzDataSource.maxWait=180000
org.quartz.dataSource.quartzDataSource.setMaxOpenPreparedStatements=10
org.quartz.dataSource.quartzDataSource.removeAbandoned=true
org.quartz.dataSource.quartzDataSource.removeAbandonedTimeout=300
org.quartz.dataSource.quartzDataSource.logAbandoned=true
org.quartz.dataSource.quartzDataSource.testWhileIdle=true
org.quartz.dataSource.quartzDataSource.testOnBorrow=true
org.quartz.dataSource.quartzDataSource.testOnReturn=true
org.quartz.dataSource.quartzDataSource.validationQuery=select 1 from dual
org.quartz.dataSource.quartzDataSource.validationQueryTimeout=1

As a result of these issues obtaining a lock, the call to create a new schedule often (but not always) does not complete, and triggers are not being triggered as expected. All this is happening under light load, with no more than a couple of requests to create a schedule per minute.

We are also seeing the following Warning, which may or may not be related:

2018-05-06 06:27:10,434 [QuartzScheduler_censored-schedulerName_ClusterManager] WARN  o.q.impl.jdbcjobstore.JobStoreTX - This scheduler instance (schedulerName) is still active but was recovered by another instance in the cluster.  This may cause inconsistent behavior.

Anyone experienced this problem while running clustered Quartz with MariaDB? Suggestions would be greatly appreciated.

Did you ever manage to get this solved ? – Sergiu Sep 25 '18 at 07:01 — Sergiu, Sep 25 '18 at 07:01
nope, not yet. Eventually I'll get back to this – Sergio Sep 25 '18 at 17:36 — Sergio, Sep 25 '18 at 17:36

How to properly configure clustered Quartz with MariaDB

0 Answers0