
I have a Quartz Job like this

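import org.quartz.DisallowConcurrentExecution;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.PersistJobDataAfterExecution;

@PersistJobDataAfterExecution
@DisallowConcurrentExecution
public class MyJob implements Job {

     @Override
     public void execute(JobExecutionContext jec) throws JobExecutionException {
         // connect to an FTP server, monitor a directory for new files and download them
         // using FTPClient from commons-net-3.5.jar
     }
}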

The job is triggered with

JobDetail jobDetail = newJob(MyJob.class)
    .withIdentity(jobName, DEFAULT_GROUP)
    .usingJobData(new JobDataMap(jobProperties))
    .build();   

//trigger every minute                  
Trigger trigger = newTrigger()
    .withIdentity(jobName, DEFAULT_GROUP)
    .startNow()
    .withSchedule(cronSchedule(cronExpression))
    .build();

scheduler.scheduleJob(jobDetail,trigger);

The job is triggered every minute. It works well for about a week (roughly 10,000 executions) and then, inexplicably, is no longer relaunched. There are no errors in the log, and the log shows that the previous execution completed. The other processes keep firing correctly.

After upgrading the libraries to quartz-2.2.3 and commons-net-3.5 (looking for a possible bug in the FTP library), the job managed to last 3 weeks.

I have a job that monitors the scheduler, and it reports that the trigger state is BLOCKED. The thread of the blocked process is not reused by the application server.

 TriggerState triggerState = scheduler.getTriggerState(triggerKey);

I have not found any documentation on this type of problem with Quartz, so my suspicion is a bug in the FTP library that interferes with the thread started by Quartz, for example through the use of @PersistJobDataAfterExecution.

I wonder whether this is a known issue or a bug, so that I could apply a solution or a workaround (for example, killing the Quartz job; see "how to stop/interrupt quartz scheduler job manually").
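One possible shape for such a workaround, as a minimal sketch using only the JDK: run the body of execute() on a separate thread and wait for it with an upper bound, so that the Quartz worker thread always returns even if the FTP work hangs. The class BoundedTask and the method runWithTimeout are hypothetical names for illustration; they are not part of Quartz or commons-net.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration; not part of Quartz or commons-net.
class BoundedTask {

    /**
     * Runs the task on a separate daemon thread and waits at most timeoutMillis.
     * Returns true if the task finished in time, false if it was abandoned.
     */
    static boolean runWithTimeout(Callable<Void> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-task");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<Void> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // requests interruption of the worker thread
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("Task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean fast = runWithTimeout(() -> null, 1000);
        boolean slow = runWithTimeout(() -> { Thread.sleep(5000); return null; }, 200);
        System.out.println("fast finished: " + fast);
        System.out.println("slow finished: " + slow);
    }
}
```

Note that interruption may not unblock a thread that is stuck in a blocking socket read, so the abandoned worker can linger. This only guarantees that the calling thread returns; socket-level timeouts are still needed to release the stuck thread itself.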

pedrofb
  • Does it always fail at the same execution or time? Which pattern is assigned to the job? – Jordi Castilla Aug 16 '16 at 09:06
  • The time to failure varies between 3 days and 3 weeks (4300 - 30000 executions), but it has occasionally failed within 24h. It usually fails after an empty execution: FTP connect, no files, disconnect. The cron expression is: `0 0/1 * * * ?` – pedrofb Aug 16 '16 at 09:15
  • Maybe this empty execution throws an unhandled exception or error that affects the cron trigger? The pattern and job seem to be created correctly... – Jordi Castilla Aug 17 '16 at 07:02
  • Thanks for the answer @JordiCastilla. In the last change, I added a log line as the final step of the execution method, surrounded by a big `try/catch/finally`, to ensure there is no uncaught exception. I have also reviewed the application server logs looking for lost exceptions, but no luck. I think that if it were an exception, the process would fail but would not keep the thread blocked (the Java `thread` is not used anymore). Maybe the last action of Quartz (`PersistJobDataAfterExecution`) is being blocked for some reason. – pedrofb Aug 17 '16 at 07:44

1 Answer


After months of occasional service drops, and suspecting that FTP connectivity errors were blocking the service, we finally implemented a measure that seems to solve the problem.

Each execution of the process now does the following:

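FTPClient ftp = new FTPClient();

// Added: default socket timeout, which only takes effect if set before connect().
// Note: setConnectTimeout(int) is the separate setting that bounds connect() itself.
ftp.setDefaultTimeout(getTimeoutInMilliseconds());

ftp.connect(host, port);

// Added: larger buffer and a read timeout (SO_TIMEOUT) on the open socket,
// to see if the thread locks disappear...
ftp.setBufferSize(1024 * 1024);
ftp.setSoTimeout(getTimeoutInMilliseconds());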

The odd thing is that the process was not previously blocking in connect(): each execution continued and finished, but the job was simply never relaunched. Since we set the timeouts, however, the problem has not happened again.
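To illustrate why a read timeout matters, here is a minimal sketch with plain java.net sockets (no FTP involved). It assumes that setSoTimeout() on the FTP client ends up as SO_TIMEOUT on the underlying socket. A local server accepts the connection but never sends anything: without SO_TIMEOUT the read would wait indefinitely, while with it the read is aborted by a SocketTimeoutException. SoTimeoutDemo and readTimesOut are hypothetical names.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; uses plain JDK sockets instead of commons-net.
class SoTimeoutDemo {

    /**
     * Connects to a local server that accepts the connection but never sends data,
     * then attempts a read with the given SO_TIMEOUT (must be greater than 0 here,
     * because 0 means "wait forever").
     * Returns true if the read was aborted by a SocketTimeoutException.
     */
    static boolean readTimesOut(int soTimeoutMillis) {
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort());
             Socket accepted = silentServer.accept()) {
            client.setSoTimeout(soTimeoutMillis);
            try {
                client.getInputStream().read(); // the server never writes, so this can only time out
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new IllegalStateException("Local socket setup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

If a server stops responding in the middle of a session, a thread without a read timeout can stay parked in that read. That would be consistent with a thread that is never reused, although the source does not confirm this was the actual cause.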

pedrofb