Background

I'm using Spring Batch 2.1.8 and run jobs with CommandLineJobRunner, for example:

java org.springframework.batch.core.launch.support.CommandLineJobRunner classpath:launchContext.xml theJobId

Problem

Under some conditions, such as a server crash, a running job can be interrupted. The interrupted job is left with a STARTED status in the Spring Batch Meta-Data tables and can't be run again:

org.springframework.batch.core.repository.JobExecutionAlreadyRunningException: A job execution for this job is already running

I can think of two solutions:

Solution 1

Add a new job parameter and change it every time, so that Spring Batch treats each run as a "new" job. For example:

java org.springframework.batch.core.launch.support.CommandLineJobRunner classpath:launchContext.xml theJobId times=0

When the job needs to be rerun, clear all corresponding output data, increment times by one, and then rerun the job.
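The counter bookkeeping could be scripted instead of edited by hand. The following is only a minimal sketch: the class RerunArgs and the method withNextCounter are hypothetical names, and the code only rewrites the argument list that would be passed to CommandLineJobRunner. It does not call Spring Batch and does not clear any output data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Spring Batch): derives the argument list
// for the next run by bumping the "times" job parameter.
final class RerunArgs {

    // Parameter name taken from the example command line above.
    static final String COUNTER_PREFIX = "times=";

    static List<String> withNextCounter(List<String> previousArgs) {
        List<String> next = new ArrayList<>(previousArgs.size() + 1);
        boolean found = false;
        for (String arg : previousArgs) {
            if (arg.startsWith(COUNTER_PREFIX)) {
                // Increment the existing counter so the job parameters differ from the last run.
                long current = Long.parseLong(arg.substring(COUNTER_PREFIX.length()));
                next.add(COUNTER_PREFIX + (current + 1));
                found = true;
            } else {
                next.add(arg);
            }
        }
        if (!found) {
            // No counter yet: start at 0, as in the example above.
            next.add(COUNTER_PREFIX + "0");
        }
        return next;
    }
}
```

The previous value of times still has to be stored somewhere (a file, a table, the scheduler), which is one of the costs of this approach.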

Solution 2

Change the Spring Batch Meta-Data tables manually.

Update the status so that the job becomes restartable. For example:

UPDATE BATCH_JOB_EXECUTION SET END_TIME = SYSTIMESTAMP, STATUS = 'FAILED', EXIT_CODE = 'FAILOVER' WHERE JOB_EXECUTION_ID =
    (SELECT MAX(JOB_EXECUTION_ID) FROM BATCH_JOB_EXECUTION WHERE JOB_INSTANCE_ID =
        (SELECT MAX(JOB_INSTANCE_ID) FROM BATCH_JOB_INSTANCE WHERE JOB_NAME = 'XXX'));

I've tried it and it seems to work well.

Question

Is Solution 2 a bad idea? Are there any traps?

Thanks in advance. Any other solutions are appreciated.

songyuanyao

2 Answers

Solution 2 is the accepted approach right now. The API does not provide a way to fix this scenario. There have been requests in the past for the framework to clean up automatically, but 99% of the time, a human decision is needed to determine if cleanup is truly required.

My only note for option 2 would be to check the BATCH_STEP_EXECUTION table as well to see what state the last executed step was left in.
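As a rough illustration of that note, the manual fix could close the step executions together with the job execution in one transaction. This is only a sketch using plain JDBC: the class StaleExecutionCleanup and its method names are hypothetical, the table and column names follow the standard Spring Batch meta-data schema, and treating STARTING, STARTED and STOPPING as "left over from a crash" is an assumption that a human should confirm before running anything like this. The exit code 'FAILED' is also just a choice; the question used 'FAILOVER'.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

// Hypothetical helper: marks one stale job execution and its unfinished steps as FAILED.
final class StaleExecutionCleanup {

    // Assumption: these statuses mean the execution never reached an end state.
    static final String FAIL_STEPS_SQL =
            "UPDATE BATCH_STEP_EXECUTION SET END_TIME = ?, STATUS = 'FAILED', EXIT_CODE = 'FAILED' "
          + "WHERE JOB_EXECUTION_ID = ? AND STATUS IN ('STARTING', 'STARTED', 'STOPPING')";

    static final String FAIL_JOB_SQL =
            "UPDATE BATCH_JOB_EXECUTION SET END_TIME = ?, STATUS = 'FAILED', EXIT_CODE = 'FAILED' "
          + "WHERE JOB_EXECUTION_ID = ? AND STATUS IN ('STARTING', 'STARTED', 'STOPPING')";

    // Returns {updated step rows, updated job rows}; both updates commit or roll back together.
    static int[] markFailed(Connection connection, long jobExecutionId) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try {
            Timestamp now = new Timestamp(System.currentTimeMillis());
            int steps = update(connection, FAIL_STEPS_SQL, now, jobExecutionId);
            int jobs = update(connection, FAIL_JOB_SQL, now, jobExecutionId);
            connection.commit();
            return new int[] {steps, jobs};
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }

    private static int update(Connection connection, String sql, Timestamp endTime,
                              long jobExecutionId) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setTimestamp(1, endTime);
            statement.setLong(2, jobExecutionId);
            return statement.executeUpdate();
        }
    }
}
```

Passing the job execution id explicitly, rather than selecting the latest one by job name, keeps the decision about which execution to clean up with the person running the fix.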

Michael Minella

I created a dedicated Spring bean for this, which is triggered on a container refresh (which also happens when the application is started or restarted).

It searches for 'running' jobs, marks them 'FAILED' and restarts them.

import java.util.Date;
import java.util.List;
import java.util.Set;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;
import org.springframework.batch.core.launch.JobOperator;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextRefreshedEvent;
import org.springframework.stereotype.Component;

@Component
public class BatchJobRestarter implements ApplicationListener<ContextRefreshedEvent> {

    private static final Logger LOGGER  = LoggerFactory.getLogger(BatchJobRestarter.class);

    @Autowired
    private JobExplorer         jobExplorer;

    @Autowired
    JobRepository               jobRepository;

    @Autowired
    JobOperator                 jobOperator;

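    @Override
    public void onApplicationEvent(ContextRefreshedEvent contextRefreshedEvent) {
        LOGGER.info("Container restart: restarting 'running' batch jobs");
        // Assumes no job can legitimately be running at startup (single application
        // instance), so anything still reported as running is left over from a crash.
        List<String> jobs = jobExplorer.getJobNames();
        for (String job : jobs) {
            Set<JobExecution> runningJobs = jobExplorer.findRunningJobExecutions(job);

            for (JobExecution runningJob : runningJobs) {
                try {
                    LOGGER.info("Restarting job {} with parameters {}", runningJob.getJobInstance().getJobName(), runningJob.getJobParameters().toString());
                    // Mark the stale execution as FAILED so that it becomes restartable.
                    runningJob.setStatus(BatchStatus.FAILED);
                    runningJob.setEndTime(new Date());
                    jobRepository.update(runningJob);
                    // Restart the same job instance from the failed execution.
                    jobOperator.restart(runningJob.getId());
                } catch (Exception e) {
                    // Log and continue so that one failed restart does not block the others.
                    LOGGER.error(e.getMessage(), e);
                }
            }
        }
    }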
}

Steef
