My project has 20+ batch jobs that are built with Spring Batch and have been in Production for a couple of years. We are currently migrating them to individual Spring Boot applications built with Spring Batch and Spring Cloud Task, which will then be registered as Tasks in Spring Cloud Data Flow and deployed to PCF.
Given that these jobs (which used only Spring Batch) were already in Production, the Batch repository tables contain years of data from their past executions. When we deploy the newly migrated jobs (which introduce the Task tables), the data in the Batch and Task tables will not match, since the Task tables will be newly created and therefore empty. Although this doesn't prevent us from running new job executions, it does prevent us from using the "Jobs" tab in Spring Cloud Data Flow: in order to load the page, it queries the TASK_TASK_BATCH table, trying to match every job_execution_id with a task_execution_id. When no such record exists for a given job_execution_id, this throws the infamous NullPointerException mentioned in other posts (Dataflow Tasks are not working with Spring Batch).
So my question is: what is the proper way to address this discrepancy for any team that has already been using Spring Batch and is migrating the same jobs to also use Spring Cloud Task? Does Spring provide any process for this? Ideally we want to keep the historical batch job execution data in the Batch repository tables; we don't want to delete it. Would we then have to make up 'matching' dummy data in the Task tables to get rid of the discrepancy?
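In case it clarifies what we mean by "dummy data": below is the kind of one-off backfill we imagine, which inserts a placeholder TASK_EXECUTION row per orphaned job execution and links it in TASK_TASK_BATCH. This is only a sketch, not anything official from Spring; it assumes the default table names, the TASK_SEQ sequence for new execution ids, and Oracle-style SQL (NEXTVAL FROM DUAL, || concatenation), all of which would need adjusting for the actual database. The "backfilled-" task name prefix is our own invention to mark synthetic rows:

    import java.util.List;
    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;

    public class TaskTableBackfill {

        private final JdbcTemplate jdbc;

        public TaskTableBackfill(DataSource dataSource) {
            this.jdbc = new JdbcTemplate(dataSource);
        }

        public void backfill() {
            // Historical job executions that have no task link yet.
            List<Long> orphanedIds = jdbc.queryForList(
                "SELECT e.JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION e " +
                "WHERE NOT EXISTS (SELECT 1 FROM TASK_TASK_BATCH b " +
                "  WHERE b.JOB_EXECUTION_ID = e.JOB_EXECUTION_ID)",
                Long.class);

            for (Long jobExecutionId : orphanedIds) {
                // Next task execution id (Oracle-style sequence call;
                // on MySQL, TASK_SEQ is a table and this differs).
                Long taskExecutionId = jdbc.queryForObject(
                    "SELECT TASK_SEQ.NEXTVAL FROM DUAL", Long.class);

                // Placeholder task execution mirroring the batch run's
                // timestamps and job name.
                jdbc.update(
                    "INSERT INTO TASK_EXECUTION " +
                    "(TASK_EXECUTION_ID, START_TIME, END_TIME, TASK_NAME, " +
                    " EXIT_CODE, LAST_UPDATED) " +
                    "SELECT ?, e.START_TIME, e.END_TIME, " +
                    "  'backfilled-' || i.JOB_NAME, 0, CURRENT_TIMESTAMP " +
                    "FROM BATCH_JOB_EXECUTION e " +
                    "JOIN BATCH_JOB_INSTANCE i " +
                    "  ON i.JOB_INSTANCE_ID = e.JOB_INSTANCE_ID " +
                    "WHERE e.JOB_EXECUTION_ID = ?",
                    taskExecutionId, jobExecutionId);

                // Link it so the SCDF "Jobs" tab can resolve the execution.
                jdbc.update(
                    "INSERT INTO TASK_TASK_BATCH " +
                    "(TASK_EXECUTION_ID, JOB_EXECUTION_ID) VALUES (?, ?)",
                    taskExecutionId, jobExecutionId);
            }
        }
    }

Is something along these lines reasonable, or is there a supported approach we're missing?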
Thank you.