I have read the standard (and the javadoc) but still have some questions. My use case is simple: a batchlet fetches data from an external source and acknowledges it (meaning that the data is deleted from the external source after acknowledgement). Before acknowledging the data, the batchlet produces the relevant output (an in-memory object) that is to be passed to the next, chunk-oriented step.

Questions:

1) What is the best practice for passing data between a batchlet and a chunk step? It seems that I can do that by calling jobContext#setTransientUserData in the batchlet and then in my chunk step I can access that data by calling jobContext#getTransientUserData.

I understand that both jobContext and stepContext are implemented in a thread-local manner. What worries me here is the "Transient" part. What happens if the batchlet succeeds but my chunk step fails? Will the transient user data still be available, or will it be gone if the job/step is restarted? For my use case it is important that the batchlet runs just once. So even if the job or the chunk step is restarted, the output data from the successfully run batchlet must be preserved; otherwise the batchlet would have to run again. (I have already acknowledged the data and it is gone, so running the batchlet once more would not help me.)

2) Follow-up question: In stepContext there are a couple of methods: getPersistentUserData and setPersistentUserData. What is the intended usage of these methods? What does the "Persistent" part refer to? Are these methods relevant only for partitioning?

Thank you! / Daniel

xdaiv

1 Answer

Transient user data is just that: transient. It will not be available after a job restart. A job restart can happen in a different process or on a different machine, so users cannot count on transient job data from a previous run being available at restart.

Step persistent user data is application data that the batch job developer deems necessary to save/persist for the purpose of restarting, monitoring, or auditing. It is available at restart, but it is typically scoped to the current step (not shared across steps).

From reading your brief description, I get the feeling that your two steps are too tightly coupled and that you could almost consider them a single unit of work. You want them to either both succeed or both fail in order to maintain the integrity of your application state. I think that could be the root of the problem.

cheng
  • +1 Thank you for answering my questions. Yet another follow-up question (more for the standard-makers, perhaps): What is the reason for not providing get/setPersistentUserData in the JobContext? – xdaiv Mar 11 '18 at 08:32
  • I don't recall a strong reason against it, and the idea was raised. I think it just wasn't a priority that made it into the final release. One other technique: you can, within step 2, get the persistent user data from step 1 by first getting the **StepExecution** from `JobOperator.getStepExecutions(); ... ` and then `StepExecution.getPersistentUserData()` (for the top-level thread at least). – Scott Kurz Mar 11 '18 at 13:13
  • Actually, I had seen a solution like that (JobOperator.getStepExecutions(); ...) but it did not look like a best practice to me. Could you please elaborate on your comment "(for the top-level thread at least)"? Will this solution be thread-safe in all situations (not only for the top-level thread)? – xdaiv Mar 11 '18 at 14:41
  • I meant that the user data obtained from the step context is thread-local, so each partition gets its own, as does the top-level thread, but only the "top-level" StepExecution and user data are accessible via the `JobOperator` API. (This was also mentioned as a spec enhancement idea not yet prioritized). – Scott Kurz Mar 12 '18 at 18:02
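
The `JobOperator` technique from the comments can be sketched roughly as follows. This is only an illustration, not a recommended pattern from the specification. The class name `PreviousStepData`, the method `find`, and the step name `"fetch"` are hypothetical. The nested interfaces are local stand-ins that mirror a subset of the real `javax.batch` signatures so that the sketch compiles without a batch runtime; in a real job they would be replaced by the real types. The sketch walks all executions of the job instance on the assumption that, after a restart, the current execution may not list a step that already completed in an earlier execution.

```java
import java.io.Serializable;
import java.util.List;
import java.util.Optional;

public class PreviousStepData {

    // Stand-ins mirroring a subset of javax.batch.operations.JobOperator and
    // javax.batch.runtime.{JobInstance, JobExecution, StepExecution}.
    // Assumption: in a real job, delete these and import the real types instead.
    public interface JobInstance {}

    public interface JobExecution {
        long getExecutionId();
    }

    public interface StepExecution {
        String getStepName();
        Serializable getPersistentUserData();
    }

    public interface JobOperator {
        JobInstance getJobInstance(long executionId);
        List<JobExecution> getJobExecutions(JobInstance instance);
        List<StepExecution> getStepExecutions(long jobExecutionId);
    }

    /**
     * Looks through every execution of the current job instance and returns the
     * first non-null persistent user data stored by the step with the given name.
     * Only the top-level StepExecution is visible here, not per-partition data.
     */
    public static Optional<Serializable> find(JobOperator operator,
                                              long currentExecutionId,
                                              String stepName) {
        JobInstance instance = operator.getJobInstance(currentExecutionId);
        for (JobExecution execution : operator.getJobExecutions(instance)) {
            for (StepExecution step : operator.getStepExecutions(execution.getExecutionId())) {
                Serializable data = step.getPersistentUserData();
                if (stepName.equals(step.getStepName()) && data != null) {
                    return Optional.of(data);
                }
            }
        }
        // The step never ran, or it never stored persistent user data.
        return Optional.empty();
    }
}
```

With the real API, the batchlet would store its output with `stepContext.setPersistentUserData(...)` before acknowledging the external data, and the chunk step would call something like `find(BatchRuntime.getJobOperator(), jobContext.getExecutionId(), "fetch")`. The stored object has to be `Serializable`, and a large payload may be better kept in an external store with only a key saved as user data.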