I am building a Spring Batch application.
Suppose that I have a Job which executes, for example:

  • Split an audio file
  • Perform Speech-To-Text

Suppose also that I have a TaskExecutor that allows the chunk-oriented step(s) to be parallelized.

Are there any benefits to using two steps instead of putting all these operations in a single one?

My concern is that with two steps, files that have already been split have to wait for the whole pool to finish step 1 before step 2 can start, which seems inefficient.

Thanks in advance
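The waiting concern can be made concrete with a small timing model. This is a sketch in plain Java, not Spring Batch code. It assumes the pool has at least one thread per file, so a file never waits for a free thread, and the durations are made-up units. With two steps, every file's speech-to-text starts only when the slowest split has finished; with a single step, each file moves on as soon as its own split is done.

```java
import java.util.Arrays;

// Toy timing model, not Spring Batch code. Assumption: the pool has at least one
// thread per file, so no file waits for a free thread. Durations are arbitrary units.
final class SplitSttTiming {

    // Two steps: step 2 starts only after step 1 has split every file,
    // so each file's speech-to-text begins when the slowest split ends.
    static int[] twoStepFinishTimes(int[] split, int[] stt) {
        int barrier = Arrays.stream(split).max().orElse(0);
        int[] finish = new int[split.length];
        for (int i = 0; i < split.length; i++) {
            finish[i] = barrier + stt[i];
        }
        return finish;
    }

    // Single step: each file goes straight from its own split to speech-to-text.
    static int[] singleStepFinishTimes(int[] split, int[] stt) {
        int[] finish = new int[split.length];
        for (int i = 0; i < split.length; i++) {
            finish[i] = split[i] + stt[i];
        }
        return finish;
    }

    public static void main(String[] args) {
        int[] split = {2, 3, 10}; // hypothetical split durations, one per file
        int[] stt = {20, 25, 30}; // hypothetical speech-to-text durations
        System.out.println("two steps:   " + Arrays.toString(twoStepFinishTimes(split, stt)));
        System.out.println("single step: " + Arrays.toString(singleStepFinishTimes(split, stt)));
    }
}
```

With these example numbers the first two files finish later under two steps (30 and 35 instead of 22 and 28), but the last file finishes at 40 either way, because the slowest split is already on the critical path. The extra wait for any one file is at most the spread between the fastest and the slowest split.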

1 Answer

I would recommend doing this in two steps. The main reason is error handling. I'd assume that once you split the file, you won't want to split it again if there is an error during the speech-to-text processing. If that is the case, separating the processing into two steps means the split won't need to be rerun. It also means the chunk-oriented step keeps track of its progress, so chunks that have already been processed successfully won't need to be re-executed on restart. Yes, you could code this behavior yourself, but Spring Batch provides it out of the box, so why not take advantage of it?
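The restart behavior described above can be illustrated with a toy model in plain Java. The `repository` map is a hypothetical stand-in for the job metadata that Spring Batch persists in its JobRepository; none of the names below are Spring Batch APIs. The first run fails part-way through step 2, and the restart skips the completed split step and resumes after the last committed chunk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of step-level and chunk-level restart; not Spring Batch APIs.
final class RestartableAudioJob {
    final Map<String, Object> repository = new HashMap<>(); // hypothetical persisted metadata
    final List<String> transcribed = new ArrayList<>();     // stands in for the output file
    int splitInvocations = 0;

    void run(int segmentCount, int chunkSize, int failAtSegment) {
        // Step 1 (split): skipped entirely if an earlier run completed it.
        if (!Boolean.TRUE.equals(repository.get("split.completed"))) {
            splitInvocations++;
            repository.put("split.completed", true);
        }
        // Step 2 (speech-to-text): resume right after the last committed chunk.
        int start = (Integer) repository.getOrDefault("stt.committed", 0);
        for (int chunkStart = start; chunkStart < segmentCount; chunkStart += chunkSize) {
            int chunkEnd = Math.min(chunkStart + chunkSize, segmentCount);
            List<String> buffer = new ArrayList<>();
            for (int i = chunkStart; i < chunkEnd; i++) {
                if (i == failAtSegment) {
                    // The partially filled buffer is discarded, like a rolled-back chunk.
                    throw new IllegalStateException("speech-to-text failed on segment " + i);
                }
                buffer.add("text-" + i);
            }
            transcribed.addAll(buffer);                // write the whole chunk...
            repository.put("stt.committed", chunkEnd); // ...and record progress with it
        }
    }

    public static void main(String[] args) {
        RestartableAudioJob job = new RestartableAudioJob();
        try {
            job.run(8, 2, 5); // first run fails inside the third chunk
        } catch (IllegalStateException e) {
            System.out.println("run 1: " + e.getMessage());
        }
        job.run(8, 2, -1);    // restart with no failure
        System.out.println("split ran " + job.splitInvocations + " time(s)");
        System.out.println(job.transcribed);
    }
}
```

After the restart, the split has run once and each of the eight segments appears exactly once in the output: the failed chunk was discarded rather than half-written, and it was redone from its first segment.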

Michael Minella
  • But I assume this may not address the concern about "waiting" for all audio splits (step 1) to be completed, hence causing inefficiency? Should this then be handled by using smaller chunks? – user2488286 Jun 01 '22 at 14:22
  • @user2488286 While the waiting is an issue, split operations typically do not have a large performance impact. The error-handling angle I point out in my answer, however, is a bigger risk. Also, if the split is done in advance, you can partition the output and run the chunks in parallel, significantly improving overall performance. – Michael Minella Jun 01 '22 at 15:32
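The partitioning idea in the last comment can be sketched with a plain `ExecutorService`: the segments produced by the split are divided into contiguous index ranges, and each range is transcribed on its own thread. In Spring Batch this role is played by a partitioned step; here `partition`, `transcribe`, and the segment names are hypothetical stand-ins, and `transcribe` only upper-cases its input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of partitioned processing with plain java.util.concurrent; not Spring Batch APIs.
final class PartitionedTranscription {

    // Divide indices [0, size) into at most gridSize contiguous ranges {start, end}.
    static List<int[]> partition(int size, int gridSize) {
        List<int[]> ranges = new ArrayList<>();
        int base = size / gridSize, extra = size % gridSize, start = 0;
        for (int p = 0; p < gridSize; p++) {
            int end = start + base + (p < extra ? 1 : 0); // spread the remainder evenly
            if (end > start) ranges.add(new int[] {start, end});
            start = end;
        }
        return ranges;
    }

    // Hypothetical stand-in for the real speech-to-text call.
    static String transcribe(String segment) {
        return segment.toUpperCase(Locale.ROOT);
    }

    static List<String> run(List<String> segments, int gridSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(gridSize);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int[] range : partition(segments.size(), gridSize)) {
                // Each partition is processed independently on a pool thread.
                futures.add(pool.submit(() -> {
                    List<String> out = new ArrayList<>();
                    for (int i = range[0]; i < range[1]; i++) {
                        out.add(transcribe(segments.get(i)));
                    }
                    return out;
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.addAll(f.get()); // wait for each partition, in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> segments = List.of("seg-a", "seg-b", "seg-c", "seg-d", "seg-e");
        System.out.println(run(segments, 2));
    }
}
```

Collecting the futures in submission order keeps the results in segment order even though the partitions run concurrently.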