
I am studying the Java EE Batch API (JSR-352) in order to test the feasibility of replacing our current ETL tool with our own solution built on this technology.

My goal is to build a job in which I:

  • get some (dummy) data from a data source in step 1,
  • get some other data from another data source in step 2, and
  • merge them in step 3.

I would like to process each item and, rather than write it to a file, send it on to the next step, and also store the information for further use. I could do that using batchlets and jobContext.setTransientUserData().
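To illustrate, here is a minimal sketch of that batchlet idea; the class name and `fetchFromSourceA()` are placeholders I made up for whatever actually loads the data:

```java
import java.util.Arrays;
import java.util.List;
import javax.batch.api.AbstractBatchlet;
import javax.batch.runtime.context.JobContext;
import javax.inject.Inject;
import javax.inject.Named;

@Named
public class Step1Batchlet extends AbstractBatchlet {

    @Inject
    JobContext jobContext;

    @Override
    public String process() throws Exception {
        // load the (dummy) data of step 1
        List<String> data = fetchFromSourceA();
        // stash it in the job context so the following steps can pick it up
        jobContext.setTransientUserData(data);
        return "COMPLETED";
    }

    // placeholder for the real data-source access
    private List<String> fetchFromSourceA() {
        return Arrays.asList("a", "b", "c");
    }
}
```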

I think I am not getting the concepts right: as far as I understand, JSR-352 is meant for exactly this kind of ETL task, but it has two types of steps: chunks and batchlets. Chunks are "3-phase" steps, in which one reads, processes, and writes the data. Batchlets are tasks that are performed not on each item of the data but once per step (such as calculating totals, sending emails, and so on).

My problem is that my solution does not seem correct if I consider the definition of a batchlet.

How could one implement this kind of job using the Java EE Batch API?

JSBach
  • 4,679
  • 8
  • 51
  • 98

1 Answer


I think you had better use a chunk rather than a batchlet to implement an ETL job. Typical chunk processing with a datasource looks something like the following (a reader sketch follows the list):

  • ItemReader#open(): open a cursor (create the Connection, Statement, and ResultSet) and save them as instance variables of the ItemReader.
  • ItemReader#readItem(): create and return an object that contains the data of one row, using the ResultSet.
  • ItemReader#close(): close the JDBC resources.
  • ItemProcessor#processItem(): do the calculation, then create and return an object which contains the result.
  • ItemWriter#writeItems(): save the calculated data to the database: open a Connection and Statement, invoke executeUpdate(), and close them.
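As an illustration, here is a minimal sketch of such a reader; the `jdbc/primaryDS` JNDI name, the `employee` table, and the query are assumptions made up for the example:

```java
import java.io.Serializable;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.annotation.Resource;
import javax.batch.api.chunk.AbstractItemReader;
import javax.inject.Named;
import javax.sql.DataSource;

@Named
public class EmployeeReader extends AbstractItemReader {

    @Resource(lookup = "jdbc/primaryDS") // made-up JNDI name
    DataSource dataSource;

    private Connection connection;
    private PreparedStatement statement;
    private ResultSet resultSet;

    @Override
    public void open(Serializable checkpoint) throws Exception {
        // open the cursor once and keep it as instance state
        connection = dataSource.getConnection();
        statement = connection.prepareStatement("SELECT id, name FROM employee");
        resultSet = statement.executeQuery();
    }

    @Override
    public Object readItem() throws Exception {
        // one row per call; returning null tells the runtime we are done
        if (!resultSet.next()) {
            return null;
        }
        // a real reader would map the row to a domain object
        return new Object[] { resultSet.getLong("id"), resultSet.getString("name") };
    }

    @Override
    public void close() throws Exception {
        // release the JDBC resources when the step ends
        resultSet.close();
        statement.close();
        connection.close();
    }
}
```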

As for your situation, I think you have to choose whichever data set can be considered the primary one and open a cursor for it in ItemReader#open(), then fetch the matching data from the other source in ItemProcessor#processItem() for each item, as sketched below.
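For example, a processor along these lines could look up the matching row from the second source for each primary item; the `jdbc/secondaryDS` name and the `payroll` query are invented for the example:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.annotation.Resource;
import javax.batch.api.chunk.ItemProcessor;
import javax.inject.Named;
import javax.sql.DataSource;

@Named
public class MergingProcessor implements ItemProcessor {

    @Resource(lookup = "jdbc/secondaryDS") // made-up JNDI name
    DataSource secondarySource;

    @Override
    public Object processItem(Object item) throws Exception {
        Object[] row = (Object[]) item; // { id, name } as produced by the reader
        long id = (Long) row[0];

        // for brevity a connection is opened per item; a real implementation would reuse one
        try (Connection con = secondarySource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT salary FROM payroll WHERE employee_id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null; // no match: returning null filters the item out
                }
                // the merged object is what ItemWriter#writeItems() receives
                return new Object[] { id, row[1], rs.getBigDecimal("salary") };
            }
        }
    }
}
```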

I also recommend reading some useful examples of chunk processing, such as my blog entries about JBatch and chunk processing.

Kohei Nozaki
  • 1,154
  • 1
  • 13
  • 36
  • Thanks for the answer. In your opinion, should I have a 1-step job in which I perform all the ETL logic? – JSBach May 26 '15 at 09:01
  • I understand, but isn't the idea of the Batch API that I can break the process into simple steps? (This is an honest question; I feel I am lacking some basic understanding of this JSR.) – JSBach May 26 '15 at 09:41
  • If your entire data set is small enough to be loaded into memory, and you have no interest in the advantages of chunk processing such as transaction management, skipping invalid data, retrying, saving metrics, or restarting from the remaining data, I think you are fine implementing the job as 3 simple (batchlet) steps using `getTransientUserData()` and `setTransientUserData()` as you mentioned. I hope this answers your question. – Kohei Nozaki May 26 '15 at 09:59
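For reference, the 3-batchlet-step job described in the last comment could be wired up with a job XML roughly like this; the job, step, and artifact names are invented:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<job id="etlJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0">
    <!-- each batchlet stores its result via JobContext#setTransientUserData() -->
    <step id="loadSourceA" next="loadSourceB">
        <batchlet ref="step1Batchlet"/>
    </step>
    <step id="loadSourceB" next="merge">
        <batchlet ref="step2Batchlet"/>
    </step>
    <step id="merge">
        <batchlet ref="mergeBatchlet"/>
    </step>
</job>
```

Note that the job context holds a single transient object, so in practice the steps would need to share something like a Map to keep both intermediate results available for the merge step.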