I have two business logic steps:

  1. Download XML from an external resource, parse it, and transform it into objects
  2. Dispatch the output (a list of objects) to an external queue

    @Bean
    public Job job() throws Exception {
        return this.jobs.get("job").start(getXmlViaHttpStep()).next(pushMessageToQueue()).build();
    }
    

So my first step is a Tasklet that downloads the file (via HTTP) and converts it into objects.

My second step is another Tasklet that is supposed to dispatch the output of the previous step.

Now, how do I pass the output list from step 1 into step 2 (as its input)?

I could save it to a temp file, but isn't there a best-practice approach for this?

demongolem
rayman
  • It is just a single step. 1. is a reader 2 is a writer. A single chunk based step. Do you need a single message or a message per object? – M. Deinum Dec 31 '14 at 12:45
  • I need to retrieve the list of output objects from step 1 and use them as the input to step 2, so I can iterate over the items and send each one to the queue. – rayman Dec 31 '14 at 13:09
  • That is just a single step with a reader and a writer. Chunk based. You can use one of the XML readers for that, and there are several messaging writers (JMS, for instance) available to write messages to a queue. – M. Deinum Dec 31 '14 at 13:18
  • But I want to monitor each operation as a step. Won't it be easier for me to split those into a couple of steps from a monitoring/rollback/etc. point of view? – rayman Dec 31 '14 at 13:34
  • Is there a reason you aren't using Spring Integration for this process? – Michael Minella Dec 31 '14 at 15:31
  • And skip Spring Batch? I am not sure. I have a single batch job, so maybe Spring Batch is overkill? – rayman Dec 31 '14 at 16:13
  • You can perfectly monitor a single step that processes things in chunks. Advantage is that it isn't an all or nothing solution and is probably more efficient. Parsing a large xml file into thousands of objects to pass them on later is quite a memory hog and can (and probably will) lead to performance issues. As @MichaelMinella suggested you could also use Spring Integration however that would lead to processing the whole xml at once (if I'm not mistaken). – M. Deinum Dec 31 '14 at 16:19
  • Even if I process an XML file, I still need to convert it into objects (validations, transformations, etc.) and then send them one by one, so I can't see how I am going to avoid that anyway. – rayman Dec 31 '14 at 17:31
  • Sounds like a combination of Spring Integration and Spring Batch would be the best approach. SI to download the file and kick off the job, SB to parse the XML file and send the messages via the JmsItemWriter/AmqItemWriter/etc. – Michael Minella Dec 31 '14 at 18:57
  • @Michael, thanks for your response. I noticed that this question's title doesn't fit what I am asking, so I opened a new question thread. Please look at it: http://stackoverflow.com/questions/27729750/best-approach-using-spring-batch-to-process-big-file – rayman Jan 01 '15 at 09:54

1 Answer

I can see at least two options that are both viable.

Option 1: Set up the job as one step

You can set up your job to contain a single step in which the reader reads the input from your URL and the writer posts to your queue.
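To make the single-step idea concrete, here is a minimal plain-Java sketch of chunk-oriented processing. It is not Spring Batch code: `SingleStepSketch`, `runStep`, and the in-memory list standing in for the queue are hypothetical names used only for illustration. In a real job, the framework's reader and writer (for example an XML item reader and a JMS item writer) would take the place of the `Iterator` and `Consumer` used here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class SingleStepSketch {

    /**
     * Hypothetical stand-in for a chunk-oriented step: pulls items from the
     * reader one at a time and hands them to the writer in chunks.
     * Returns the total number of items written.
     */
    public static <T> int runStep(Iterator<T> reader, Consumer<List<T>> writer, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        int written = 0;
        List<T> chunk = new ArrayList<>(chunkSize);
        while (reader.hasNext()) {
            chunk.add(reader.next());
            if (chunk.size() == chunkSize) {
                // Hand over a copy so the writer never sees later mutations.
                writer.accept(new ArrayList<>(chunk));
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            // Flush the final, possibly smaller, chunk.
            writer.accept(new ArrayList<>(chunk));
            written += chunk.size();
        }
        return written;
    }

    public static void main(String[] args) {
        // The list below plays the role of the external queue (assumption).
        List<List<String>> queue = new ArrayList<>();
        int written = runStep(List.of("a", "b", "c", "d", "e").iterator(), queue::add, 2);
        System.out.println(written + " items sent in " + queue.size() + " chunks");
    }
}
```

Because items are handed over chunk by chunk, the full object list never has to be held in memory or passed from one step to another.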

Option 2: Set up the job as two steps with intermediate storage

However, you may want to divide the job into two steps so that you can re-run a step if it fails, simplify debugging, and so on. In that case, the following approach may work for you:

  • Step 1: Create a step that uses a FlatFileItemReader or similar to download the file. The step can then use a FlatFileItemWriter to write the contents to disk.
  • Step 2: Open the file produced by the ItemWriter from the previous step. One alternative is to use the org.springframework.batch.item.xml.StaxEventItemReader together with a Jaxb2Marshaller to handle the processing (as described in this blog). Configure the output of the step to post messages to a queue by using e.g. org.springframework.batch.item.jms.JmsItemWriter. The writer is (as always) chunked, so multiple messages can be posted in each write.
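The two-step hand-off through a file can be sketched with the JDK alone. This is an illustration of the idea, not the Spring Batch API: `TwoStepSketch`, its method names, the `<item>` element, the temp-file name, and the in-memory list used as the queue are all assumptions. Step 1 only stores the raw download, and step 2 streams the file with StAX, which is roughly the role StaxEventItemReader plays in a real job.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class TwoStepSketch {

    /** Step 1: store the raw download on disk without parsing it. */
    public static void downloadToFile(InputStream source, Path target) {
        try {
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Step 2: stream matching elements from the file and dispatch their text in chunks. */
    public static int dispatchFromFile(Path file, String elementName, int chunkSize,
                                       Consumer<List<String>> queue) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // The XML comes from an external source, so DTDs and external entities are disabled.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        int sent = 0;
        List<String> chunk = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file)) {
            XMLStreamReader xml = factory.createXMLStreamReader(in);
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(xml.getLocalName())) {
                    chunk.add(xml.getElementText());
                    if (chunk.size() == chunkSize) {
                        queue.accept(new ArrayList<>(chunk));
                        sent += chunk.size();
                        chunk.clear();
                    }
                }
            }
            xml.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML in " + file, e);
        }
        if (!chunk.isEmpty()) {
            // Flush the final, possibly smaller, chunk.
            queue.accept(new ArrayList<>(chunk));
            sent += chunk.size();
        }
        return sent;
    }

    /** Runs both steps in order; the file on disk is the only hand-off between them. */
    public static List<List<String>> runJob(InputStream source, int chunkSize) {
        Path file = Path.of(System.getProperty("java.io.tmpdir"), "two-step-sketch.xml");
        List<List<String>> queue = new ArrayList<>();
        try {
            downloadToFile(source, file);
            dispatchFromFile(file, "item", chunkSize, queue::add);
        } finally {
            file.toFile().delete();
        }
        return queue;
    }

    public static void main(String[] args) {
        String xml = "<items><item>a</item><item>b</item><item>c</item></items>";
        InputStream source = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(runJob(source, 2));
    }
}
```

Since step 2 depends only on the file, it can be re-run after a failure without downloading again, which is the main reason for splitting the job in the first place.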

Personally, I would probably set up the whole thing as Option 2. I find that simple steps without too many transformations are easier to follow and easier to test, but that is a matter of taste.

wassgren