Parsing a fixed-length flat XML file in Spring Batch

Question

My XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<File fileId="123" xmlns="abc:XYZ" > ABC123411/10/20
XBC128911/10/20
BCD456711/23/22
</File>

This is a fixed-length flat file wrapped in XML. I need to parse each record, for example

ABC123411/10/20

into a Content object:

public class Content {
   private String id;
   private String name;
   private String date;

   // getters and setters omitted
}

The record above should map to:

name: ABC
id: 1234
date: 11/10/20

This is what I'm trying:

<bean id="reader" class="org.springframework.batch.item.xml.StaxEventItemReader" scope="step">
    <property name="resource" value="file:#{jobExecutionContext['source.download.filePath']}" />
    <property name="unmarshaller" ref="jaxb2Marshaller" />
    <property name="fragmentRootElementNames" value="File" />
</bean>

<bean id="jaxb2Marshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
    <property name="packagesToScan">
        <list>
            <value>com.test.model</value>
        </list>
    </property>
</bean>

And my POJO:

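package com.test.model;

// javax.xml.bind is JAXB 2 (Spring Batch 4); use jakarta.xml.bind on Spring Batch 5
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlValue;

@XmlAccessorType(XmlAccessType.FIELD)
// namespace must match the xmlns declared on <File> (abc:XYZ in the sample)
@XmlRootElement(name = "File", namespace = "//namespace")
public class TestRecord {

   // the whole text content of <File>, i.e. all records in one String
   @XmlValue
   private String data;

   public String getData() {
     return data;
   }
}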

This code parses the XML file and stores the entire text content of the File element as a single String in TestRecord.data:

ABC123411/10/20
XBC128911/10/20
BCD456711/23/22

With this approach, I still need to write a separate mapper that splits this string (TestRecord.data) into lines, tokenizes each line, and assigns the fields to a Content object.
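For reference, the extra mapping step described above is small. Below is a minimal JDK-only sketch, assuming the layout from the example (3-character name, 4-character id, 8-character date, so 15 characters per record) and assuming records contain no spaces. The class name `ContentParser` and the nested `Content` record are hypothetical stand-ins; in a real job this logic could live in an `ItemProcessor` that receives the `TestRecord`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns the blob stored in TestRecord.data into Content objects.
// Assumes fixed widths: name = 3 chars, id = 4 chars, date = 8 chars (15 per record).
public class ContentParser {

    // Minimal stand-in for the question's Content class
    public record Content(String name, String id, String date) {}

    static final int NAME_END = 3;
    static final int ID_END = 7;
    static final int RECORD_LENGTH = 15;

    public static List<Content> parse(String data) {
        List<Content> result = new ArrayList<>();
        // Split on any whitespace, so newline-separated and
        // space-separated records are both handled
        for (String line : data.trim().split("\\s+")) {
            if (line.length() != RECORD_LENGTH) {
                continue; // skip anything that is not a complete record
            }
            result.add(new Content(
                    line.substring(0, NAME_END),
                    line.substring(NAME_END, ID_END),
                    line.substring(ID_END, RECORD_LENGTH)));
        }
        return result;
    }

    public static void main(String[] args) {
        String data = " ABC123411/10/20\nXBC128911/10/20\nBCD456711/23/22\n";
        for (Content c : parse(data)) {
            System.out.println(c);
        }
    }
}
```

Splitting on whitespace rather than only on newlines is a deliberate choice here, since the records may arrive on one line separated by spaces.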

Is there a way to do this in XML configuration with the available readers, or is there a better option? Thanks!

That doesn't look correct. Why do you specify the `value=` attribute twice? – Jim Garrison, Nov 01 '22 at 00:32

score 1 · Answer 1 · answered Nov 02 '22 at 05:09

I would keep it simple and create a tasklet that transforms this:

<?xml version="1.0" encoding="UTF-8"?>
<File fileId="123" xmlns="abc:XYZ" > ABC123411/10/20
XBC128911/10/20
BCD456711/23/22
</File>

into this:

ABC123411/10/20
XBC128911/10/20
BCD456711/23/22

and then create a chunk-oriented step with a FlatFileItemReader to parse the new file. This would be simpler than trying to find a way to ignore lines, use regular expressions to parse the content, etc.
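A minimal sketch of the transformation such a tasklet would perform, using only the JDK's StAX API (`javax.xml.stream`). The class and method names are hypothetical. In Spring Batch you could call something like `unwrap` from a `Tasklet`'s `execute` method and point the `FlatFileItemReader` at the output path. The sketch assumes records never contain whitespace, so it splits on any whitespace, which also covers records that are not separated by newlines.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for the "unwrap" step: pulls the text content out of the
// <File> element and returns one fixed-length record per entry.
public class FileUnwrapper {

    public static List<String> extractRecords(Reader xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        StringBuilder text = new StringBuilder();
        boolean insideFile = false;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "File".equals(reader.getLocalName())) {
                    insideFile = true;   // matched by local name, namespace ignored
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "File".equals(reader.getLocalName())) {
                    insideFile = false;
                } else if (insideFile && (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA)) {
                    text.append(reader.getText()); // text may arrive in several chunks
                }
            }
        } finally {
            reader.close();
        }
        List<String> records = new ArrayList<>();
        for (String token : text.toString().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                records.add(token);
            }
        }
        return records;
    }

    // Writes one record per line to a plain flat file a FlatFileItemReader can consume
    public static void unwrap(Path in, Path out) throws IOException, XMLStreamException {
        try (Reader r = Files.newBufferedReader(in)) {
            Files.write(out, extractRecords(r));
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<File fileId=\"123\" xmlns=\"abc:XYZ\" > ABC123411/10/20\n"
                + "XBC128911/10/20\n"
                + "BCD456711/23/22\n"
                + "</File>\n";
        extractRecords(new StringReader(xml)).forEach(System.out::println);
    }
}
```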

Mahmoud Ben Hassine

I cannot modify the file or create a new file. – CuriousToLearn Nov 02 '22 at 17:14
what prevents you from doing that? – Mahmoud Ben Hassine Nov 02 '22 at 17:46
Instead of having it in a new file, I now have this data as the result of JAXB unmarshalling. Is there a way to tokenize it and assign it to a POJO? I posted more details in my question. – CuriousToLearn Nov 03 '22 at 01:58
well, you just implemented what is suggested in this answer by producing TestRecord.data. Now you can use a `FlatFileItemReader` with a `DelimitedLineTokenizer` configured with a `/` as separator. – Mahmoud Ben Hassine Nov 03 '22 at 07:30
Can FlatFileItemReader read TestRecord.data? Can you please explain how? It takes a file path in "resource", doesn't it? – CuriousToLearn Nov 03 '22 at 18:56
Please correct me if I'm wrong, and please share a reference too. – CuriousToLearn Nov 03 '22 at 19:08
yes, the getting started guide shows an example: https://spring.io/guides/gs/batch-processing/. You need to change the delimiter as mentioned previously and adapt the line mapper to map data to your domain object. – Mahmoud Ben Hassine Nov 04 '22 at 06:33
Can you please explain how I can fit this into this code – CuriousToLearn Nov 08 '22 at 23:08
com.test.model – CuriousToLearn Nov 08 '22 at 23:08
Or is there any other way to do this? The sample you shared is a straightforward CSV, but this is XML. – CuriousToLearn Nov 08 '22 at 23:09

pete_bc · Answer 2 · 2022-11-01T22:42:18.523

I successfully extracted the contents using a RegexLineTokenizer instead of a FixedLengthTokenizer. Setting strict to false prevents it from choking on lines that do not match the pattern, but it will create objects with empty properties for those lines.

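   // Declared in a @Configuration class
   // (import org.springframework.batch.item.file.transform.RegexLineTokenizer)
   @Bean
   public RegexLineTokenizer regexpTokenizer() {
     RegexLineTokenizer tok = new RegexLineTokenizer();
     // three capture groups: 3 letters (name), 4 digits (id), nn/nn/nn (date)
     tok.setRegex("([A-Za-z]{3})(\\d{4})(\\d{2}/\\d{2}/\\d{2})");
     tok.setNames("name", "id", "date");
     // lines that don't match yield empty fields instead of an exception
     tok.setStrict(false);
     return tok;
   }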

Here is what that translates to as an XML configuration:

<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
    <property name="resource" value="/file path" />
    <property name="linesToSkip" value="2" />
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.RegexLineTokenizer">
                    <property name="names" value="name,id,date"/>
                    <property name="regex" value="([A-Za-z]{3})(\d{4})(\d{2}/\d{2}/\d{2})"/>
                    <property name="strict" value="false"/>
                </bean>
            </property>
            <property name="fieldSetMapper">
                <!-- Map the named fields onto the prototype bean -->
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                    <property name="prototypeBeanName" value="testRecord" />
                </bean>
            </property>
        </bean>
    </property>
</bean>
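To see what this regex-based configuration does line by line, here is a JDK-only illustration. The class name `RegexLineDemo` is hypothetical and this is not Spring Batch's implementation: it applies the same pattern with `Matcher.find()` and, mirroring how strict="false" is described above, fills the three named fields with empty strings when a line does not match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only (not Spring Batch code): applies the same regex to each raw line
// and pads non-matching lines with empty values, like the strict=false behavior described.
public class RegexLineDemo {

    static final Pattern RECORD =
            Pattern.compile("([A-Za-z]{3})(\\d{4})(\\d{2}/\\d{2}/\\d{2})");
    static final String[] NAMES = {"name", "id", "date"};

    public static Map<String, String> tokenize(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = RECORD.matcher(line);
        // find() (not matches()), so a record preceded by other text is still picked up
        boolean found = m.find();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], found ? m.group(i + 1) : "");
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] lines = {
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<File fileId=\"123\" xmlns=\"abc:XYZ\" > ABC123411/10/20",
            "XBC128911/10/20",
            "</File>"
        };
        for (String line : lines) {
            System.out.println(tokenize(line));
        }
    }
}
```

With the sample file as shown in the question, the first record shares a line with the opening `<File ...>` tag, so in this sketch it is found on line 2. A reader configured to skip two lines would never hand that line to the tokenizer.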

edited Nov 01 '22 at 22:42

answered Nov 01 '22 at 16:01

pete_bc

I cannot modify the file. – CuriousToLearn Nov 01 '22 at 16:58
Your question was: "How to ignore the first two and last line of the file, i.e xml tags?" The answer is to treat `<` as a comment prefix. – pete_bc Nov 01 '22 at 17:07
Can we configure that in the Spring context (posted above)? Can you please share how? Also, the elements do not have EOLs, so the records are appended on one line instead of each on a new line: ABC123411/10/20 XBC128911/10/20 BCD456711/23/22 – CuriousToLearn Nov 01 '22 at 19:02
I edited the answer. Evidently the comments property does not apply to lines after the first one. Maybe you can't modify the file, but you should say mean things to the party responsible for creating it. :) – pete_bc Nov 02 '22 at 14:49
So this code ignores all the XML elements and reads only the contents of the File element? How is it different from FixedLengthTokenizer? Can you please help me understand? I see it only changes the way the fixed-length data is parsed, but the problem of ignoring the other elements still exists, doesn't it? – CuriousToLearn Nov 02 '22 at 17:27
FixedLengthTokenizer converts each line of text to a list of property values according to a matching sequence of line position ranges. RegexLineTokenizer converts each line to a list of properties according to the groups in the given regular expression. If strict is set to false, it will create objects with empty values for lines that do not match the regular expression. It's really a hack. I think @Mahmoud Ben Hassine has a better idea. – pete_bc Nov 04 '22 at 00:31
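To make the comparison above concrete for this data: a FixedLengthTokenizer is configured with 1-based, inclusive column ranges, which for the sample records would be `1-3,4-7,8-15` (name, id, date). The JDK-only sketch below (hypothetical class name, not Spring code) shows how such ranges slice a line. Fields are trimmed here for convenience.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: shows how 1-based, inclusive column ranges such as
// "1-3,4-7,8-15" cut a fixed-length line into fields.
public class ColumnRangeDemo {

    public static List<String> slice(String line, String ranges) {
        List<String> fields = new ArrayList<>();
        for (String range : ranges.split(",")) {
            String[] bounds = range.trim().split("-");
            int start = Integer.parseInt(bounds[0]); // 1-based, inclusive
            int end = Integer.parseInt(bounds[1]);   // 1-based, inclusive
            // clamp to the line length so short lines give short or empty fields
            int from = Math.min(start - 1, line.length());
            int to = Math.min(end, line.length());
            fields.add(line.substring(from, to).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(slice("ABC123411/10/20", "1-3,4-7,8-15"));
    }
}
```

Unlike the regex, positional slicing has no notion of "not matching": applied to the `<File fileId="123" ...> ABC123411/10/20` line, the same ranges would yield meaningless fields, which is why neither tokenizer by itself solves the problem of the surrounding XML.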