My XML file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<File fileId="123" xmlns="abc:XYZ" > ABC123411/10/20
XBC128911/10/20
BCD456711/23/22
</File>

The content of the File element is fixed-length flat-file data, and I need to parse each record. For example,

ABC123411/10/20

should become a Content object:

public class Content {
   private String id;
   private String name;
   private String date;

   // getters
}

Ex:

name: ABC
id: 1234
Date: 11/10/20

This is what I'm trying:

<bean id="reader" class="org.springframework.batch.item.xml.StaxEventItemReader" scope="step">
    <property name="resource" value="file:#{jobExecutionContext['source.download.filePath']}" />
    <property name="unmarshaller" ref="jaxb2Marshaller" />
    <property name="fragmentRootElementNames"  value="File">
    </property>
</bean>

<bean id="jaxb2Marshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
    <property name="packagesToScan">
        <list>
            <value>com.test.model</value>
        </list>
    </property>
</bean>

and my POJO:

@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "File", namespace = "//namespace")
public class TestRecord {

   @XmlValue
   private String data;

   public String getData() {
      return data;
   }

}

This code parses the XML file and stores the whole element content as a single String in TestRecord.data:

ABC123411/10/20
XBC128911/10/20
BCD456711/23/22

With this approach, we still need to write a separate mapper that splits the string from TestRecord.data on newlines, tokenizes each line, and assigns the fields to a Content object.

Is there a way to do this directly in the XML configuration with the available readers, or is there a better option? Thanks!

CuriousToLearn

2 Answers

I would keep it simple and create a tasklet that transforms this:

<?xml version="1.0" encoding="UTF-8"?>
<File fileId="123" xmlns="abc:XYZ" > ABC123411/10/20
XBC128911/10/20
BCD456711/23/22
</File>

into this:

ABC123411/10/20
XBC128911/10/20
BCD456711/23/22

and then create a chunk-oriented step with a FlatFileItemReader to parse the new file. This would be simpler than trying to find a way to ignore lines, use regular expressions to parse the content, etc.
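The transformation step could look roughly like the sketch below. It is plain Java using the JDK's StAX API rather than an actual Spring Batch Tasklet, and the class name FileElementExtractor and the method extractRecords are hypothetical names chosen for illustration. It collects the text content of the document and returns the non-blank lines, trimmed.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Spring Batch.
final class FileElementExtractor {

    // Returns the trimmed, non-blank lines of character data found in the XML.
    static List<String> extractRecords(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                // Keep only character data; tags and attributes are dropped.
                if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML input", e);
        }

        List<String> records = new ArrayList<>();
        for (String line : text.toString().split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<File fileId=\"123\" xmlns=\"abc:XYZ\" > ABC123411/10/20\n"
                + "XBC128911/10/20\n"
                + "BCD456711/23/22\n"
                + "</File>";
        // Prints one record per line, without the XML wrapper.
        extractRecords(xml).forEach(System.out::println);
    }
}
```

Inside a tasklet, the returned lines would be written to a temporary file (for example with java.nio.file.Files.write), and the following step would point its FlatFileItemReader at that file.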

Mahmoud Ben Hassine

I successfully extracted the contents using RegexLineTokenizer instead of FixedLengthTokenizer. Setting strict to false prevents it from choking on lines that do not match the pattern, but it will create objects with empty properties for them.

    @Bean
    public static RegexLineTokenizer regexpTokenizer() {
        RegexLineTokenizer tok = new RegexLineTokenizer();
        tok.setRegex("([A-Za-z]{3})(\\d{4})(\\d{2}/\\d{2}/\\d{2})");
        tok.setNames("name", "id", "date");
        tok.setStrict(false);
        return tok;
    }

Here is what that translates to as an XML configuration:

<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
    <property name="resource" value="/file path" />
    <property name="linesToSkip" value="2" />
    <property name="lineMapper">
        <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <property name="lineTokenizer">
                <bean class="org.springframework.batch.item.file.transform.RegexLineTokenizer">
                    <property name="names"
                              value="name,id,date"/>
                    <property name="regex"
                              value="([A-Za-z]{3})(\d{4})(\d{2}/\d{2}/\d{2})"/>
                    <property name="strict" value="false"/>
                </bean>
            </property>
            <property name="fieldSetMapper">
                <!-- Parse the object -->
                <bean class="org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper">
                    <property name="prototypeBeanName" value="testRecord" />
                </bean>
            </property>
        </bean>
    </property>
</bean>

pete_bc
  • I cannot modify the file. – CuriousToLearn Nov 01 '22 at 16:58
  • Your question was: "How to ignore the first two and last line of the file, i.e xml tags?" The answer is to treat `<` as a comment prefix. – pete_bc Nov 01 '22 at 17:07
  • Can we configure that in the Spring context (posted above)? Can you please share how? Also, the elements do not end with an EOL, so the records are appended on one line instead of each starting on a new line: ABC123411/10/20 XBC128911/10/20 BCD456711/23/22 – CuriousToLearn Nov 01 '22 at 19:02
  • I edited the answer. Evidently the comments property does not apply to lines after the first one. Maybe you can't modify the file, but you should say mean things to the party responsible for creating it. :) – pete_bc Nov 02 '22 at 14:49
  • So this code ignores all the XML elements and reads only the contents of the File element? How is it different from FixedLengthTokenizer? Can you please help me understand? As I see it, it only changes how the fixed-length data is parsed, but the problem of ignoring the other elements still exists, doesn't it? – CuriousToLearn Nov 02 '22 at 17:27
  • FixedLengthTokenizer converts each line of text to a list of property values according to a matching sequence of line positions ranges. RegexLineTokenizer converts each line to a list of properties according to groups in the indicated regular expression. If strict is set to false, it will create objects with empty values for lines that do not match the regular expression. It's really a hack. I think @Mahmoud Ben Hassine has a better idea. – pete_bc Nov 04 '22 at 00:31
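
The difference described in the last comment can be illustrated with a minimal plain-Java sketch. It does not use Spring Batch's tokenizer classes; the class TokenizerComparison and its methods byRanges and byRegex are hypothetical stand-ins that only imitate the two ideas, and the column ranges 1-3, 4-7, 8-15 are an assumption based on the sample records. The regex variant searches within the line, which is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not Spring Batch code.
final class TokenizerComparison {

    // Same expression as in the answer above.
    private static final Pattern RECORD =
            Pattern.compile("([A-Za-z]{3})(\\d{4})(\\d{2}/\\d{2}/\\d{2})");

    // Fixed-length idea: cut the line at 1-based, inclusive column ranges,
    // whatever the line happens to contain.
    static List<String> byRanges(String line) {
        int[][] ranges = {{1, 3}, {4, 7}, {8, 15}};
        List<String> tokens = new ArrayList<>();
        for (int[] range : ranges) {
            int start = Math.min(range[0] - 1, line.length());
            int end = Math.min(range[1], line.length());
            tokens.add(line.substring(start, end));
        }
        return tokens;
    }

    // Regex idea: return the capture groups, or no tokens when nothing matches.
    static List<String> byRegex(String line) {
        Matcher matcher = RECORD.matcher(line);
        if (!matcher.find()) {
            return Collections.emptyList();
        }
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i <= matcher.groupCount(); i++) {
            tokens.add(matcher.group(i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"ABC123411/10/20", "</File>"}) {
            System.out.println(line + " ranges=" + byRanges(line) + " regex=" + byRegex(line));
        }
    }
}
```

For a well-formed record both approaches yield the same three tokens. For the closing </File> line, the range-based version slices the tag into meaningless tokens, while the regex version yields nothing, which is why the non-strict regex setup ends up with empty objects for such lines.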