
I'm using the Jaxb2Marshaller from org.springframework.oxm.jaxb.Jaxb2Marshaller in my Spring Batch application to marshal annotated classes to XML. The marshaller is configured as follows:

@Bean
public Jaxb2Marshaller productMarshaller() {
        
    Map<String, Object> props = new HashMap<String, Object>();
    props.put("com.sun.xml.bind.marshaller.CharacterEscapeHandler", new XmlCharacterEscapeHandler());
        
    Jaxb2Marshaller marshaller = new Jaxb2Marshaller();
    marshaller.setClassesToBeBound(new Class[] {Product.class, TechSpecs.class});
    marshaller.setMarshallerProperties(props);
    return marshaller;
}

The marshaller is used inside a StaxEventItemWriter that is configured as follows:

@Bean(name = "writer")
@StepScope
public StaxEventItemWriter<Product> writer (
        @Value("#{jobParameters['path']}") String path,
        @Value("#{stepExecutionContext['currentFile']}") String fileName
    ) {
        
    Map<String, String> rootElementAttributes = new HashMap<String, String>();
    rootElementAttributes.put("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance");

    FileSystemResource file = new FileSystemResource(path + fileName);
        
    return new StaxEventItemWriterBuilder<Product>()
            .name("writer")
            .version("1.0")
            .encoding("UTF-8")
            .standalone(false)
            .rootTagName("Products")
            .rootElementAttributes(rootElementAttributes)
            .headerCallback(headerCallback(null, null))
            .footerCallback(footerCallback())
            .marshaller(productMarshaller())
            .resource(file)
            .build();
}

The problem is that when I run the code, I get an IndexOutOfBoundsException. I found out that the exception is thrown because my Product object has a String attribute that may contain a &. A literal & is not allowed in XML character data and has to be escaped.

Why is the Jaxb2Marshaller not auto escaping the & character? As far as I understand the Marshaller should take care of escaping characters.

I tried to escape the character myself in the item processor with StringEscapeUtils, e.g. product.setFullName(StringEscapeUtils.escapeXml10(dbExport.getFullName()));, but this didn't help. It only changes the & in the String to &amp;, which itself still contains a &.

I also tried to use my own implementation of a CharacterEscapeHandler, but the marshaller.setMarshallerProperties() does not have any visible effect on the Marshaller. Do I have to set the properties for the Marshaller differently?

public class XmlCharacterEscapeHandler implements CharacterEscapeHandler {

    @Override
    public void escape(char[] ch, int start, int length, boolean isAttVal, Writer out) throws IOException { 
        StringWriter buffer = new StringWriter();   
        for(int i = start; i < start + length; i++) {
            buffer.write(ch[i]);
        }
        String escapedString = StringEscapeUtils.escapeXml10(buffer.toString());
        out.write(escapedString);
    }

}

EDIT

Unfortunately, I have not been able to resolve my issue so far. Therefore, I switched from Jaxb2Marshaller to XStreamMarshaller, but I get a similar issue there. As far as I can tell, the underlying XStream should use a PrettyPrintWriter that automatically converts & to &amp;, as described here: https://stackoverflow.com/a/48141964/4191735 This is not happening; for me there is always a problem with &. Why does the escaping not work? Escaping the String myself and force-converting it to UTF-8 does not help either.

Minimal Complete Example

Main:

package com.mwe;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.ComponentScan;

@SpringBootApplication
@ComponentScan("com.mwe")
public class Main {
    
    public static void main(String [] args) {
        System.exit(SpringApplication.exit(SpringApplication.run(Main.class, args)));
    }
}

BatchConfig:

package com.mwe;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.xml.StaxEventItemWriter;
import org.springframework.batch.item.xml.builder.StaxEventItemWriterBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.oxm.xstream.XStreamMarshaller;


@Configuration
@EnableBatchProcessing
public class BatchConfig {
    
    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;
    
    @Bean
    @StepScope
    public FlatFileItemReader<Product> reader() {
            
        FlatFileItemReader<Product> reader = new FlatFileItemReader<Product>();
        reader.setResource(new FileSystemResource("test.csv"));

        DefaultLineMapper<Product> lineMapper = new DefaultLineMapper<>();
        lineMapper.setFieldSetMapper(new CustomFieldMapper());

        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
        tokenizer.setDelimiter("|");
        tokenizer.setNames(new String[] {"ID", "NAME"});
        lineMapper.setLineTokenizer(tokenizer);

        reader.setLineMapper(lineMapper);
        reader.setLinesToSkip(1);

        return reader;
    }

    @Bean
    public ItemProcessor<Product, Xml> processor() {
        return new Processor();
    }

    @Bean
    @StepScope
    public StaxEventItemWriter<Xml> writer () {
        
        return new StaxEventItemWriterBuilder<Xml>()
                .name("writer")
                .version("1.0")
                .encoding("UTF-8")
                .standalone(false)
                .rootTagName("products")
                .marshaller(getMarshaller())
                .resource(new FileSystemResource("test.xml"))
                .build();
    }
    
    @Bean
    public Job job() {
        return this.jobBuilderFactory.get("job")
                .start(step1())
                .build();
    }

    @Bean
    public Step step1() {
        return (stepBuilderFactory.get("step1")
                .<Product, Xml>chunk(2)
                .reader(reader())
                .processor(processor())
                .writer(writer())
                .build());
    }
    
    @Bean
    public XStreamMarshaller getMarshaller() {
        
        XStreamMarshaller marshaller = new XStreamMarshaller();
        marshaller.setEncoding("UTF-8");
        return marshaller;
    }
}

CustomFieldMapper:

package com.mwe;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;


public class CustomFieldMapper implements FieldSetMapper<Product> {

    public Product mapFieldSet(FieldSet fs) {
        
        Product product = new Product();
        product.setId(fs.readString("ID"));
        product.setName(fs.readString("NAME"));
        
        return product;
    }
    
}

ItemProcessor:

package com.mwe;

import org.springframework.batch.item.ItemProcessor;

public class Processor implements ItemProcessor<Product, Xml> {
    
    @Override
    public Xml process(final Product product) {
        
        Xml xml = new Xml();
        xml.setId(Integer.parseInt(product.getId()));
        xml.setName(product.getName());
        
        return xml;
    }

}

Product:

package com.mwe;

public class Product {
    
    private String id;
    
    private String name;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
    
}

Xml:

package com.mwe;

public class Xml {

    private int id;
    
    private String name;

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
    
}

Application Properties:

# Spring config
spring.main.allow-bean-definition-overriding=true
spring.main.banner-mode=off
spring.batch.initialize-schema=never

# Logging data source
spring.datasource.logging.driver-class-name=org.mariadb.jdbc.Driver
spring.datasource.logging.maximum-pool-size=10
spring.datasource.logging.hikari.minimum-idle=1
spring.datasource.logging.hikari.data-source-properties.useUnicode=true
spring.datasource.logging.hikari.data-source-properties.characterEncoding=UTF-8
spring.datasource.logging.hibernate.dialect=org.hibernate.dialect.MariaDBDialect
spring.datasource.logging.hibernate.ddl-auto=none

spring.datasource.url=jdbc:mariadb://localhost:3306/logging?UseUnicode=true&characterEncoding=utf8
spring.datasource.username=root
spring.datasource.password=root

Pom:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.4.2</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>

    <groupId>com.mwe</groupId>
    <artifactId>mwe</artifactId>
    <version>1</version>
    <name>mwe</name>
    <description>Minimal working example</description>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jdbc</artifactId>
        </dependency>
        <dependency>
            <groupId>org.mariadb.jdbc</groupId>
            <artifactId>mariadb-java-client</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-oxm</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-text</artifactId>
            <version>1.9</version>
        </dependency>
        <dependency>
            <groupId>javax.activation</groupId>
            <artifactId>activation</artifactId>
            <version>1.1.1</version>
        </dependency>
        <dependency>
            <groupId>com.thoughtworks.xstream</groupId>
            <artifactId>xstream</artifactId>
            <version>1.4.15</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

test.csv:

"ID"|"NAME"
1|"Product 1"
2|"Product 1 & Addition"
Christoph

1 Answer

You are hitting this issue: https://github.com/spring-projects/spring-batch/issues/3745.

This issue will be fixed in v4.3.2/v4.2.6 which are planned to be released on March 18, 2021. Please check the milestone page on GitHub in case the release date changes.
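
As background on the escaping attempts in the question: a StAX writer escapes & in character data on its own, so pre-escaping the String (for example with StringEscapeUtils) is normally not needed and would only escape the text twice. The following is a minimal sketch that uses only the JDK's javax.xml.stream API, not the Spring Batch writer itself. The class name StaxEscapeDemo and the helper toXml are hypothetical names chosen for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class; it is not part of the question's project.
public class StaxEscapeDemo {

    // Writes the given text as the content of a single <name> element
    // and returns the resulting XML fragment.
    static String toXml(String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("name");
            xml.writeCharacters(text); // writeCharacters escapes &, < and > itself
            xml.writeEndElement();
            xml.flush();
            xml.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
    }

    public static void main(String[] args) {
        // Raw input: the writer turns & into &amp;
        System.out.println(toXml("Product 1 & Addition"));
        // Pre-escaped input: the & of &amp; is escaped again, giving &amp;amp;
        System.out.println(toXml("Product 1 &amp; Addition"));
    }
}
```

With the JDK's default StAX implementation, the first call yields `<name>Product 1 &amp; Addition</name>` and the second yields `<name>Product 1 &amp;amp; Addition</name>`, which is why manual escaping before the writer does not help.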

Mahmoud Ben Hassine
  • Thanks for your answer. Are there any workarounds, or is it better to remove the characters that have to be escaped from my data until the release on March 18th? That way I can continue with the development and update later on. – Christoph Mar 09 '21 at 09:47
  • The issue actually happens only when the `StaxEventItemWriter` is set to be transactional (which is `true` by default). Based on your minimal example, you are transforming a csv file to an xml file, which does not require the writer to be transactional. I added `.transactional(false)` on your item writer and the exception did not happen. Do you confirm? – Mahmoud Ben Hassine Mar 09 '21 at 10:19
  • Yes, setting `transactional(false)` resolves the issue for the given example. In my main application I'm using a `JdbcItemReader`, so I will wait for the release. – Christoph Mar 09 '21 at 10:35
  • `JdbcCursorItemReader` will not participate in any transactions created as part of the step processing (see its [Javadoc](https://docs.spring.io/spring-batch/docs/4.3.x/api/org/springframework/batch/item/database/JdbcCursorItemReader.html)). So it should be fine to disable that flag on the writer without impacting the reader. – Mahmoud Ben Hassine Mar 09 '21 at 10:44
  • That said, are you willing to help test the fix for this issue, since you have a project ready for that? We have a PR with a fix here: https://github.com/spring-projects/spring-batch/pull/3843. All you need to do is apply the patch on the 4.3.x branch, build, and test against the snapshot version. That would be a great contribution! – Mahmoud Ben Hassine Mar 09 '21 at 10:45
  • Sure, do I have to report my result anywhere? – Christoph Mar 09 '21 at 12:37
  • Oh great thank you! Yes, you can add a comment on that PR. – Mahmoud Ben Hassine Mar 09 '21 at 12:51
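
The workaround discussed in the comments above, disabling the transactional flag on the writer, could look roughly like the sketch below. It reuses the `Xml` class and the builder settings from the question's minimal example; the class name `NonTransactionalWriterSketch` and the static factory method are hypothetical and only serve to keep the fragment compact. This is a sketch of the workaround under those assumptions, not the fix that ships in the patched releases.

```java
package com.mwe;

import org.springframework.batch.item.xml.StaxEventItemWriter;
import org.springframework.batch.item.xml.builder.StaxEventItemWriterBuilder;
import org.springframework.core.io.FileSystemResource;
import org.springframework.oxm.xstream.XStreamMarshaller;

// Hypothetical helper; in the question's project this would be the body of
// the existing writer() bean method in BatchConfig.
public class NonTransactionalWriterSketch {

    public static StaxEventItemWriter<Xml> writer(XStreamMarshaller marshaller) {
        return new StaxEventItemWriterBuilder<Xml>()
                .name("writer")
                .version("1.0")
                .encoding("UTF-8")
                .standalone(false)
                .rootTagName("products")
                .marshaller(marshaller)
                .resource(new FileSystemResource("test.xml"))
                // The flag is true by default; per the comments above, the
                // exception only occurs while it is enabled.
                .transactional(false)
                .build();
    }
}
```

Once the project is upgraded to a release that contains the fix, the `.transactional(false)` line can be dropped again if transactional writing is wanted.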