
I'm new to Spring Batch. CSV files come in all shapes and forms, and some of them are syntactically incorrect. I'm trying to parse a CSV file in which every line starts with '"' and ends with '"'. This is my CSV:

"1;Paris;13/4/1992;16/7/2006"
"2;Lyon;31/5/1993;1/8/2009"
"3;Metz;21/4/1990;27/4/2010"

I tried this:

  <bean id="itemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="data-1.txt" />
    <property name="lineMapper">
      <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
        <property name="fieldSetMapper">
          <!-- Mapper which maps each individual items in a record to properties in POJO -->
          <bean class="com.sam.fourthTp.MyFieldSetMapper" />
        </property>
        <property name="lineTokenizer">
          <!-- A tokenizer class to be used when items in input record are separated by specific characters -->
          <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
            <property name="quoteCharacter" value="&quot;" />
            <property name="delimiter" value=";" />
          </bean>
        </property>
      </bean>
    </property>
  </bean>

But this only works when the CSV file looks like this:

"1";"Paris";"13/4/1992";"16/7/2006"
"2";"Lyon";"31/5/1993";"1/8/2009"
"3";"Metz";"21/4/1990";"27/4/2010"

My question is: how can I parse my CSV when each line starts with '"' and ends with '"'?

abdallah

1 Answer

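As you mentioned, the quoteCharacter applies to fields, not records. With your file, the tokenizer therefore treats each whole line as a single quoted field, so the ';' delimiters inside it are not split.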

My question is: how can I parse my CSV when each line starts with '"' and ends with '"'?

What you can do is:

  • Read lines as raw Strings
  • Use a composite item processor with two delegates: one that trims the " from the start and end of each record, and another that parses the line and maps it to your domain object

Here is a quick example:

import java.util.Arrays;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.support.CompositeItemProcessor;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class MyJob {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Bean
    public ItemReader<String> itemReader() {
        return new ListItemReader<>(Arrays.asList(
                "\"1;Paris;13/4/1992;16/7/2006\"",
                "\"2;Lyon;31/5/1993;1/8/2009\"",
                "\"3;Metz;21/4/1990;27/4/2010\"",
                "\"4;Lille;21/4/1980;27/4/2011\""
                ));
    }

    // First delegate: strips the leading and trailing '"' from each raw line.
    @Bean
    public ItemProcessor<String, String> itemProcessor1() {
        return item -> item.substring(1, item.length() - 1);
    }

    // Second delegate: tokenizes the unquoted line and maps it to a Record.
    @Bean
    public ItemProcessor<String, Record> itemProcessor2() {
        DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
        lineTokenizer.setNames("id", "ville");
        lineTokenizer.setDelimiter(";");
        // Non-strict mode tolerates lines with more tokens than names (only two of the four fields are mapped).
        lineTokenizer.setStrict(false);
        BeanWrapperFieldSetMapper<Record> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
        fieldSetMapper.setTargetType(Record.class);
        return item -> {
            FieldSet tokens = lineTokenizer.tokenize(item);
            return fieldSetMapper.mapFieldSet(tokens);
        };
    }

    @Bean
    public ItemWriter<Record> itemWriter() {
        return items -> {
            for (Record item : items) {
                System.out.println(item);
            }
        };
    }

    @Bean
    public CompositeItemProcessor<String, Record> compositeItemProcessor() {
        CompositeItemProcessor<String, Record> compositeItemProcessor = new CompositeItemProcessor<>();
        compositeItemProcessor.setDelegates(Arrays.asList(itemProcessor1(), itemProcessor2()));
        return compositeItemProcessor;
    }

    @Bean
    public Step step() {
        return steps.get("step")
                .<String, Record>chunk(2)
                .reader(itemReader())
                .processor(compositeItemProcessor())
                .writer(itemWriter())
                .build();
    }

    @Bean
    public Job job() {
        return jobs.get("job")
                .start(step())
                .build();
    }

    public static class Record {

        private int id;
        private String ville;

        public Record() {
        }

        public int getId() {
            return id;
        }

        public void setId(int id) {
            this.id = id;
        }

        public String getVille() {
            return ville;
        }

        public void setVille(String ville) {
            this.ville = ville;
        }

        @Override
        public String toString() {
            return "Record{" +
                    "id=" + id +
                    ", ville='" + ville + '\'' +
                    '}';
        }
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }

}

I used a simple POJO called Record and mapped only two fields. This sample prints:

Record{id=1, ville='Paris'}
Record{id=2, ville='Lyon'}
Record{id=3, ville='Metz'}
Record{id=4, ville='Lille'}
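
One caveat about the first delegate: `item.substring(1, item.length() - 1)` assumes every line is wrapped in quotes. It throws a `StringIndexOutOfBoundsException` on an empty or one-character line and silently drops the first and last character of an unquoted one. A more defensive variant might look like the minimal sketch below. `QuoteTrimmer` and `stripOuterQuotes` are hypothetical names, not part of Spring Batch, and the sketch is plain Java so it can be run on its own:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Spring Batch.
final class QuoteTrimmer {

    private QuoteTrimmer() {
    }

    // Removes one leading and one trailing double quote, but only when both are present.
    // Any other input (null, empty, unquoted, half-quoted) is returned unchanged.
    static String stripOuterQuotes(String line) {
        if (line != null
                && line.length() >= 2
                && line.charAt(0) == '"'
                && line.charAt(line.length() - 1) == '"') {
            return line.substring(1, line.length() - 1);
        }
        return line;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"1;Paris;13/4/1992;16/7/2006\"", // quoted record
                "2;Lyon;31/5/1993;1/8/2009",       // already unquoted
                "");                               // empty line
        for (String line : lines) {
            System.out.println("[" + stripOuterQuotes(line) + "]");
        }
    }
}
```

In the job above, `itemProcessor1()` could then return `QuoteTrimmer::stripOuterQuotes` instead of the lambda.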

Hope this helps.

Mahmoud Ben Hassine
  • Hey Mahmoud :), thanks for your reply. I tried your code and I got this: `@Bean method ScopeConfiguration.jobScope is non-static and returns an object assignable to Spring's BeanFactoryPostProcessor interface. This will result in a failure to process annotations such as @Autowired, @Resource and @PostConstruct within the method's declaring @Configuration class. Add the 'static' modifier to this method to avoid these container lifecycle issues; see @Bean javadoc for complete details.` – abdallah Apr 18 '19 at 08:13
  • That's a warning that was fixed in Spring Batch 3.0.9/4.0.1. See [BATCH-2161](https://jira.spring.io/browse/BATCH-2161). – Mahmoud Ben Hassine Apr 18 '19 at 09:02
  • Thanks again :D, but now I get this :/ : `Exception in thread "main" org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'myJob': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private org.springframework.batch.core.configuration.annotation.JobBuilderFactory com.drihem.sam.fourthTp.MyJob.jobs; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'jobBuilders' ` – abdallah Apr 18 '19 at 09:25
  • Which Spring Batch version do you use? – Mahmoud Ben Hassine Apr 18 '19 at 10:00
  • @Mahmoud, thanks again for your reply. I use `spring-batch-core 4.0.1.RELEASE` – abdallah Apr 18 '19 at 10:03
  • What about the dates? `"1;Paris;13/4/1992;16/7/2006"` – abdallah Apr 18 '19 at 10:38
  • To get the dates I added this method: `public User mapFieldSet(FieldSet fieldSet) throws BindException { User result = new User(); result.setId(fieldSet.readInt(0)); result.setVille(fieldSet.readRawString(1)); result.setDateOfBirth(new LocalDate(fieldSet.readDate(2,"dd/MM/yyyy"))); result.setDateFirstDec(new LocalDate(fieldSet.readDate(3,"dd/MM/yyyy"))); return result; }` Thanks again! – abdallah Apr 18 '19 at 13:10
  • In your code you use `Arrays`. My question is: how can I use a .csv file instead? – abdallah Apr 18 '19 at 14:58
  • That's just an example, as I can't upload a CSV file to the answer. You can replace the reader with a `FlatFileItemReader` and it should work in the same way. The most important point is the composite item processor. – Mahmoud Ben Hassine Apr 18 '19 at 16:10
  • Hey Mahmoud, thanks again. I posted it as a question; can you explain this [here](https://stackoverflow.com/questions/55749966/spring-batch-reading-a-csv-file-into-array-list) please? – abdallah Apr 18 '19 at 16:14
  • If I use `FlatFileItemReader`, will my `@Bean` return a reader of type String? I need more details, please. – abdallah Apr 19 '19 at 07:49
  • Yes, it should return a String so that the first processor can trim the leading/trailing `"` characters. I answered your other question https://stackoverflow.com/questions/55749966/spring-batch-reading-a-csv-file-into-array-list with an example. – Mahmoud Ben Hassine Apr 19 '19 at 07:53
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/192112/discussion-between-abdallah-and-mahmoud-ben-hassine). – abdallah Apr 19 '19 at 15:09
  • Hey @MahmoudBenHassine, I want to create a class for my ItemProcessor: `public class MyItemProcessorTwo implements ItemProcessor { @Override public User process(String item) throws Exception {DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer() lineTokenizer.setNames("id", "ville", "dateOfBirth", "dateFirstDec"); lineTokenizer.setDelimiter(";"); return item -> { FieldSet tokens = lineTokenizer.tokenize(item); return mapFieldSet(tokens); };} }` but I get `The target type of this expression must be a functional interface` at `item ->` – abdallah Apr 26 '19 at 09:58
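
On the last comment: that compiler error most likely arises because `process` is declared to return a `User`, while the method body returns a lambda. A lambda only makes sense where a functional interface is expected (as in the `@Bean` methods of the answer, which return an `ItemProcessor`). Inside a class, `process` would do the work directly and return the mapped object. The sketch below illustrates the idea in plain Java so it runs without Spring Batch on the classpath. `QuotedLineParser` and this `User` class are hypothetical stand-ins, `String.split` replaces the `DelimitedLineTokenizer`, and `java.time.LocalDate` with the pattern `d/M/yyyy` is an assumption about how the dates should be read:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical plain-Java stand-in for a class implementing ItemProcessor<String, User>.
final class QuotedLineParser {

    // "d/M/yyyy" accepts one- or two-digit days and months, e.g. 13/4/1992.
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("d/M/yyyy");

    // Minimal value object; the field names follow the comment thread.
    static final class User {
        final int id;
        final String ville;
        final LocalDate dateOfBirth;
        final LocalDate dateFirstDec;

        User(int id, String ville, LocalDate dateOfBirth, LocalDate dateFirstDec) {
            this.id = id;
            this.ville = ville;
            this.dateOfBirth = dateOfBirth;
            this.dateFirstDec = dateFirstDec;
        }
    }

    // Does the work directly and returns the mapped object; no lambda is involved.
    User process(String item) {
        String line = item;
        // Strip the quotes that wrap the whole record, if present.
        if (line.length() >= 2 && line.startsWith("\"") && line.endsWith("\"")) {
            line = line.substring(1, line.length() - 1);
        }
        String[] tokens = line.split(";");
        if (tokens.length != 4) {
            throw new IllegalArgumentException(
                    "Expected 4 fields but got " + tokens.length + ": " + item);
        }
        return new User(
                Integer.parseInt(tokens[0].trim()),
                tokens[1].trim(),
                LocalDate.parse(tokens[2].trim(), DATE_FORMAT),
                LocalDate.parse(tokens[3].trim(), DATE_FORMAT));
    }

    public static void main(String[] args) {
        User user = new QuotedLineParser().process("\"1;Paris;13/4/1992;16/7/2006\"");
        // prints: 1 Paris 1992-04-13 2006-07-16
        System.out.println(user.id + " " + user.ville + " " + user.dateOfBirth + " " + user.dateFirstDec);
    }
}
```

In a real Spring Batch processor the same shape would apply: declare the class as `implements ItemProcessor<String, User>`, tokenize `item` inside `process`, and return the result of the mapping rather than a lambda.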