Hi I am new to spring batch, I want to create multiple files(csv) per chunk processed. FileName will be something like timestamp.csv. Any idea how can I do that? Basically it is splitting one big file to smaller files.
Thank you!
CSV files are basically text files, with each record ending in a newline character.
So as far as splitting a big CSV file into smaller files is concerned, you simply need to read the big file line by line in Java, and when your line count reaches the threshold (the maximum per small file: 10, 100, 1000, etc.), create a new file with a naming convention of your choosing and write the data there.
BufferedReader is the main class for reading a text file line by line (see: How to read a large text file line by line using Java?).
Implementing this logic has nothing to do with Spring Batch; it can be done in plain Java or with OS-level commands.
So you have two distinct logical pieces: reading the big file line by line, and writing the CSV files. You can develop these two pieces as separate components and plug them into the Spring Batch framework at the appropriate place as per your business requirement.
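The line-by-line splitting logic described above can be sketched in plain Java as follows (file names and the chunk size are illustrative choices, not requirements):

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FileSplitter {

    // Splits the input file into parts of at most maxLines lines each.
    // A new part file is opened every time the line count reaches the
    // threshold. Returns the list of part files created in outDir.
    public static List<Path> split(Path input, Path outDir, int maxLines) throws IOException {
        List<Path> parts = new ArrayList<>();
        BufferedWriter writer = null;
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String line;
            int count = 0;
            while ((line = reader.readLine()) != null) {
                if (count % maxLines == 0) {
                    if (writer != null) {
                        writer.close();
                    }
                    Path part = outDir.resolve("part-" + (parts.size() + 1) + ".csv");
                    parts.add(part);
                    writer = Files.newBufferedWriter(part);
                }
                writer.write(line);
                writer.newLine();
                count++;
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("split-demo");
        Path big = dir.resolve("big.csv");
        Files.write(big, Arrays.asList("a,1", "b,2", "c,3", "d,4", "e,5", "f,6", "g,7"));
        // 7 lines with at most 3 lines per file -> 3 part files
        List<Path> parts = split(big, dir, 3);
        System.out.println(parts.size()); // prints 3
    }
}
```

Either of these two pieces (the reading loop and the per-file writing) can then be wrapped in Spring Batch components if you want chunking, restartability, and so on.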
There is a Java library for dealing with CSV files easily, which you might like to use depending on the complexity involved:
<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>4.6</version>
</dependency>
I would use a command line utility like the split command (or an equivalent), or try to do it with plain Java (see Java - Read file and split into multiple files).
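For example, on a Unix-like system the split command can do this directly (the file names below are illustrative; the --additional-suffix option is GNU-specific):

```shell
# Split bigfile.csv into files of 1000 lines each,
# named chunk_aa, chunk_ab, chunk_ac, ...
split -l 1000 bigfile.csv chunk_

# With GNU split, numeric suffixes and a .csv extension are also possible,
# producing chunk_00.csv, chunk_01.csv, ...
split -l 1000 -d --additional-suffix=.csv bigfile.csv chunk_
```

Note that this splits purely by line count, so if the big CSV has a header row, it ends up only in the first part.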
But if you really want to do it with Spring Batch, then you can use something like:
import java.time.LocalDateTime;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing
public class MyJob {

    private final JobBuilderFactory jobBuilderFactory;
    private final StepBuilderFactory stepBuilderFactory;

    public MyJob(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
    }

    @Bean
    public FlatFileItemReader<String> itemReader() {
        return new FlatFileItemReaderBuilder<String>()
                .name("flatFileReader")
                .resource(new FileSystemResource("foos.txt"))
                .lineMapper(new PassThroughLineMapper())
                .build();
    }

    @Bean
    public ItemWriter<String> itemWriter() {
        final FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
        writer.setLineAggregator(new PassThroughLineAggregator<>());
        writer.setName("chunkFileItemWriter");
        // Point the delegate writer at a new timestamped resource for each
        // chunk, so every chunk lands in its own file.
        return items -> {
            writer.setResource(new FileSystemResource("foos" + getTimestamp() + ".txt"));
            writer.open(new ExecutionContext());
            writer.write(items);
            writer.close();
        };
    }

    private String getTimestamp() {
        // TODO tested on unix/linux systems, update as needed to not contain
        // illegal characters for a file name on MS windows
        return LocalDateTime.now().toString();
    }

    @Bean
    public Step step() {
        return stepBuilderFactory.get("step")
                .<String, String>chunk(3) // 3 items per chunk, hence per file
                .reader(itemReader())
                .writer(itemWriter())
                .build();
    }

    @Bean
    public Job job() {
        return jobBuilderFactory.get("job")
                .start(step())
                .build();
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }
}
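As the TODO in getTimestamp() notes, LocalDateTime.now().toString() contains ':' characters, which are illegal in Windows file names. A portable variant (the formatter pattern below is an illustrative choice, not the only option) could be:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class SafeTimestamp {

    // yyyyMMdd-HHmmssSSS avoids ':' and '.', so the result is a valid
    // file-name component on both Windows and Unix-like systems.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmssSSS");

    public static String safeTimestamp() {
        return LocalDateTime.now().format(FORMAT);
    }

    public static void main(String[] args) {
        // e.g. foos20191128-092347769.txt
        System.out.println("foos" + safeTimestamp() + ".txt");
    }
}
```

With this, getTimestamp() in the job above could simply delegate to safeTimestamp().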
The file foos.txt contains:
foo1
foo2
foo3
foo4
foo5
foo6
The example will write each chunk to a separate file with a timestamp:
File1: foos2019-11-28T09:23:47.769.txt
foo1
foo2
foo3
File2: foos2019-11-28T09:23:47.779.txt
foo4
foo5
foo6
By the way, I think it's better to use a sequence number instead of a timestamp.
NB: I would not care much about restartability for such a use case.
Use a Partitioner in Spring Batch. For implementation details, please check the Spring Batch reference documentation, and also check the Partitioner API documentation.