10

I am trying to read files from AWS S3 and process them with Spring Batch.

Can a Spring ItemReader handle this task? If so, how do I pass the credentials to the S3 client and configure my Spring XML to read one file or multiple files?

<bean id="itemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="${aws.file.name}" />
</bean>
Amit Verma
sve
  • I haven't tried it, but Spring Cloud AWS adds the `Resource` abstraction for S3. You may want to have a look at that: http://cloud.spring.io/spring-cloud-aws/spring-cloud-aws.html#_resource_handling – Michael Minella Jun 15 '15 at 15:34
  • Thanks. I'll take a look at it. So do I create a custom item reader using the mentioned resource? Any ideas for batch processing? – sve Jun 15 '15 at 23:52
  • No. I'd expect you to be able to use the 'FlatFileItemReader' but as I said, I haven't tried this myself. – Michael Minella Jun 17 '15 at 03:48
  • Did you get this working please, @SpringStarter? – John Jul 20 '15 at 20:33
  • I got a FlatFileItemReader working with Spring Cloud AWS here: https://stackoverflow.com/questions/31984393/spring-batch-process-an-encoded-zipped-file/54796827#54796827 – Groppe Feb 20 '19 at 23:30

3 Answers

12

Update: To use Spring Cloud AWS you still use the FlatFileItemReader, but you no longer need to write a custom Resource implementation.

Instead, you set up an aws-context and give it your S3 client bean.

    <aws-context:context-resource-loader amazon-s3="amazonS3Client"/>

The reader is set up like any other reader. The only thing unique here is that you now autowire your ResourceLoader:

@Autowired
private ResourceLoader resourceLoader;

and then use that ResourceLoader to set the reader's resource:

@Bean
public FlatFileItemReader<Map<String, Object>> AwsItemReader() {
    FlatFileItemReader<Map<String, Object>> reader = new FlatFileItemReader<>();
    reader.setLineMapper(new JsonLineMapper());
    reader.setRecordSeparatorPolicy(new JsonRecordSeparatorPolicy());
    reader.setResource(resourceLoader.getResource("s3://" + amazonS3Bucket + "/" + file));
    return reader;
}
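
The question also asks about reading multiple files. As a minimal sketch, the location strings passed to `resourceLoader.getResource(...)` could be built for several keys at once. The class `S3Locations` and its methods are hypothetical helpers, not part of Spring or the AWS SDK, and the bucket and key names are placeholders. The sketch assumes each resulting string is then resolved through the ResourceLoader and the resources handed to something like Spring Batch's MultiResourceItemReader, with the FlatFileItemReader as its delegate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Spring or AWS class): builds "s3://bucket/key" strings.
final class S3Locations {

    private S3Locations() {
    }

    // Builds the location string for a single object.
    static String toLocation(String bucket, String key) {
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("bucket must not be empty");
        }
        if (key == null || key.isEmpty()) {
            throw new IllegalArgumentException("key must not be empty");
        }
        return "s3://" + bucket + "/" + key;
    }

    // Builds one location string per key, preserving the order of the keys.
    static List<String> toLocations(String bucket, List<String> keys) {
        List<String> locations = new ArrayList<>();
        for (String key : keys) {
            locations.add(toLocation(bucket, key));
        }
        return locations;
    }
}
```

Each string can be passed to `resourceLoader.getResource(...)` exactly as in the single-file reader above.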

Without Spring Cloud AWS, I would still use the FlatFileItemReader; the customization needed is your own S3 Resource object. Extend Spring's AbstractResource to create an AWS resource that holds the AmazonS3 client, the bucket, the file path, and so on.

For getInputStream, use the Java SDK:

        S3Object object = s3Client.getObject(new GetObjectRequest(bucket, awsFilePath));
        return object.getObjectContent();

For contentLength:

        return s3Client.getObjectMetadata(bucket, awsFilePath).getContentLength();

And for lastModified:

        return s3Client.getObjectMetadata(bucket, awsFilePath).getLastModified().getTime();

The Resource you create holds the AmazonS3Client, which carries everything your Spring Batch app needs to communicate with S3. With Java config it could look like this:

    reader.setResource(new AmazonS3Resource(amazonS3Client, amazonS3Bucket, inputFile));
mtoutcalt
  • An alternative to creating your own Resource is to use the Spring Cloud AWS API - http://cloud.spring.io/spring-cloud-aws/spring-cloud-aws.html#_resource_handling – mtoutcalt Jul 22 '15 at 15:34
  • This is exactly what I have done after some R&D. This is helpful and works like a charm. Based on your comment about using the Spring Cloud AWS API, how would I pass the data received to a line mapper/tokenizer in Spring Batch? – sve Jul 22 '15 at 22:55
  • There should be nothing unique about reading the data that you receive. With aws-context set with the s3client and giving the reader the resourceloader your Reader will read the items the same way it would if you have a local file you were reading. – mtoutcalt Jul 24 '15 at 14:35
1

Another way to read from S3 through FlatFileItemReader is to set the resource to an InputStreamResource and then use the S3 client's putObject to upload the stream.

reader.setResource(new InputStreamResource(inputstream));

Once the stream is populated,

s3client.putObject(bucketname, key, inputstream, metadata);
bumi25
1

A simpler approach:

  1. Create an AmazonS3 client bean.
  2. Create a ResourceLoader bean.
  3. Use the ResourceLoader to set S3 resources.

First, create the AmazonS3 client and ResourceLoader beans in your AWS configuration class, like this:

@Configuration
@EnableContextResourceLoader
public class AWSConfiguration {

    @Bean
    @Primary
    public AmazonS3 getAmazonS3Client() {
        ClientConfiguration config = new ClientConfiguration();

        // Both timeouts are in milliseconds: 5000 * 10 = 50,000 ms (50 seconds)
        config.setConnectionTimeout(5000 * 10);
        config.setSocketTimeout(5000 * 10);

        return AmazonS3ClientBuilder.standard()
                .withClientConfiguration(config).build();
    }

    @Bean
    @Autowired
    public static ResourceLoaderBeanPostProcessor resourceLoaderBeanPostProcessor(
            AmazonS3 amazonS3EncryptionClient) {
        return new ResourceLoaderBeanPostProcessor(amazonS3EncryptionClient);
    }

}

Then use the ResourceLoader bean in the ItemReader to set the S3 resource.

@Autowired
private ResourceLoader resourceLoader;

@Bean
public FlatFileItemReader<Map<String, Object>> fileItemReader() {
    // JsonLineMapper maps each line to a Map<String, Object>, so the reader uses that item type
    FlatFileItemReader<Map<String, Object>> reader = new FlatFileItemReader<>();
    reader.setLineMapper(new JsonLineMapper()); // Change the line mapper (and item type) as needed
    reader.setResource(resourceLoader.getResource("s3://" + amazonS3Bucket + "/" + file));
    return reader;
}
Gaurav Raghav