AmazonS3 connection management

Question

Is there a recommended way to manage the connection to AmazonS3 when working with AWS?

Typical Amazon S3 code (taken from an official Amazon sample) usually looks like this:

AmazonS3 s3 = new AmazonS3Client(...);
...
s3.putObject(new PutObjectRequest(bucketName, project.getName() + "/" + imageFile.getName(), imageFile));

Following are the questions:

Is this a good idea to maintain a single AmazonS3Client used by everyone in the code or is it better to create one on every call?
Is there a concept of connection pool like when working with MySQL for example?
Are questions like disconnection(MySQL analogy: MySQL was restarted) relevant such that the AmazonS3Client would become invalid and require re-creation? What would be the right way to handle a disconnection if so?
Does anyone know what features are provided by the Spring integration with AWS at: https://github.com/spring-projects/spring-integration-extensions/tree/master/spring-integration-aws

Thx.

Hey, did you solve it with a singleton? — 2Big2BeSmall, Dec 19 '15 at 14:23

score 25 · Accepted Answer · answered May 04 '14 at 14:28

I'll repeat the questions to be clear:

Is this a good idea to maintain a single AmazonS3Client used by everyone in the code or is it better to create one on every call?

All client classes in the Java SDK are thread-safe, so it is usually better to reuse a single client than to instantiate new ones. You may need a few clients if you operate concurrently on multiple regions or with multiple sets of credentials.
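As a rough illustration of the "one client, or a few" idea, here is a minimal sketch of a registry that creates at most one client per key (for example, per region) and hands the same instance to every caller. The class name `ClientRegistry` and the method `clientFor` are hypothetical and not part of the AWS SDK; the client type is left generic so the sketch runs without the SDK on the classpath. In real code the factory would build the `AmazonS3` client the way the question does.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical helper (not an AWS SDK class): keeps one shared client per key,
 * e.g. one per region or per set of credentials.
 */
final class ClientRegistry<K, C> {
    private final ConcurrentMap<K, C> clients = new ConcurrentHashMap<>();
    private final Function<K, C> factory;

    ClientRegistry(Function<K, C> factory) {
        this.factory = factory;
    }

    /** Returns the shared client for this key, creating it on first use. */
    C clientFor(K key) {
        // computeIfAbsent on ConcurrentHashMap is atomic, so the factory
        // runs at most once per key even when many threads call this.
        return clients.computeIfAbsent(key, factory);
    }
}
```

Keep the registry itself alive for the life of the application (for example as a static field or an injected singleton) so the clients it holds are never recreated per request.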

Is there a concept of connection pool like when working with MySQL for example?

Yes, the client manages its own connections, especially if you use the TransferManager class instead of using AmazonS3Client directly.

see: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html

Are questions like disconnection(MySQL analogy: MySQL was restarted) relevant such that the AmazonS3Client would become invalid and require re-creation? What would be the right way to handle a disconnection if so?

By default, the client retries recoverable errors with exponential backoff. If the request still fails after those retries, you need to handle the exception as appropriate for your application. See: http://docs.aws.amazon.com/general/latest/gr/api-retries.html

Does anyone know what features are provided by the Spring integration with AWS at: https://github.com/spring-projects/spring-integration-extensions/tree/master/spring-integration-aws

It provides declarative instantiation, injection, and utility classes for easier integration into Spring projects, similar to the helpers that exist for JDBC, JMS, etc.

For more AWS SDK tips and tricks, see: http://aws.amazon.com/articles/3604?_encoding=UTF8&jiveRedirect=1

Thx. Regarding TransferManager, the link says that it is appropriate for large content, which is not exactly my case since I am dealing with fairly small files. Is it always better to use TransferManager over the normal S3 client? — isaac.hazan, May 05 '14 at 04:34
Regarding the retry mechanism provided by the AWS SDK, is this synchronous? In other words, if the request is not successful, will the caller be blocked while the AWS SDK retries? — isaac.hazan, May 05 '14 at 04:46
It is sync unless you use the async clients, see: https://aws.amazon.com/articles/5496117154196801 — Julio Faerman, May 05 '14 at 12:29
About the transfer manager, it is usually better anyway, but its most important feature is parallel upload, more helpful for large files. — Julio Faerman, May 05 '14 at 12:30
That would be OK in most cases, see https://forums.aws.amazon.com/thread.jspa?messageID=343124 — Julio Faerman, Dec 19 '15 at 19:41

Warren Dew · Answer 2 · 2014-05-11T17:55:19.747

There are important things to note on the following two questions:

Is this a good idea to maintain a single AmazonS3Client used by everyone in the code or is it better to create one on every call?

Create just one. The AmazonS3Client has a misfeature that when garbage collected, it cleans up resources that are shared by other AmazonS3Client instances, causing those instances to become invalid, even if those other instances are in the middle of handling an upload or download. We had this problem when we were creating an AmazonS3Client for each request. Amazon apparently does not consider this to be a bug. This misfeature can be avoided by creating just one AmazonS3Client, keeping it around for the life of the application, and using it in all threads in your code.

Are questions like disconnection(MySQL analogy: MySQL was restarted) relevant such that the AmazonS3Client would become invalid and require re-creation? What would be the right way to handle a disconnection if so?

Uploads and downloads can fail, but they will not invalidate the AmazonS3Client, which can still be used. The right way to handle a disconnection that is not successfully retried by the AmazonS3Client is to retry yourself or report the failure, as appropriate for your application, and to continue to use the AmazonS3Client for any additional S3 interactions you need to do.
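To make "retry yourself or report the failure" concrete, here is a minimal sketch of an application-level retry wrapper with exponential backoff. The class `Retry` and the method `withBackoff` are hypothetical names, not SDK APIs, and the sketch retries on any exception for brevity; a real application would probably retry only on failures it considers recoverable. The idea is that the S3 call goes inside the `Callable` while the same shared client is reused across attempts.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper (not an AWS SDK class): retries an action with exponential backoff. */
final class Retry {
    private Retry() {}

    /**
     * Runs the action up to maxAttempts times, sleeping between attempts and
     * doubling the delay each time. Rethrows the last failure so the caller
     * can report it.
     */
    static <T> T withBackoff(Callable<T> action, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                // Do not retry when the thread was asked to stop.
                Thread.currentThread().interrupt();
                throw e;
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: let the caller report the failure
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

Because this wrapper runs on top of the SDK's own retries, a small `maxAttempts` is likely enough; the caller stays blocked for the whole sequence, just as with the SDK's synchronous retries.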

Thx. Regarding the AmazonS3Client garbage collection process, this is very interesting. Do you have a link/ref that documents it? — isaac.hazan, May 11 '14 at 07:27
Unfortunately I cannot find a link now. We had this problem in our system when we created an AmazonS3Client for each use, and found information in various places that others also had this problem. I think I got the information by googling our specific stack trace. The failures always happened in a client that was being used when another client went out of scope and was presumably garbage collected. One of the pages I found was a recommendation from Amazon to create just one client and use it in all threads. We changed our code to follow that recommendation and the errors went away. — Warren Dew, May 11 '14 at 17:52
