
I have a service that runs daily whose purpose is to export a delta from a Postgres DB and upload it to an S3 bucket.

I'm using CopyManager and the copyOut method. copyOut gives me two options: a) a Writer, or b) an OutputStream.

On the other hand, I have the Amazon S3 client, whose PutObjectRequest accepts either a File or an InputStream.

Currently we have two ways of doing this:

1. Export to a file and upload from the file.
2. Export to a ByteArrayOutputStream, take the underlying byte[], and pass it as an InputStream to the S3 uploader.
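
For illustration, the second option looks roughly like this (the table, bucket and key names are made up):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.sql.Connection;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class ByteArrayExport {

    // Option 2: the whole delta is materialized as one byte[] in memory.
    public static void export(Connection conn, AmazonS3 s3) throws Exception {
        CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        copyManager.copyOut("COPY (SELECT * FROM my_delta) TO STDOUT WITH CSV", buffer);

        byte[] bytes = buffer.toByteArray(); // potentially very large
        ObjectMetadata meta = new ObjectMetadata();
        meta.setContentLength(bytes.length);
        s3.putObject(new PutObjectRequest(
                "my-bucket", "delta.csv", new ByteArrayInputStream(bytes), meta));
    }
}
```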

Is it possible to somehow connect those two, so that we don't need the intermediary file or a possibly very large byte array?

In other words, I would like to upload the DB delta directly.

bodziec

1 Answer


Your existing approach is not that bad. When using a stream, you can read from it and, as soon as you have enough data, send that chunk directly to S3.
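
For example, the COPY output can be piped straight into the S3 client so nothing is written to disk. This is only a sketch (AWS SDK for Java v1; the table, bucket and key names are placeholders), and note the caveat in the comments: with an unknown content length the v1 SDK still buffers the stream in memory, so for big deltas the multipart approach discussed in the comments below scales better:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.sql.Connection;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class PipedExport {

    // Connects copyOut's OutputStream to PutObjectRequest's InputStream through
    // a pipe, so our code never creates a temporary file or a full byte[].
    public static void export(Connection conn, AmazonS3 s3) throws Exception {
        CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();

        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out, 1 << 20); // 1 MB pipe buffer

        // The COPY must run on another thread, otherwise writer and reader deadlock.
        Thread copyThread = new Thread(() -> {
            try {
                copyManager.copyOut("COPY (SELECT * FROM my_delta) TO STDOUT WITH CSV", out);
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                try { out.close(); } catch (IOException ignored) { /* best effort */ }
            }
        });
        copyThread.start();

        // Caveat: without a content length the v1 SDK warns and buffers the stream
        // in memory before uploading, so for large deltas prefer multipart upload.
        s3.putObject(new PutObjectRequest("my-bucket", "delta.csv", in, new ObjectMetadata()));

        copyThread.join();
    }
}
```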

You can also set up an AWS Data Pipeline to extract data from a Postgres RDS instance into S3; see https://stackoverflow.com/a/34120407/4296747. AWS's documentation on how to do this from Postgres is thin, but you will find plenty of material for MySQL.

Frederic Henri
  • Current solution isn't bad? The file may be large, and we don't want a big disk attached to our VM. The byte array may be large, and we don't want much RAM on the VM. I'm currently working on a custom OutputStream that would buffer the dumped data and upload it to S3 using multipart upload. – bodziec Aug 26 '16 at 14:52
  • Well, you don't mention the size of the delta. You should look at the pipeline then. – Frederic Henri Aug 26 '16 at 15:27
  • I've just seen that the CopyOut object allows me to fetch the exported data line by line as it comes. Then I can buffer the lines, for example in configured MB-sized batches, and push them as a multipart upload. – bodziec Aug 26 '16 at 16:40
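
Following up on that last comment, a rough sketch of the CopyOut + multipart-upload approach might look like this (AWS SDK for Java v1; the bucket, key and COPY statement are placeholders, and error handling is kept minimal):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyOut;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.*;

public class MultipartCopyExport {

    private static final int PART_SIZE = 5 * 1024 * 1024; // S3's minimum size for non-final parts

    public static void export(Connection conn, AmazonS3 s3, String bucket, String key) throws Exception {
        CopyOut copyOut = conn.unwrap(PGConnection.class).getCopyAPI()
                .copyOut("COPY (SELECT * FROM my_delta) TO STDOUT WITH CSV");

        String uploadId = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest(bucket, key)).getUploadId();
        List<PartETag> etags = new ArrayList<>();
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(PART_SIZE);
        int partNumber = 1;

        try {
            byte[] row;
            while ((row = copyOut.readFromCopy()) != null) { // one data row at a time
                buffer.write(row);
                if (buffer.size() >= PART_SIZE) {            // flush a full part
                    etags.add(uploadPart(s3, bucket, key, uploadId, partNumber++, buffer));
                    buffer.reset();
                }
            }
            if (buffer.size() > 0) {                          // last (possibly short) part
                etags.add(uploadPart(s3, bucket, key, uploadId, partNumber, buffer));
            }
            s3.completeMultipartUpload(
                    new CompleteMultipartUploadRequest(bucket, key, uploadId, etags));
        } catch (Exception e) {
            s3.abortMultipartUpload(new AbortMultipartUploadRequest(bucket, key, uploadId));
            throw e;
        }
    }

    private static PartETag uploadPart(AmazonS3 s3, String bucket, String key,
                                       String uploadId, int partNumber,
                                       ByteArrayOutputStream buffer) {
        byte[] bytes = buffer.toByteArray();
        return s3.uploadPart(new UploadPartRequest()
                .withBucketName(bucket).withKey(key)
                .withUploadId(uploadId).withPartNumber(partNumber)
                .withInputStream(new ByteArrayInputStream(bytes))
                .withPartSize(bytes.length))
                .getPartETag();
    }
}
```

The part size can be raised to reduce the number of parts; S3 only enforces the 5 MB minimum for non-final parts, so memory use stays bounded by roughly one part buffer.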