143

Let's say I have a machine that I want to be able to write to a certain log file stored in an S3 bucket.

So the machine needs write access to that bucket, but I don't want it to have the ability to overwrite or delete any files in that bucket (including the one I want it to write to).

So basically, I want my machine to only be able to append data to that log file, without overwriting or downloading it.

Is there a way to configure my S3 bucket to work like that? Maybe there's some IAM policy I can attach to it so it will work the way I want?
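
For reference, something along these lines is roughly what I had in mind (the bucket name and prefix are just placeholders), though as far as I can tell a plain s3:PutObject grant still lets the machine overwrite existing objects:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowUploadOnly",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-log-bucket/logs/*"
    }
  ]
}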

John Rotenstein
Theodore
  • You can't modify objects in S3. Could you just append a new log file? That would be a better model and would support multiple, simultaneous clients. – jarmod Jan 21 '17 at 20:38
  • @jarmod Yeah, I thought about that, but the problem is that if an attacker succeeds in accessing my server, he'll have the ability to delete the local file stored on it before it's sent to the S3 bucket (which, let's say, happens at the end of the day). – Theodore Jan 21 '17 at 21:38
  • You might also want to take a look at CloudWatch logs. Let it manage the complexity of collecting and storing your logs, provide searching facilities, retention policies, and allow you to generate alerts based on metrics that you can customize for your logs. – jarmod Jan 21 '17 at 23:15
  • 1
    You might also take a look at Google BigQuery. You can use it to solve your problem. – Daniel777 Apr 21 '17 at 20:25

10 Answers

194

Unfortunately, you can't.

S3 doesn't have an "append" operation.* Once an object has been uploaded, there is no way to modify it in place; your only option is to upload a new object to replace it, which doesn't meet your requirements.

*: Yes, I know this post is a couple of years old. It's still accurate, though.

  • May I know, can we achieve this by using Multipart Upload? – Anjali Nov 17 '17 at 11:32
  • 4
    Multipart Upload will allow you to get the data in to S3 without downloading the original object, but it wouldn't allow you to overwrite the original object directly. See e.g. https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html You could then delete the old object/rename the new one. This, however, is not what the question is asking. – MikeGM Jan 02 '18 at 19:32
  • 3
    I think that using Multipart Upload may actually work. All your parts are sequential segments of the same file. If a part is uploaded successfully, you can eventually complete the upload to make the file readable. So, as long as you don't need to read the contents of the file in the meantime, you can keep appending to it using the same multipart upload. – cerebrotecnologico Jun 14 '18 at 22:42
  • 2
    @cerebrotecnologico I still don't think it meets the OP's requirements. There is no way I'm aware of to restrict an S3 user to performing multipart uploads which append to an object -- if they can perform a multipart upload, they can upload any content they want. –  Jun 15 '18 at 03:13
  • 1
    It is possible to provide an "append interface", as [s3fs has done](https://github.com/dask/s3fs/blob/fa1c76a3b75c6d0330ed03c46c2a7c848ebdb62e/s3fs/core.py#L1774-L1786), but only via "no-upload-copy + partial upload + rewrite original", as mentioned by @duskwuff-inactive – Kache May 11 '21 at 17:17
31

As the accepted answer states, you can't. The best solution I'm aware of is to use:

AWS Kinesis Firehose

https://aws.amazon.com/kinesis/firehose/

Their code sample looks complicated but yours can be really simple. You keep performing PUT (or BATCH PUT) operations onto a Kinesis Firehose delivery stream in your application (using the AWS SDK), and you configure the Kinesis Firehose delivery stream to send your streamed data to an AWS S3 bucket of your choice (in the AWS Kinesis Firehose console).

It's still not as convenient as >> from the Linux command line, because once you've created a file on S3 you again have to deal with downloading, appending, and uploading the new file. But you only have to do that once per batch of lines rather than for every line of data, so you don't need to worry about huge charges from the volume of append operations. Maybe it can be done from the console, but I can't see how.
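
For the application side, here's a minimal sketch with the v2 JavaScript SDK, assuming a delivery stream named my-log-stream (a placeholder) has already been created in the console with your S3 bucket as its destination:

var AWS = require('aws-sdk');
var firehose = new AWS.Firehose({ region: 'us-east-1' }); // placeholder region

// Buffer log lines in your app, then flush them as one batch PUT.
var lines = ['first log line', 'second log line'];

firehose.putRecordBatch({
  DeliveryStreamName: 'my-log-stream',  // placeholder stream name
  Records: lines.map(function (line) {
    return { Data: line + '\n' };       // Firehose concatenates records as-is
  })
}, function (err, data) {
  if (err) console.log(err, err.stack);
  else     console.log('Failed records:', data.FailedPutCount);
});

Firehose then takes care of buffering the records and writing them out as new objects in your bucket.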

Sridhar Sarnobat
  • 11
    Note that there is either a max time (900 seconds since file creation) or a max size (128mb file size) on doing this - meaning, Kinesis firehose will append to the same S3 file until it reaches either of those limits: https://docs.aws.amazon.com/firehose/latest/dev/create-configure.html – Yaron Budowski Jan 28 '18 at 18:20
  • 2
    Can you use a single S3 file as output on the Firehose? It sounds a bit messy having to merge multiple files in an S3 bucket. – Jón Trausti Arason Nov 05 '19 at 10:45
  • 1
    Unfortunately no. I too wish there was a better solution. – Sridhar Sarnobat Nov 05 '19 at 16:42
  • 2
    Yeah it's unfortunate. I'm mostly concerned about race condition if I manually download & append records to a single S3 object. I've been thinking about adding the records to SQS and then using some logic with SNS + Lambda to poll the SQS and then write the new entries to the S3 object. – Jón Trausti Arason Nov 06 '19 at 09:36
14

Objects in S3 are not appendable. You have two options in this case:

  1. Get the existing object from S3, append the new content to it, and write it back to S3:
var AWS = require('aws-sdk');
var s3 = new AWS.S3();

function writeToS3(input) {
    var getParams = {
        Bucket: 'myBucket',
        Key: "myKey"
    };

    // Download the existing object, append a line to it locally,
    // then upload the result (S3 has no in-place append, so the
    // whole object is replaced).
    s3.getObject(getParams, function(err, data) {
        if (err) console.log(err, err.stack);
        else {
            var content = Buffer.from(data.Body).toString("utf8");
            content = content + '\n' + new Date() + '\t' + input;
            var putParams = {
                Body: content,
                Bucket: 'myBucket',
                Key: "myKey",
                ACL: "public-read"
            };

            s3.putObject(putParams, function(err, data) {
                if (err) console.log(err, err.stack); // an error occurred
                else     console.log(data);           // successful response
            });
        }
    });
}
  2. Use Kinesis Firehose. This is fairly straightforward: you create your Firehose delivery stream and set its destination to your S3 bucket. That's it!
var AWS = require('aws-sdk');
var firehose = new AWS.Firehose();

function writeToS3(input) {
    var content = "\n" + new Date() + "\t" + input;
    var params = {
      DeliveryStreamName: 'myDeliveryStream', /* required */
      Record: { /* required */
        Data: Buffer.from(content) /* will be Base64-encoded on your behalf */
      }
    };

    firehose.putRecord(params, function(err, data) {
      if (err) console.log(err, err.stack); // an error occurred
      else     console.log(data);           // successful response
    });
}
Bharthan
7

You can:

  1. Start a Multipart Upload
  2. Call UploadPartCopy specifying the existing S3 object as a source
  3. Call UploadPart with the data you want to append
  4. Complete the Multipart Upload.

There are a number of limitations, for example your existing object must be larger than 5 MB (however, if it is smaller, copying it to the client first should be fast enough for most cases). It is not as nice as a straight append, but at least you do not need to copy the data back and forth from AWS to the local machine.
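
A rough sketch of those four steps with the v2 JavaScript SDK (bucket and key names are placeholders, error handling omitted):

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Appends `data` to an existing object without downloading it.
// The existing object must be at least 5 MB, because every part
// except the last one has to meet the 5 MB minimum part size.
async function appendViaMultipartCopy(bucket, key, data) {
  // 1. Start a multipart upload targeting the same key
  const mpu = await s3.createMultipartUpload({ Bucket: bucket, Key: key }).promise();

  // 2. Copy the existing object in, server-side, as part 1
  const part1 = await s3.uploadPartCopy({
    Bucket: bucket,
    Key: key,
    UploadId: mpu.UploadId,
    PartNumber: 1,
    CopySource: bucket + '/' + key // URL-encode the key if it has special characters
  }).promise();

  // 3. Upload the data to append as part 2
  const part2 = await s3.uploadPart({
    Bucket: bucket,
    Key: key,
    UploadId: mpu.UploadId,
    PartNumber: 2,
    Body: data
  }).promise();

  // 4. Complete the upload; the object is replaced with old content + new content
  await s3.completeMultipartUpload({
    Bucket: bucket,
    Key: key,
    UploadId: mpu.UploadId,
    MultipartUpload: {
      Parts: [
        { PartNumber: 1, ETag: part1.CopyPartResult.ETag },
        { PartNumber: 2, ETag: part2.ETag }
      ]
    }
  }).promise();
}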

user2555515
3

In case anyone wants to append data to an object with an S3-like service, the Alibaba Cloud OSS (Object Storage Service) supports this natively.

OSS provides append upload (through the AppendObject API), which allows you to directly append content to the end of an object. Objects uploaded by using this method are appendable objects, whereas objects uploaded by using other methods are normal objects. The appended data is instantly readable.
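
A minimal sketch using the ali-oss Node.js client (region, bucket, and object names are placeholders; check the AppendObject documentation for the exact semantics):

const OSS = require('ali-oss');

const client = new OSS({
  region: 'oss-cn-hangzhou',                    // placeholder region
  accessKeyId: process.env.OSS_ACCESS_KEY_ID,
  accessKeySecret: process.env.OSS_ACCESS_KEY_SECRET,
  bucket: 'my-log-bucket'                       // placeholder bucket
});

// Position of the next append for machine.log; 0 creates the object.
let position = 0;

async function appendLine(line) {
  const result = await client.append('machine.log', Buffer.from(line + '\n'), {
    position: String(position)
  });
  // OSS returns where the next append has to start.
  position = result.nextAppendPosition;
}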

wanghq
3

The problem we were facing was creating an S3 file several gigabytes in size without ever loading the entirety of it into RAM. The approach below combines several files by appending them to the end of each other, so depending on your needs, this could be a viable solution.

The solution we came up with was:

  1. Upload the file in chunks into an AWS S3 folder
  2. Use AWS Athena to define a table based on that S3 folder by running
CREATE EXTERNAL TABLE IF NOT EXISTS `TrainingDB`.`TrainingTable` (`Data` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('collection.delim' = '\n')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://your-bucket-name/TrainingTesting/';

  3. Combine all the results in that table into a single file by running
UNLOAD (SELECT * FROM "TrainingDB"."TrainingTable") 
TO 's3://your-bucket/TrainingResults/results5' 
WITH ( format = 'TEXTFILE', compression='none' )

This will append all the files to the end of each other and provide you with one file containing all the chunks you were trying to append. This is overkill if you're just trying to combine a few small files, in which case pulling the original file down and writing to the end will probably be better (as the other answers suggest).
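
For step 1, a minimal sketch of uploading the chunks under the prefix the table points at (v2 JavaScript SDK; the bucket and prefix match the placeholders in the queries above):

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

// Each chunk becomes its own object under the prefix Athena will scan.
async function uploadChunk(index, chunkBody) {
  await s3.putObject({
    Bucket: 'your-bucket-name',
    Key: 'TrainingTesting/chunk-' + String(index).padStart(6, '0'),
    Body: chunkBody
  }).promise();
}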

2

As others have stated previously, S3 objects are not appendable.
However, another solution would be to write out to CloudWatch Logs and then export the logs you want to S3. This would also prevent any attackers who access your server from deleting from your S3 bucket, since the process writing the logs (e.g. a Lambda function) wouldn't require any S3 permissions.
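
A minimal sketch of writing log lines to CloudWatch Logs with the v2 JavaScript SDK (the log group and stream names are placeholders and must already exist; exporting to S3 would be a separate step, e.g. an export task):

var AWS = require('aws-sdk');
var logs = new AWS.CloudWatchLogs({ region: 'us-east-1' }); // placeholder region

function writeLogLine(message, cb) {
  // putLogEvents only appends events to the stream; the machine needs
  // no S3 permissions and can't modify or delete what was already logged.
  // Depending on the API version you may also need to pass the
  // sequenceToken returned by the previous call.
  logs.putLogEvents({
    logGroupName: '/my-app/machine-logs', // placeholder log group
    logStreamName: 'machine-1',           // placeholder log stream
    logEvents: [{ message: message, timestamp: Date.now() }]
  }, cb);
}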

  • 1
    This is a good solution to the original problem. We shouldn't ask "I can't get Y to solve X, how do I get Y to work?" but rather "How can I solve X?" which I think this does in a better way. – four43 Jul 11 '22 at 19:16
1

An S3 bucket does not allow you to append to existing objects. The way to simulate this is to first use the get method to retrieve the data from the S3 bucket, add the new data you want to append to it locally, and then push it back to the S3 bucket.

Since it is not possible to append to an existing S3 object, you will need to replace it with a new object that has the data appended to it. This means you would need to upload the entire object (log file) each time a new entry is appended, which won't be very efficient.

You could have log entries sent to an SQS queue, and when the queue size reaches a set number, have the log messages batched together and added as a new object in your S3 bucket. This still won't satisfy your requirement of appending to a single object, though; a sketch of the idea is shown below.
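
A sketch of that idea with the v2 JavaScript SDK (queue URL, bucket, and key prefix are placeholders):

var AWS = require('aws-sdk');
var sqs = new AWS.SQS({ region: 'us-east-1' });
var s3 = new AWS.S3();

var queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789012/log-entries'; // placeholder

// Producer: the machine only needs sqs:SendMessage, no S3 permissions at all.
function sendLogEntry(line, cb) {
  sqs.sendMessage({ QueueUrl: queueUrl, MessageBody: line }, cb);
}

// Consumer (e.g. a scheduled job): drain a batch and write it as one new object.
async function flushBatchToS3() {
  var res = await sqs.receiveMessage({
    QueueUrl: queueUrl,
    MaxNumberOfMessages: 10
  }).promise();
  var messages = res.Messages || [];
  if (messages.length === 0) return;

  // Each batch becomes its own object rather than appending to a single one.
  await s3.putObject({
    Bucket: 'myBucket',
    Key: 'logs/' + Date.now() + '.log',
    Body: messages.map(function (m) { return m.Body; }).join('\n')
  }).promise();

  await sqs.deleteMessageBatch({
    QueueUrl: queueUrl,
    Entries: messages.map(function (m, i) {
      return { Id: String(i), ReceiptHandle: m.ReceiptHandle };
    })
  }).promise();
}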

Akash Pal
1

I had a similar issue where I had to write errors to a log file in S3 during a long-running process (a couple of hours). So I didn't have a file locally to create a one-time stream from; instead I had to append the errors to a file at runtime.

So what you can do is keep an open upload to a specific file and write to it whenever you want:

const { S3 } = require('aws-sdk')
const { PassThrough } = require('stream')

// append to open connection
const append = (stream, data ) => new Promise(resolve => {
  stream.write(`${data}\n`, resolve)
})

const openConnectionWithS3 = async () => {
  const s3 = new S3({
    credentials: {
      accessKeyId: process.env.AWS_ACCESS_KEY_ID,
      secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
    },
    endpoint: process.env.AWS_S3_ENDPOINT,
    region: process.env.AWS_DEFAULT_REGION,
  })
  const fileName = 'test.log'
  const bucketName = 'my-bucket'
  // create pass through stream. This stream we use to write data to
  // but this stream we also use to pass the same data to aws
  const pass = new PassThrough()

  // dont resolve the promise, but keep it open and await for the result when the long running process is done
  const promise = s3
    .upload({
      Bucket: bucketName,
      Key: fileName,
      // pass the stream as body, aws will handle the stream from now
      Body: pass,
    })
    .promise()

  // write data to our open connection.
  // we can even write it on different places
  for (let i = 0; i < 100000; i++) {
    await append(pass, `foo${i}`)
  }

  // end the stream: pushing null tells it that, after the 100,000 foo's,
  // there is nothing left to write
  pass.push(null)

  // now await the upload; it resolves once S3 has received everything
  await promise
}

openConnectionWithS3()

It will append items to a file in S3 and resolve when it is done.

Robert-Jan Kuyper
  • The problem with the above approach is that it is not real streaming. Your program first writes everything to the stream and only then uploads it to S3, so memory has to be as big as the file is, and if it fails in the middle it has to start from the beginning. I checked memory and it keeps growing until the entire stream is filled. – isaac.hazan Jan 23 '23 at 11:20
-1

I had a similar issue, and this is what I had asked:

how to Append data in file using AWS Lambda

Here's what I came up with to solve the above problem:

Use getObject to retrieve the data from the existing file:

   var AWS = require('aws-sdk');
   var s3 = new AWS.S3();

   var bucketPath = 'my-bucket'; // example bucket name
   var projects = [];            // entries already stored in projects.json

   var getParams = {
       Bucket: bucketPath,
       Key: "projects.json"
   };

   s3.getObject(getParams, function(err, data) {
   if (err) console.log(err, err.stack); // an error occurred
   else {
       console.log(data);           // successful response
       var s3Projects = JSON.parse(data.Body);
       console.log('s3 data==>', s3Projects);
       if(s3Projects.length > 0) {
           projects = s3Projects;
       }
   }
   projects.push(event); // `event` is the new entry passed to the Lambda handler
   writeToS3();          // calling function to "append" the data
});

Write a function to append to the file:

   function writeToS3() {
    var putParams = {
      Body: JSON.stringify(projects),
      Bucket: bucketPath,
      Key: "projects.json",
      ACL: "public-read"
     };

    s3.putObject(putParams, function(err, data) {
       if (err) console.log(err, err.stack); // an error occurred
       else     console.log(data);           // successful response
       callback(null, 'Hello from Lambda');  // `callback` is the Lambda handler's callback
     });
}

Hope this helps!!

Neeraj Kumar
  • 15
    Your `writeToS3` function will overwrite a file, not append to it. –  Sep 20 '17 at 15:56
  • @duskwuff-inactive- agreed, and also it suffers from race conditions if two methods try to work on the same object, but this is not really different from languages that have immutable strings or types -- you simulate an append by returning/overwriting with a new object. – fatal_error Jun 10 '20 at 23:18
  • This is useful because it has the advantage of not consuming additional bandwidth if your app that appends data is outside of the AWS network. – ColinM Apr 15 '21 at 23:13
  • this is not append – user2555515 Mar 18 '22 at 21:06
  • Like others have said, this is not appending. You're just downloading the entire file, modifying it, and then re-uploading the entire thing with Lambda. And even this probably won't scale due to Lambda's performance constraints. If the file's too large, the Lambda function will time out or run out of memory, which is presumably one of the reasons why Op wants to append to the file in-place. – Cerin Nov 02 '22 at 20:59