
I have an object that has to be converted to JSON and uploaded via a Stream object. This is the AWS S3 upload code:

        AWSS3Client.PutObjectAsync(new PutObjectRequest()
        {
            InputStream = stream,
            BucketName = name,
            Key = keyName
        }).Wait();

Here stream is a Stream that is read by AWSS3Client. The data I am uploading is a complex object that has to be in JSON format.

I can convert the object to a string using JsonConvert.SerializeObject, or serialize it to a file using JsonSerializer, but since the amount of data is quite significant I would prefer to avoid a temporary string or file and convert the object into a readable Stream right away. My ideal code would look something like this:

        AWSS3Client.PutObjectAsync(new PutObjectRequest()
        {
            InputStream = MagicJsonConverter.ToStream(myDataObject),
            BucketName = name,
            Key = keyName
        }).Wait();

Is there a way to achieve this using Newtonsoft.Json?

Optional Option

2 Answers


You need two things here: first, a producer/consumer stream, e.g. BlockingStream from this StackOverflow question; and second, a Json.NET serializer writing to that stream, as in this other SO question.
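
For illustration, here is a minimal sketch of the producer/consumer idea using System.IO.Pipelines instead of the linked BlockingStream (a substitution on my part; System.IO.Pipelines also postdates this question). One caveat: S3's PutObject generally wants a known content length, so a non-seekable pipe stream may require TransferUtility or an explicit Content-Length header.

    using System.IO;
    using System.IO.Pipelines;
    using System.Threading.Tasks;
    using Newtonsoft.Json;

    var pipe = new Pipe();

    // Producer: serialize JSON into the pipe's write end on a background task
    var producer = Task.Run(() =>
    {
        // Disposing the StreamWriter completes the underlying PipeWriter
        using var writer = new StreamWriter(pipe.Writer.AsStream());
        new JsonSerializer().Serialize(writer, myDataObject);
    });

    // Consumer: S3 reads from the pipe's read end while the producer writes
    await AWSS3Client.PutObjectAsync(new PutObjectRequest()
    {
        InputStream = pipe.Reader.AsStream(),
        BucketName = name,
        Key = keyName
    });
    await producer;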

archnae
  • This approach would require running BlockingStream on a separate thread. My question does not look easy at all. – Optional Option Nov 03 '17 at 19:46
  • I think your best choice is using a temp file. It has one big advantage: if JSON serialization fails mid-way for whatever reason, you won't be left with a half-baked, unfinished object in the AWS store. Implementing proper exception handling on top of a custom producer/consumer stream will turn 10 lines of code into 150 without improving performance much; temp files are not that slow anyway, as long as they are local (a minimal sketch follows these comments). – archnae Nov 03 '17 at 22:15
  • 1
    You should add this comment to your answer. Basically my magic serialization stream does not exist and the best way is either using temp file or serializing into memory stream. – Optional Option Nov 05 '17 at 17:54
  • I'd say try the simplest way (a temp file), look at performance, and reevaluate. I can imagine situations where a magic serialization stream is a necessity. I've implemented something like that with TPL Dataflow lately to save some 100s of seconds at the cost of 1000s of lines of code: error handling, back-pressure, buffering and all. But I had a pressing need for that, an organization-wide enforced SQL query timeout, their way or no way. I hope your need is not as pressing as that :) – archnae Nov 05 '17 at 18:33
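
To make the temp-file route suggested above concrete, here is a minimal sketch (assuming Newtonsoft.Json and the AWS SDK for .NET from the question; note that PutObjectRequest also has a FilePath property that lets the SDK stream the file itself):

    using System.IO;
    using Newtonsoft.Json;

    var tempPath = Path.GetTempFileName();
    try
    {
        // Serialize straight to disk, never materializing the full JSON string
        using (var file = File.CreateText(tempPath))
        {
            new JsonSerializer().Serialize(file, myDataObject);
        }

        // Upload only after serialization has fully succeeded
        using (var read = File.OpenRead(tempPath))
        {
            AWSS3Client.PutObjectAsync(new PutObjectRequest()
            {
                InputStream = read,
                BucketName = name,
                Key = keyName
            }).Wait();
        }
    }
    finally
    {
        File.Delete(tempPath);
    }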

Another practical option is to wrap the memory stream with a gzip stream (two lines of code).
JSON usually compresses very well (a 1 GB file can be compressed to about 50 MB).
Then, when serving the stream to S3, wrap it with a gzip stream that decompresses it on the fly.
I guess the trade-off compared to a temp file is CPU vs. IO (both will probably work well). If you can save it compressed on S3, it will save you space and increase networking efficiency too.
Example code:

using System.IO;
using System.IO.Compression;
using Newtonsoft.Json;

var compressed = new MemoryStream();
using (var zip = new GZipStream(compressed, CompressionLevel.Fastest, true))
using (var writer = new StreamWriter(zip))
{
    // Serialize the object as JSON straight into the gzip stream
    new JsonSerializer().Serialize(writer, myDataObject);
}
compressed.Seek(0, SeekOrigin.Begin);
// compressed now holds the gzipped JSON; use it as the S3 upload stream
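
If you store it compressed on S3, you can upload the gzipped stream as-is and advertise the encoding. A hedged sketch, assuming the SDK's HeadersCollection exposes a ContentEncoding property:

    // Upload the gzipped bytes directly; clients that honor Content-Encoding
    // can decompress transparently on download
    AWSS3Client.PutObjectAsync(new PutObjectRequest()
    {
        InputStream = compressed,
        BucketName = name,
        Key = keyName,
        Headers = { ContentEncoding = "gzip" }
    }).Wait();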
Avner Levy