
I'm trying to store large objects as gzipped JSON text to an Azure blob.

I don't want to hold the serialized data in memory, and I don't want to spool to disk if I can avoid it, but I don't see how to just let it serialize and compress on the fly.

I'm using JSON.NET from Newtonsoft (pretty much the de facto standard JSON serializer for .NET), but the signatures of the methods don't really seem to support on-the-fly streaming.

Microsoft.WindowsAzure.Storage.Blob.CloudBlockBlob has an UploadFromStream(Stream source, AccessCondition accessCondition = null, BlobRequestOptions options = null, OperationContext operationContext = null) method, but for that to work properly the stream's position needs to be 0, and JsonSerializer.Serialize doesn't leave it that way. It just writes to the stream, and when it's done the stream position is at EOF.
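To make that concrete, the buffered workaround I'm trying to avoid looks roughly like this sketch (GetCloudBlockBlob and _defaultSerializerSettings are my own members): serialize and compress into a MemoryStream, rewind, then upload.

    // Works, but holds the entire compressed payload in memory.
    public void SaveObjectBuffered(object obj, string path, JsonSerializerSettings settings = null)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            using (var writer = new StreamWriter(gzip))
            using (var jsonWriter = new JsonTextWriter(writer))
            {
                JsonSerializer.Create(settings ?? _defaultSerializerSettings).Serialize(jsonWriter, obj);
            }
            buffer.Position = 0; // UploadFromStream reads from the current position
            GetCloudBlockBlob(path).UploadFromStream(buffer);
        }
    }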

What I'd like to do is something like this:

    public void SaveObject(object obj, string path, JsonSerializerSettings settings = null)
    {
        // JsonStream is the hypothetical read-on-demand stream I'm looking for
        using (var jsonStream = new JsonStream(obj, settings ?? _defaultSerializerSettings))
        using (var gzipStream = new GZipStream(jsonStream, CompressionMode.Compress))
        {
            var blob = GetCloudBlockBlob(path);
            blob.UploadFromStream(gzipStream);
        }
    }

...the idea being that serialization does not start until something is pulling data (in this case the GZipStream, which does not compress data until pulled by the blob.UploadFromStream() method), thus keeping overhead low. The stream does not need to be seekable; it just needs to be readable on demand.
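One shape that gets close is a producer/consumer pipe: the serializer pushes into the write end on a background task while UploadFromStream pulls from the read end. This is only a sketch (it needs System.IO.Pipes and System.Threading.Tasks, and serialization starts immediately rather than on first read, with the pipe's bounded buffer providing back-pressure rather than true laziness):

    public void SaveObjectStreaming(object obj, string path, JsonSerializerSettings settings = null)
    {
        using (var writeEnd = new AnonymousPipeServerStream(PipeDirection.Out))
        using (var readEnd = new AnonymousPipeClientStream(PipeDirection.In, writeEnd.ClientSafePipeHandle))
        {
            var producer = Task.Run(() =>
            {
                // Disposing the GZipStream closes writeEnd, which signals EOF to the reader.
                using (var gzip = new GZipStream(writeEnd, CompressionMode.Compress))
                using (var writer = new StreamWriter(gzip))
                using (var jsonWriter = new JsonTextWriter(writer))
                {
                    JsonSerializer.Create(settings ?? _defaultSerializerSettings).Serialize(jsonWriter, obj);
                }
            });

            var blob = GetCloudBlockBlob(path);
            blob.UploadFromStream(readEnd);    // pulls bytes as the producer writes them
            producer.GetAwaiter().GetResult(); // surface any serialization exceptions
        }
    }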

I trust everyone can see how this would work if you were reading from System.IO.File.OpenRead() instead of new JsonStream(obj). While it gets a bit more complicated because Json.NET needs to "look ahead" and potentially fill a buffer, CryptoStream and GZipStream pull off the same trick, and that works really slick.
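For illustration, here's the read direction, which composes cleanly because every layer pulls from the one below it (again a sketch, using my _defaultSerializerSettings field):

    public T LoadObject<T>(string filePath, JsonSerializerSettings settings = null)
    {
        using (var file = File.OpenRead(filePath))
        using (var gzip = new GZipStream(file, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip))
        using (var jsonReader = new JsonTextReader(reader))
        {
            // Json.NET pulls characters on demand; only a small look-ahead buffer is held.
            return JsonSerializer.Create(settings ?? _defaultSerializerSettings).Deserialize<T>(jsonReader);
        }
    }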

Is there a way to do this that neither loads the entire JSON representation of the object into memory nor spools it to disk first just to read it back out? If CryptoStream can do it, we should be able to do it with Json.NET without a large amount of effort, I would think.
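For completeness: I know I could invert the flow and push into a writable blob stream (CloudBlockBlob.OpenWrite() returns one), roughly as sketched below, but that gives me a writable stream to push into, not the readable stream I need to hand to a transport method.

    public void SaveObjectPush(object obj, string path, JsonSerializerSettings settings = null)
    {
        var blob = GetCloudBlockBlob(path);
        using (var blobStream = blob.OpenWrite())
        using (var gzip = new GZipStream(blobStream, CompressionMode.Compress))
        using (var writer = new StreamWriter(gzip))
        using (var jsonWriter = new JsonTextWriter(writer))
        {
            JsonSerializer.Create(settings ?? _defaultSerializerSettings).Serialize(jsonWriter, obj);
        }
    }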

Jeremy Holovacs
  • Sounds like you want something like a producer/consumer pattern mediated by a `Stream`. Maybe see [Implementing async stream for producer/consumer in C# / .NET](https://stackoverflow.com/q/3721552), [Producer/Consumer example using Stream.Synchronized - Consumer isn't getting any data](https://stackoverflow.com/a/45313006), https://github.com/StephenCleary/ProducerConsumerStream, [How to pipe what is written to Stream 1 into Stream 2?](https://stackoverflow.com/q/33523363), ... – dbc Sep 09 '19 at 18:50
  • To compress on the fly while serializing see [Can I decompress and deserialize a file using streams?](https://stackoverflow.com/a/32944462/3744182). – dbc Sep 09 '19 at 18:52
  • @dbc that's similar to the idea but please note that would not work if you need to pass that stream to a different method for transport like in the above example. – Jeremy Holovacs Sep 09 '19 at 19:00

0 Answers