
I'm interested in a performance comparison (speed, memory usage) of two approaches to deserializing an HTTP response JSON payload using Newtonsoft.Json.

I'm aware of Newtonsoft.Json's Performance Tips to use streams, but I wanted to know more and have hard numbers. I've written a simple benchmark using BenchmarkDotNet, but I'm a bit puzzled by the results (see numbers below).

What I got:

  • parsing from a stream is always faster, but not by much
  • parsing small and "medium" JSON has better or equal memory usage when using a string as input
  • a significant difference in memory usage only shows up with large JSON (where the string itself ends up on the LOH)

I haven't had time to do proper profiling (yet), and I'm a bit surprised by the memory overhead of the stream approach (assuming there's no error in my code). The whole code is here.

My questions:

  • Is my approach correct? (usage of MemoryStream; simulating HttpResponseMessage and its content; ...)
  • Is there any issue with benchmarking code?
  • Why do I see such results?

Benchmark setup

I'm preparing a MemoryStream to be reused over and over within the benchmark run:

[GlobalSetup]
public void GlobalSetup()
{
    var resourceName = _resourceMapping[typeof(T)];
    using (var resourceStream = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
    {
        _memory = new MemoryStream();
        resourceStream.CopyTo(_memory);
    }

    _iterationRepeats = _repeatMapping[typeof(T)];
}
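
Since the results below report allocated memory and GC generations, the benchmark class presumably uses BenchmarkDotNet's MemoryDiagnoser. A minimal sketch of what the surrounding class could look like (the class name, generic parameter and field initializers here are my assumptions; the full code is only linked above, not shown):

using System;
using System.Collections.Generic;
using System.IO;
using BenchmarkDotNet.Attributes;
using Newtonsoft.Json;

[MemoryDiagnoser]   // adds the allocated-memory and GC-generation columns seen in the results
public class DeserializationBenchmark<T> where T : class
{
    private readonly JsonSerializer _serializer = JsonSerializer.CreateDefault();

    // Maps a model type to its embedded-resource name and repeat count.
    private readonly Dictionary<Type, string> _resourceMapping = new Dictionary<Type, string>();
    private readonly Dictionary<Type, int> _repeatMapping = new Dictionary<Type, int>();

    private MemoryStream _memory;
    private int _iterationRepeats;

    // A concrete T would be supplied via [GenericTypeArguments(typeof(...))] when running;
    // [GlobalSetup] and the [Benchmark] methods shown in this post live here.
}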

Stream deserialization

[Benchmark(Description = "Stream d13n")]
public async Task DeserializeStream()
{
    for (var i = 0; i < _iterationRepeats; i++)
    {
        var response = BuildResponse(_memory);

        using (var streamReader = BuildNonClosingStreamReader(await response.Content.ReadAsStreamAsync()))
        using (var jsonReader = new JsonTextReader(streamReader))
        {
            _serializer.Deserialize<T>(jsonReader);
        }
    }
}

String deserialization

We first read the JSON from the stream into a string and then run deserialization on it: an extra string is allocated and only then used for deserialization.

[Benchmark(Description = "String d13n")]
public async Task DeserializeString()
{
    for (var i = 0; i < _iterationRepeats; i++)
    {
        var response = BuildResponse(_memory);

        var content = await response.Content.ReadAsStringAsync();
        JsonConvert.DeserializeObject<T>(content);
    }
}

Common methods

private static HttpResponseMessage BuildResponse(Stream stream)
{
    stream.Seek(0, SeekOrigin.Begin);

    var content = new StreamContent(stream);
    content.Headers.ContentType = new MediaTypeHeaderValue("application/json");

    return new HttpResponseMessage(HttpStatusCode.OK)
    {
        Content = content
    };
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static StreamReader BuildNonClosingStreamReader(Stream inputStream) =>
    new StreamReader(
        stream: inputStream,
        encoding: Encoding.UTF8,
        detectEncodingFromByteOrderMarks: true,
        bufferSize: 1024,
        leaveOpen: true);

Results

Small JSON

Repeated 10000 times

  • Stream: mean 25.69 ms, 61.34 MB allocated
  • String: mean 31.22 ms, 36.01 MB allocated

Medium JSON

Repeated 1000 times

  • Stream: mean 24.07 ms, 12 MB allocated
  • String: mean 25.09 ms, 12.85 MB allocated

Large JSON

Repeated 100 times

  • Stream: mean 229.6 ms, 47.54 MB allocated, objects got to Gen 1
  • String: mean 240.8 ms, 92.42 MB allocated, objects got to Gen 2!

Update

I went through the source of JsonConvert and found out that it internally uses a JsonTextReader over a StringReader when deserializing from a string: JsonConvert:816, so a reader is involved there as well (of course!).
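
For reference, the string path boils down to roughly this (a simplified sketch of the internals based on reading the source, not a verbatim copy):

using System.IO;
using Newtonsoft.Json;

// Roughly what JsonConvert.DeserializeObject<T>(string) does internally:
// the string is wrapped in a StringReader and read through a JsonTextReader,
// so the reader-based machinery is exercised in both benchmarks.
public static T DeserializeFromString<T>(string json)
{
    var serializer = JsonSerializer.CreateDefault();
    using (var jsonReader = new JsonTextReader(new StringReader(json)))
    {
        return serializer.Deserialize<T>(jsonReader);
    }
}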

Then I decided to dig deeper into StreamReader itself, and I was stunned at first sight: it always allocates an array buffer (byte[]): StreamReader:244, which explains its memory use.

This gives me the answer to the "why". The solution is simple: use a smaller buffer size when instantiating the StreamReader. The minimum buffer size is 128 (see StreamReader.MinBufferSize), but you can supply any value > 0 (check one of the ctor overloads).

Of course the buffer size has an effect on processing the data. As for which buffer size to use: it depends. When expecting smaller JSON responses, I think it is safe to stick with a small buffer.
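
For illustration, the helper from the question could be adjusted along these lines (my sketch; the default of 128 is just an assumption about what suits small payloads):

// Same helper as above, but with the buffer size exposed so small payloads
// can use the 128 minimum instead of the 1024 default.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static StreamReader BuildNonClosingStreamReader(Stream inputStream, int bufferSize = 128) =>
    new StreamReader(
        stream: inputStream,
        encoding: Encoding.UTF8,
        detectEncodingFromByteOrderMarks: true,
        bufferSize: bufferSize,
        leaveOpen: true);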

  • Possibly there is some issue with buffering and `async`, whereby the `_memory` stream is getting copied into another memory stream? http://www.tugberkugurlu.com/archive/efficiently-streaming-large-http-responses-with-httpclient might be relevant. – dbc Jun 05 '19 at 23:02
  • @dbc This is expected in my test (to have a filled stream somewhere). I even tried to go through the .NET code - when converting the stream to a string, there's some copying happening - but that's against the results I'm seeing. (But of course it's a nice perf tip!) – Zdeněk Jun 06 '19 at 08:18
  • @dbc I had more time and found out what `StreamReader` does. Updated question (... and suggested possible solution). – Zdeněk Jun 07 '19 at 20:25
  • Interesting, thanks. You could [answer your own question](https://stackoverflow.com/help/self-answer) if you want. – dbc Jun 07 '19 at 20:28
  • https://devblogs.microsoft.com/aspnet/asp-net-core-updates-in-net-core-3-0-preview-5/ new json library – Avin Kavish Jun 08 '19 at 19:46

1 Answer


After some fiddling I found the reason behind the memory allocation when using StreamReader. The original post is updated, but here's a recap:

StreamReader uses a default bufferSize of 1024. Every instantiation of StreamReader then allocates a byte array of that size. That's why I saw such numbers in my benchmark.

When I set bufferSize to its lowest possible value, 128, the results look much better.
