
I plan to read a remote file line by line asynchronously using https://github.com/Dasync/AsyncEnumerable (Async Streams are not available yet and may only arrive with C# 8: https://github.com/dotnet/csharplang/blob/master/proposals/async-streams.md):

public static class StringExtensions
{
    public static AsyncEnumerable<string> ReadLinesAsyncViaHttpClient(this string uri)
    {
        return new AsyncEnumerable<string>(async yield =>
        {
            using (var httpClient = new HttpClient())
            {
                using (var responseStream = await httpClient.GetStreamAsync(uri))
                {
                    using (var streamReader = new StreamReader(responseStream))
                    {
                        while(true)
                        {
                            var line = await streamReader.ReadLineAsync();

                            if (line != null)
                            {
                                await yield.ReturnAsync(line);
                            }
                            else
                            {
                                return;
                            }
                        } 
                    }
                }
            }
        });
    }
    public static AsyncEnumerable<string> ReadLinesAsyncViaWebRequest(this string uri)
    {
        return new AsyncEnumerable<string>(async yield =>
        {
            var request = WebRequest.Create(uri);
            using (var response = await request.GetResponseAsync())
            {
                using (var responseStream = response.GetResponseStream())
                {
                    using (var streamReader = new StreamReader(responseStream))
                    {
                        while(true)
                        {
                            var line = await streamReader.ReadLineAsync();

                            if (line != null)
                            {
                                await yield.ReturnAsync(line);
                            }
                            else
                            {
                                return;
                            }
                        } 
                    }
                }
            }
        });
    }
}

It seems that they both run just fine in a simple Console application like below:

public class Program
{
    public static async Task Main(string[] args)
    {
        // Or any other remote file
        const string url = @"https://gist.githubusercontent.com/dgrtwo/a30d99baa9b7bfc9f2440b355ddd1f75/raw/700ab5bb0b5f8f5a14377f5103dbe921d4238216/by_tag_year.csv";

        await url.ReadLinesAsyncViaWebRequest().ForEachAsync(line =>
        {
            Console.WriteLine(line, Color.GreenYellow);
        });
        await url.ReadLinesAsyncViaHttpClient().ForEachAsync(line =>
        {
            Console.WriteLine(line, Color.Purple);
        });
    }
}

... but I have some concerns about using it as part of an ASP.NET Core Web API that processes the lines and then pushes them out using PushStreamContent:

The idea is to build a data pipeline that relies on async / await to keep the number of threads in use as low as possible, and that avoids memory growth by leveraging the enumerable-like, lazy nature of AsyncEnumerable.

I have read several articles, but they all seem to target non-.NET Core versions, and I don't know whether there are performance issues or caveats with what I would like to achieve.

An example "business" case would be:

using System;
using System.Collections.Async;
using System.IO;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

namespace WebApplicationTest.Controllers
{
    [Route("api/[controller]")]
    [ApiController]
    public class DumbValuesController : ControllerBase
    {
        private static readonly Random Random = new Random();

        // GET api/values
        [HttpGet]
        public async Task<IActionResult> DumbGetAsync([FromQuery] string fileUri)
        {
            using (var streamWriter = new StreamWriter(HttpContext.Response.Body))
            {
                await fileUri.ReadLinesAsyncViaHttpClient().ForEachAsync(async line =>
                {
                    // Some dumb processing on each (maybe big) line
                    line += Random.Next(0, 100 + 1);
                    await streamWriter.WriteLineAsync(line);
                });
            }

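            // The body has already been written, so return a result that does not touch the response.
            return new EmptyResult();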
        }
    }
}
Natalie Perret
  • Always use `HttpClient`. It is the standard. Everything else is just for backwards compatibility. – Chris Pratt Nov 05 '18 at 15:07
  • @ChrisPratt I was thinking maybe the kind of stream (the underlying implementation) might differ between the two. – Natalie Perret Nov 05 '18 at 15:09
  • @EhouarnPerret you are confusing streams with enumerables. I suspect some Java roots. Streams have to do with IO, not enumeration and they *are* async, since .NET 1.0 back in 2002. HttpWebRequest is also async but HttpClient is better because it doesn't have to perform DNS resolution and HTTPS handshake for every single call. In .NET Core 2+ it uses a newer, faster Sockets implementation too. Combined with HttpClientFactory and Polly, it provides HTTP connection pooling, retry strategies and more – Panagiotis Kanavos Nov 05 '18 at 15:27
  • @EhouarnPerret what are you trying to do anyway? Unless the HTTP request returns a text file, you can't read it line-by-line. If you want to publish results as soon as they arrive, you need a pub/sub mechanism, like the one provided by System.Threading.Channels. If you want to process raw data, it's System.IO.Pipelines, which adds memory management and minimal allocations on top – Panagiotis Kanavos Nov 05 '18 at 15:30
  • @PanagiotisKanavos no Java roots whatsoever, but I really like the details you bring to the table, really appreciated. Didn't know Polly :) I will edit my question and add my business use case. Side note: if you look at StreamReader, it leverages the underlying stream and reads chunks of data (a buffer) until it reaches an EOL delimiter. – Natalie Perret Nov 05 '18 at 15:31
  • @PanagiotisKanavos https://github.com/dotnet/corefx/blob/a10890f4ffe0fadf090c922578ba0e606ebdd16c/src/Common/src/CoreLib/System/IO/StreamReader.cs#L886 – Natalie Perret Nov 05 '18 at 15:40
  • @PanagiotisKanavos I updated my post to include a simple business case. – Natalie Perret Nov 05 '18 at 17:02

2 Answers


The source code for .NET Core is public, so you can check for yourself.

Both implementations end up using HttpClientHandler (the implementation of that class is split up into 4 files).

You can see this from the source code of both HttpClient and HttpWebRequest (which WebRequest uses).

So I suspect you won't notice any difference in the performance of either.

HttpClient is the newer API, which is why its use is encouraged, along with the reasons given in the article you linked to: http://www.diogonunes.com/blog/webclient-vs-httpclient-vs-httpwebrequest/
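
If you go with HttpClient, a minimal sketch of the line-by-line read could look like the following. The class and method names (`HttpLineReader`, `ForEachLineAsync`, `PumpLinesAsync`) are hypothetical and not part of the question's code, and sharing one static `HttpClient` for the whole process is an assumption about how you want to manage connections. The reading loop is split out so it can be exercised against any `TextReader` without a network call.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class HttpLineReader
{
    // Assumption: a single shared HttpClient, so connections can be reused across calls.
    private static readonly HttpClient Client = new HttpClient();

    // Hypothetical helper: invokes onLine for each line of the remote text resource.
    public static async Task ForEachLineAsync(string uri, Func<string, Task> onLine)
    {
        // ResponseHeadersRead completes once the headers arrive, so the status code
        // can be checked while the body is still streamed rather than buffered.
        using (var response = await Client.GetAsync(uri, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();

            using (var stream = await response.Content.ReadAsStreamAsync())
            using (var reader = new StreamReader(stream))
            {
                await PumpLinesAsync(reader, onLine);
            }
        }
    }

    // Reads lines until the reader is exhausted, awaiting the callback for each one.
    public static async Task PumpLinesAsync(TextReader reader, Func<string, Task> onLine)
    {
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            await onLine(line);
        }
    }
}
```

Usage would be along the lines of `await HttpLineReader.ForEachLineAsync(url, line => streamWriter.WriteLineAsync(line));`, so only one line at a time is held in memory between the download and the consumer.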

Gabriel Luci

As of .NET 6, WebRequest is marked as obsolete. Microsoft recommends using HttpClient instead:

https://learn.microsoft.com/en-us/dotnet/core/compatibility/networking/6.0/webrequest-deprecated
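
On a runtime that recent, the async streams feature mentioned in the question (C# 8 and later) is also available, so the third-party AsyncEnumerable package may no longer be needed. A rough sketch, assuming a shared static `HttpClient` and using hypothetical extension names (`EnumerateLinesAsync`, `ReadRemoteLinesAsync`) that do not come from the question:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net.Http;

public static class AsyncLineExtensions
{
    // Assumption: one HttpClient shared by the whole process.
    private static readonly HttpClient Client = new HttpClient();

    // Lazily yields each line from the reader; nothing is read until it is enumerated.
    public static async IAsyncEnumerable<string> EnumerateLinesAsync(this TextReader reader)
    {
        // ReadLineAsync returns null at the end of the input, which ends the loop.
        while (await reader.ReadLineAsync() is { } line)
        {
            yield return line;
        }
    }

    // Streams a remote text resource line by line over HttpClient.
    public static async IAsyncEnumerable<string> ReadRemoteLinesAsync(this string uri)
    {
        using var stream = await Client.GetStreamAsync(uri);
        using var reader = new StreamReader(stream);

        await foreach (var line in reader.EnumerateLinesAsync())
        {
            yield return line;
        }
    }
}
```

The caller then consumes it with `await foreach (var line in url.ReadRemoteLinesAsync()) { ... }`, which keeps the same pull-based, one-line-at-a-time behaviour as the original `ForEachAsync` calls.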

VyTre