22

I am working on a WebCrawler implementation but am facing a strange memory leak in ASP.NET Web API's HttpClient.

So the cut down version is here:


[UPDATE 2]

I found the problem and it is not HttpClient that is leaking. See my answer.


[UPDATE 1]

I have added dispose with no effect:

    static void Main(string[] args)
    {
        int waiting = 0;
        const int MaxWaiting = 100;
        var httpClient = new HttpClient();
        foreach (var link in File.ReadAllLines("links.txt"))
        {

            while (waiting>=MaxWaiting)
            {
                Thread.Sleep(1000);
                Console.WriteLine("Waiting ...");
            }
            httpClient.GetAsync(link)
                .ContinueWith(t =>
                                  {
                                      try
                                      {
                                          var httpResponseMessage = t.Result;
                                          if (httpResponseMessage.IsSuccessStatusCode)
                                              httpResponseMessage.Content.LoadIntoBufferAsync()
                                                  .ContinueWith(t2=>
                                                                    {
                                                                        if(t2.IsFaulted)
                                                                        {
                                                                            httpResponseMessage.Dispose();
                                                                            Console.ForegroundColor = ConsoleColor.Magenta;
                                                                            Console.WriteLine(t2.Exception);
                                                                        }
                                                                        else
                                                                        {
                                                                            httpResponseMessage.Content.
                                                                                ReadAsStringAsync()
                                                                                .ContinueWith(t3 =>
                                                                                {
                                                                                    Interlocked.Decrement(ref waiting);

                                                                                    try
                                                                                    {
                                                                                        Console.ForegroundColor = ConsoleColor.White;

                                                                                        Console.WriteLine(httpResponseMessage.RequestMessage.RequestUri);
                                                                                        string s =
                                                                                            t3.Result;

                                                                                    }
                                                                                    catch (Exception ex3)
                                                                                    {
                                                                                        Console.ForegroundColor = ConsoleColor.Yellow;

                                                                                        Console.WriteLine(ex3);
                                                                                    }
                                                                                    httpResponseMessage.Dispose();
                                                                                });                                                                                
                                                                        }
                                                                    }
                                                  );
                                      }
                                      catch(Exception e)
                                      {
                                          Interlocked.Decrement(ref waiting);
                                          Console.ForegroundColor = ConsoleColor.Red;                                             
                                          Console.WriteLine(e);
                                      }
                                  }
                );

            Interlocked.Increment(ref waiting);

        }

        Console.Read();
    }

The file containing links is available here.

This results in constant rising of the memory. Memory analysis shows many bytes held possibly by the AsyncCallback. I have done many memory leak analysis before but this one seems to be at the HttpClient level.

Memory profile of the process showing buffers held possibly by async callbacks

I am using C# 4.0 so no async/await here so only TPL 4.0 is used.

The code above works but is not optimised and sometimes throws tantrum yet is enough to reproduce the effect. Point is I cannot find any point that could cause memory to be leaked.

Aliostad
  • 80,612
  • 21
  • 160
  • 208
  • This will start at least one request per second. Maybe you process <1 request per second? – usr Dec 28 '12 at 20:05
  • @usr I just limit the number of requests waiting and check it every second. If I don't this will create 1000s of tasks. – Aliostad Dec 28 '12 at 20:55
  • After checking and waiting you still start another one, though. You always start one. – usr Dec 28 '12 at 21:13
  • I have tried it out and I do see an increase at the beginning. After about 50 exceptions due to not responding servers the memory load does drop from 397MB back to 69MB and stays there. It seems that you have many concurrent requests pending which did not return yet. After the timeout causes the pending tasks to be cancelled all goes back to normal. – Alois Kraus Dec 29 '12 at 13:20
  • @Aliostad This doesn't help, but it may reduce the code a bit...the LoadIntoBufferAsync is redundant unless you use the HttpCompletionOption.ResponseHeadersRead. – Darrel Miller Dec 29 '12 at 15:39
  • If there is a leak, my guess would be that it is in the HttpClientHandler, not in the HttpClient itself. Maybe we could create our own dummy HttpClientHandler and prove that. – Darrel Miller Dec 29 '12 at 15:41
  • I just ran this code. Just looking through task manager I saw memory gradually increase up to around 300MB and then drop to 100MB and then down to 60MB. It appears to me that it just takes a while for the GC to kick in. – Darrel Miller Dec 29 '12 at 15:53
  • @DarrelMiller That could be the finalizers kicking in - although in my case I saw up to 1GB memory usage. Gen 2 limit I think is around 20MB, if I remember correctly - in any case I am sure GC has collected. Profiler actually does not point to HttpClient or handler. – Aliostad Dec 29 '12 at 16:42
  • I was see loads of caught exceptions. Lots of timeouts and connection failures. I set the timeout to infinite and increased the number of allowed connections in ServicePointManager and the exceptions went down significantly and so did the memory usage. – Darrel Miller Dec 29 '12 at 18:20
  • @DarrelMiller code is not optimised as I said. Code for CyberInsekt is a lot better and does not throw exception but keeps growing in memory. – Aliostad Dec 29 '12 at 20:41
  • @Aliostad when do u reach 16577216 bytes? after running all of them? – tugberk Dec 29 '12 at 22:40
  • You cannot allocate some of the resources, because they are still in use by the TCP Stack. They would allocate in time. You may shorten the time by adding header `connection: close`. – nkalfov Feb 11 '19 at 21:39
  • @Aliostad you have a memory leak in the .ContinueWith _else_ case. You're not calling httpResponseMessage.Dispose(). I would rather move that to the outer finally of the try/catch. – Jeff Fischer Dec 04 '22 at 20:03

4 Answers4

22

OK, I got to the bottom of this. Thanks to @Tugberk, @Darrel and @youssef for spending time on this.

Basically the initial problem was I was spawning too many tasks. This started to take its toll so I had to cut back on this and have some state for making sure the number of concurrent tasks are limited. This is basically a big challenge for writing processes that have to use TPL to schedule the tasks. We can control threads in the thread pool but we also need to control the tasks we are creating so no level of async/await will help this.

I managed to reproduce the leak only a couple of times with this code - other times after growing it would just suddenly drop. I know that there was a revamp of GC in 4.5 so perhaps the issue here is that GC did not kick in enough although I have been looking at perf counters on GC generation 0, 1 and 2 collections.

So the take-away here is that re-using HttpClient does NOT cause memory leak.

Aliostad
  • 80,612
  • 21
  • 160
  • 208
  • Thank you for the clarification, I used SemaphoreSlim to throttle the waiting time. http://stackoverflow.com/questions/10806951/how-to-limit-the-amount-of-concurrent-async-i-o-operations – Christopher Bonitz Jan 14 '16 at 13:38
  • just to be clear, what you experienced was not necessarily a memory leak. – Dave Black Jan 11 '23 at 20:32
5

I'm no good at defining memory issues but I gave it a try with the following code. It's in .NET 4.5 and uses async/await feature of C#, too. It seems to keep memory usage around 10 - 15 MB for the entire process (not sure if you see this a better memory usage though). But if you watch # Gen 0 Collections, # Gen 1 Collections and # Gen 2 Collections perf counters, they are pretty high with the below code.

If you remove the GC.Collect calls below, it goes back and forth between 30MB - 50MB for entire process. The interesting part is that when I run your code on my 4 core machine, I don't see abnormal memory usage by the process either. I have .NET 4.5 installed on my machine and if you don't, the problem might be related to CLR internals of .NET 4.0 and I am sure that TPL has improved a lot on .NET 4.5 based on resource usage.

class Program {

    static void Main(string[] args) {

        ServicePointManager.DefaultConnectionLimit = 500;
        CrawlAsync().ContinueWith(task => Console.WriteLine("***DONE!"));
        Console.ReadLine();
    }

    private static async Task CrawlAsync() {

        int numberOfCores = Environment.ProcessorCount;
        List<string> requestUris = File.ReadAllLines(@"C:\Users\Tugberk\Downloads\links.txt").ToList();
        ConcurrentDictionary<int, Tuple<Task, HttpRequestMessage>> tasks = new ConcurrentDictionary<int, Tuple<Task, HttpRequestMessage>>();
        List<HttpRequestMessage> requestsToDispose = new List<HttpRequestMessage>();

        var httpClient = new HttpClient();

        for (int i = 0; i < numberOfCores; i++) {

            string requestUri = requestUris.First();
            var requestMessage = new HttpRequestMessage(HttpMethod.Get, requestUri);
            Task task = MakeCall(httpClient, requestMessage);
            tasks.AddOrUpdate(task.Id, Tuple.Create(task, requestMessage), (index, t) => t);
            requestUris.RemoveAt(0);
        }

        while (tasks.Values.Count > 0) {

            Task task = await Task.WhenAny(tasks.Values.Select(x => x.Item1));

            Tuple<Task, HttpRequestMessage> removedTask;
            tasks.TryRemove(task.Id, out removedTask);
            removedTask.Item1.Dispose();
            removedTask.Item2.Dispose();

            if (requestUris.Count > 0) {

                var requestUri = requestUris.First();
                var requestMessage = new HttpRequestMessage(HttpMethod.Get, requestUri);
                Task newTask = MakeCall(httpClient, requestMessage);
                tasks.AddOrUpdate(newTask.Id, Tuple.Create(newTask, requestMessage), (index, t) => t);
                requestUris.RemoveAt(0);
            }

            GC.Collect(0);
            GC.Collect(1);
            GC.Collect(2);
        }

        httpClient.Dispose();
    }

    private static async Task MakeCall(HttpClient httpClient, HttpRequestMessage requestMessage) {

        Console.WriteLine("**Starting new request for {0}!", requestMessage.RequestUri);
        var response = await httpClient.SendAsync(requestMessage).ConfigureAwait(false);
        Console.WriteLine("**Request is completed for {0}! Status Code: {1}", requestMessage.RequestUri, response.StatusCode);

        using (response) {
            if (response.IsSuccessStatusCode){
                using (response.Content) {

                    Console.WriteLine("**Getting the HTML for {0}!", requestMessage.RequestUri);
                    string html = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
                    Console.WriteLine("**Got the HTML for {0}! Legth: {1}", requestMessage.RequestUri, html.Length);
                }
            }
            else if (response.Content != null) {

                response.Content.Dispose();
            }
        }
    }
}
tugberk
  • 57,477
  • 67
  • 243
  • 335
  • Thanks. collections are high with my code too. I already checked to make sure it is collecting. `GC.Collect()` is bad anyway and should not touch production other than really exceptional cases (e.g. when hosting CLR in another unmanaged app) – Aliostad Dec 30 '12 at 10:36
  • I know. I just put them to prove the point. Simply remove them. When you remove `GC.Collect` calls, memory usage stays between between 30MB - 50MB in my machine. – tugberk Dec 30 '12 at 10:48
  • is there a need to call : response.Content.Dispose()? – Tal Avissar May 11 '17 at 09:42
  • @TalAvissar no, response.Dispose already does that: https://github.com/dotnet/corefx/blob/f92ad2ac0349e1ba6175c6b555d07710e769a8f9/src/System.Net.Http/src/System/Net/Http/HttpResponseMessage.cs#L212 – tugberk May 12 '17 at 12:15
  • @tugberk - response.Dispose is not called in one of his cases. See my comment to the question. – Jeff Fischer Dec 04 '22 at 20:03
2

A recent reported "Memory Leak" in our QA environment taught us this:

Consider the TCP Stack

Don't assume the TCP Stack can do what is asked in the time "thought appropriate for the application". Sure we can spin off Tasks at will and we just love asych, but....

Watch the TCP Stack

Run NETSTAT when you think you have a memory leak. If you see residual sessions or half-baked states, you may want to rethink your design along the lines of HTTPClient reuse and limiting the amount of concurrent work being spun up. You also may need to consider using Load Balancing across multiple machines.

Half-baked sessions show up in NETSTAT with Fin-Waits 1 or 2 and Time-Waits or even RST-WAIT 1 and 2. Even "Established" sessions can be virtually dead just waiting for time-outs to fire.

The Stack and .NET are most likely not broken

Overloading the stack puts the machine to sleep. Recovery takes time and 99% of the time the stack will recover. Remember also that .NET will not release resources before their time and that no user has full control of GC.

If you kill the app and it takes 5 minutes for NETSTAT to settle down, that's a pretty good sign the system is overwhelmed. It's also a good show of how the stack is independent of the application.

Community
  • 1
  • 1
JWP
  • 6,672
  • 3
  • 50
  • 74
0

The default HttpClient leaks when you use it as a short-lived object and create new HttpClients per request.

Here is a reproduction of this behavior.

As a workaround, I was able to keep using HttpClient as a short-lived object by using the following Nuget package instead of the built-in System.Net.Http assembly: https://www.nuget.org/packages/HttpClient

Not sure what the origin of this package is, however, as soon as I referenced it the memory leak disappeared. Make sure that you remove the reference to the built-in .NET System.Net.Http library and use the Nuget package instead.

Community
  • 1
  • 1
Elad Nava
  • 7,746
  • 2
  • 41
  • 61
  • "The default HttpClient leaks when you use it as a short-lived object and create new HttpClients per request." ... even when you were Disposing? – bytedev Feb 25 '16 at 14:06
  • Yes. There seems to be a memory leak within the .NET HttpClient implementation. Also, forcing `GC.Collect()` has no affect. – Elad Nava Feb 25 '16 at 17:30
  • 2
    I've also seen weird memory leaks before in the past, one recommendation was to use the http header connection with a "close". E.g.: client.DefaultRequestHeaders.Add("Connection", "close"); ... maybe worth a try? – bytedev Feb 25 '16 at 17:36
  • Have you tried the Dispose overload to release both managed and unmanaged resources? – Timothy Gonzalez Mar 08 '17 at 15:34
  • Yep, and the memory leak was still present. – Elad Nava Mar 08 '17 at 19:45
  • The package is not available on nuget anymore – Amir Hajiha May 22 '19 at 13:58
  • I agree with @bytedev I feel like once I implemented client.DefaultRequestHeaders.Add("Connection", "close"); made the app run better in terms of memory leaks. However, like I tell all of my clients, "if it's a Microsoft Product you should probably restart the app at least once every 24 hours." – DaWiseguy Sep 02 '19 at 22:25
  • Just to be clear, this is not a "memory leak" with the `HttpClient`. The problem is that you experienced TCP port exhaustion on the operating system which will bring any machine to its knees. `HttpClient` is meant to be re-used across requests and should remain constructed for the lifetime of the app. See my SO answer here for details - https://stackoverflow.com/a/35045301/251267 – Dave Black Jan 11 '23 at 20:36