22

I am working on a "heartbeat" application that pings hundreds of IP addresses every minute via a loop. The IP addresses are stored in a list of a class Machines. I have a loop that creates a Task<MachinePingResults> (where MachinePingResults is basically a Tuple of an IP and online status) for each IP and calls a ping function using System.Net.NetworkInformation.

The issue I'm having is that after hours (or days) of running, one of the loops of the main program fails to finish the Tasks which is leading to a memory leak. I cannot determine why my Tasks are not finishing (if I look in the Task list during runtime after a few days of running, there are hundreds of tasks that appear as "awaiting"). Most of the time all the tasks finish and are disposed; it is just randomly that they don't finish. For example, the past 24 hours had one issue at about 12 hours in with 148 awaiting tasks that never finished. Due to the nature of not being able to see why the Ping is hanging (since it's internal to .NET), I haven't been able to replicate the issue to debug.

(It appears that the Ping call in .NET can hang and the built-in timeout fail if there is a DNS issue, which is why I built an additional timeout in)

I have a way to cancel the main loop if the pings don't return within 15 seconds using Task.Delay and a CancellationToken. Then in each Ping function I have a Delay in case the Ping call itself hangs that forces the function to complete. Also note I am only pinging IPv4; there is no IPv6 or URL.

Main Loop

pingcancel = new CancellationTokenSource();

List<Task<MachinePingResults>> results = new List<Task<MachinePingResults>>();

try
{
    foreach (var m in localMachines.FindAll(m => !m.Online))
        results.Add(Task.Run(() =>
            PingMachine(m.ipAddress, 8000), pingcancel.Token
        ));

    await Task.WhenAny(Task.WhenAll(results.ToArray()), Task.Delay(15000));

    pingcancel.Cancel();
}
catch (Exception ex) { Console.WriteLine(ex); }
finally
{
    results.Where(r => r.IsCompleted).ToList()
        .ForEach(r =>
        //modify the online machines);
        results.Where(r => r.IsCompleted).ToList().ForEach(r => r.Dispose());
        results.Clear();
 }

The Ping Function

static async Task<MachinePingResults> PingMachine(string ipaddress, int timeout)
{
    try
    {
        using (Ping ping = new Ping())
        {
            var reply = ping.SendPingAsync(ipaddress, timeout);

            await Task.WhenAny(Task.Delay(timeout), reply);

            if (reply.IsCompleted && reply.Result.Status == IPStatus.Success)
            {
                return new MachinePingResults(ipaddress, true);
            }
        }
    }
    catch (Exception ex)
    {
        Debug.WriteLine("Error: " + ex.Message);
    }
    return new MachinePingResults(ipaddress, false);
}

With every Task having a Delay to let it continue if the Ping hangs, I don't know what would be the issue that is causing some of the Task<MachinePingResults> to never finish.

How can I ensure a Task using .NET Ping ends?

Using .NET 5.0 and the issues occurs on machines running windows 10 and windows server 2012

image of list of tasks awaiting in VS Tasks window

image of list of tasks awaiting in VS Tasks window

davidsbro
  • 2,761
  • 4
  • 23
  • 33
  • I'm not familiar with ``Ping`` but it might hang in the synchronous part, try putting the call to ``SendPingAsync`` in a Task.Run. (``var reply = Task.Run(async () => await ping.SendPingAsync(ipaddress, timeout));``). This won't fix the memory leak but will at least allow ``PingMachine`` to always return. – Raphtaliyah Nov 24 '21 at 14:19
  • *"if I look in the Task list during runtime after a few days of running, there are hundreds of tasks that appear as awaiting"*: What is the Task list? Are you running the program for days with the debugger attached? – Theodor Zoulias Nov 24 '21 at 18:51
  • @TheodorZoulias yes, I am running with the debugger attached. The Task List I'm referring to is the one in Visual Studio. When I was running without the debugger I was having a memory leak, so I left the debugger attached for a few days. There are multiple `Tasks` that are never executed or disposed of – davidsbro Nov 24 '21 at 18:53
  • 1
    Have you tried to associate the hanging tasks with the IP addresses? Are there any specific IP addresses that cause the tasks to hang, or it's completely random? – Theodor Zoulias Nov 24 '21 at 19:38
  • 1
    @TheodorZoulias it is random; sometimes it is all 1391 that I'm pinging, sometimes it's a few hundred. Since it will hang on all of them, I'm assuming it's not related to the IP itself. I'm thinking if the internal .NET `Ping` throws an exception, it hangs the function and task. – davidsbro Nov 24 '21 at 19:39
  • 3
    You have a tough problem to solve. What I would try in your case is to limit the amount of concurrent `SendPingAsync` operations, by using the [`Parallel.ForEachAsync`](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreachasync) or something. I don't expect this to solve the problem though. – Theodor Zoulias Nov 24 '21 at 19:51
  • 2
    `Task.WhenAny` is already an attempt to fix it? Given that ping has its own timeout, much of the PingMachine method seems redundant.. – GolezTrol Nov 24 '21 at 22:40
  • @GolezTrol the ping method in .NET can hang indefinitely. See the linked SO question. – davidsbro Nov 25 '21 at 00:29
  • 13
    This question is being discussed on [meta](https://meta.stackoverflow.com/questions/413266/why-did-my-question-get-closed-i-believe-its-high-quality-but-it-was-closed-do) – justANewb stands with Ukraine Nov 25 '21 at 01:59
  • 2
    IMO it appears the possible hang is DNS, because `Ping.SendAsync` use the synchronous `Dns.GetAddresses` even though it should use the async. So it might be worth splitting those two up yourself and calling `Dns.GetHostAddressesAsync` manually, and passing `Ping` just the IP addresses. You can see this bug in [referencesource for Framework 4](https://referencesource.microsoft.com/#system/net/System/Net/NetworkInformation/ping.cs,608) and the fix in [Runtime 5.0+](https://github.com/dotnet/runtime/commit/74313624e103654d88ac105f35d1fe262c8dbc0c) – Charlieface Nov 25 '21 at 02:16
  • 1
    @charlieface the IPs I’m passing are all numeric (xxx.xxx.xxx.xxx) . There’s no URLs – davidsbro Nov 25 '21 at 02:32
  • 1
    it goes over DNS if you use string. use IPAddress class as parameter for SendPingAsync – Juraj Nov 25 '21 at 06:37
  • @Juraj are you sure? The [source code](https://referencesource.microsoft.com/system/net/System/Net/NetworkInformation/ping.cs.html#dad224c2e21d4b1b) suggest otherwise. I see an `if (!IPAddress.TryParse(hostNameOrAddress...` there. – Theodor Zoulias Nov 25 '21 at 09:00
  • While not an answer I have hit something similar where I at least came up with a way to narrow the problem: The first thing a task does is put itself on a list, the last thing it does is take itself back off. There's also a status field that gets updated with what's going on. Examining the list shows where the task is hanging, but not why. (Obviously, lots of care must be taken with locks.) – Loren Pechtel Nov 27 '21 at 00:46
  • 1
    If all attempts to find the root cause of the problem fail, you could admit defeat and offload the heartbeat detection to a separate process, that could be recycled every one hour or so. This would require to introduce inter-process communication machinery in your main app, which is always an undesirable thing to do. – Theodor Zoulias Nov 27 '21 at 01:41
  • @LorenPechtel the tasks hang at either awaiting or unwrapping, on the call to the System Ping. Since it seems to be a .NET problem, that's why I am just trying to create a way to kill those tasks cleanly. – davidsbro Nov 29 '21 at 13:20

1 Answers1

7

There are quite a few gaps in the code posted, but I attempted to replicate and in doing so ended up refactoring a bit.

This version seems pretty robust, with the actual call to SendAsync wrapped in an adapter class.

I accept this doesn't necessarily answer the question directly, but in the absence of being able to replicate your problem exactly, offers an alternative way of structuring the code that may eliminate the problem.

    async Task Main()
    {
        var masterCts = new CancellationTokenSource(TimeSpan.FromSeconds(15)); // 15s overall timeout
        
        var localMachines = new List<LocalMachine>
        {       
            new LocalMachine("192.0.0.1", false), // Should be not known - TimedOut
            new LocalMachine("192.168.86.88", false), // Should be not known - DestinationHostUnreachable (when timeout is 8000)
            new LocalMachine("www.dfdfsdfdfdsgrdf.cdcc", false), // Should be not known - status Unknown because of PingException
            new LocalMachine("192.168.86.87", false) // Known - my local IP
        };

        var results = new List<PingerResult>();

        try
        {
            // Create the "hot" tasks
            var tasks = localMachines.Where(m => !m.Online)
                                    .Select(m => new Pinger().SendPingAsync(m.HostOrAddress, 8000, masterCts.Token))
                                    .ToArray();

            await Task.WhenAll(tasks);
            
            results.AddRange(tasks.Select(t => t.Result));
        }
        finally
        {
            results.ForEach(r => localMachines.Single(m => m.HostOrAddress.Equals(r.HostOrAddress)).Online = r.Status == IPStatus.Success);

            results.Dump();  // For LINQPad
            localMachines.Dump(); // For LINQPad

            results.Clear();
        }
    }

    public class LocalMachine
    {
        public LocalMachine(string hostOrAddress, bool online)
        {
            HostOrAddress = hostOrAddress;
            Online = online;
        }

        public string HostOrAddress { get; }

        public bool Online { get; set; }
    }

    public class PingerResult
    {
        public string HostOrAddress {get;set;}
        
        public IPStatus Status {get;set;}
    }

    public class Pinger 
    {
        public async Task<PingerResult> SendPingAsync(string hostOrAddress, int timeout, CancellationToken token)
        {
            // Check if timeout has occurred
            token.ThrowIfCancellationRequested();

            IPStatus status = default;

            try
            {
                var reply = await SendPingInternal(hostOrAddress, timeout, token);
                status = reply.Status;
            }
            catch (PingException)
            {               
                status = IPStatus.Unknown;
            }
            
            return new PingerResult
            {
                HostOrAddress = hostOrAddress,
                Status = status
            };
        }

        // Wrap the legacy EAP pattern offered by Ping.
        private Task<PingReply> SendPingInternal(string hostOrAddress, int timeout, CancellationToken cancelToken)
        {
            var tcs = new TaskCompletionSource<PingReply>();

            if (cancelToken.IsCancellationRequested)
            {
                tcs.TrySetCanceled();
            }   
            else
            {
                using (var ping = new Ping())
                {
                    ping.PingCompleted += (object sender, PingCompletedEventArgs e) =>
                    {
                        if (!cancelToken.IsCancellationRequested)
                        {
                            if (e.Cancelled)
                            {
                                tcs.TrySetCanceled();
                            }
                            else if (e.Error != null)
                            {
                                tcs.TrySetException(e.Error);
                            }
                            else
                            {
                                tcs.TrySetResult(e.Reply);
                            }
                        }
                    };
                    
                    cancelToken.Register(() => { tcs.TrySetCanceled(); });

                    ping.SendAsync(hostOrAddress, timeout, new object());
                }
            };

            return tcs.Task;
        }
    }

EDIT:

I've just noticed in the comments that you mention pinging "all 1391". At this point I would look to throttle the number of pings that are sent concurrently using a SemaphoreSlim. See this blog post (from a long time ago!) that outlines the approach: https://devblogs.microsoft.com/pfxteam/implementing-a-simple-foreachasync/

Paul Suart
  • 6,505
  • 7
  • 44
  • 65