
For analytics purposes, I'd like to perform reverse DNS lookups on large batches of IPs. "Large" meaning at least tens of thousands per hour. I'm looking for ways to increase the processing rate, i.e. lower the processing time per batch.

Wrapping the asynchronous (Begin/End) version of Dns.GetHostEntry into awaitable tasks has already helped a lot (compared to sequential requests), giving a throughput of approx. 100-200 IPs/second:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

static async Task DoReverseDnsLookups()
{
    // in reality, thousands of IPs
    var ips = new[] { "173.194.121.9", "173.252.110.27", "98.138.253.109" }; 
    var hosts = new Dictionary<string, string>();

    var tasks =
        ips.Select(
            ip =>
                Task.Factory.FromAsync(Dns.BeginGetHostEntry,
                    (Func<IAsyncResult, IPHostEntry>) Dns.EndGetHostEntry, 
                    ip, null)
                    .ContinueWith(t => 
                    hosts[ip] = ((t.Exception == null) && (t.Result != null)) 
                               ? t.Result.HostName : null));

    var start = DateTime.UtcNow;
    await Task.WhenAll(tasks);
    var end = DateTime.UtcNow;

    Console.WriteLine("Resolved {0} IPs in {1}, that's {2}/sec.", 
      ips.Count(), end - start, 
      ips.Count() / (end - start).TotalSeconds);
}
```

Any ideas how to further improve the processing rate?

For instance, is there any way to send a batch of IPs to the DNS server?

By the way, I'm assuming that under the covers the async methods use I/O completion ports; please correct me if I'm wrong.

Max
  • Why are you using `FromAsync` rather than the existing [Dns.GetHostEntryAsync](http://msdn.microsoft.com/en-us/library/hh194304(v=vs.110).aspx)? – noseratio May 29 '14 at 21:36
  • @Noseratio Because I didn't see it. :/ Probably didn't expect both Begin/End- and -Async versions. – Max May 30 '14 at 14:41

2 Answers


Hello, here are some tips to improve the processing rate:

  1. Cache the results locally, since this information doesn't usually change for days or even years. That way you don't have to resolve every IP every time.
  2. Most DNS servers cache this information automatically, so the next lookup for the same IP resolves much faster. Entries are usually cached for 4 hours; at least, that is the default on Windows servers. This means that running the whole batch within a short period will perform better than resolving the addresses several times during the day and letting the cache expire in between.
  3. It is good that you are using task parallelism, but you are still querying the same DNS servers configured on your machine. I think that using two machines with different DNS servers will speed up the process.
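The local-cache tip (1) can be sketched as a small in-memory cache keyed by IP. This is a minimal sketch under stated assumptions: `ReverseDnsCache` is a hypothetical name, entries expire after a fixed `ttl` rather than each PTR record's real TTL (which `IPHostEntry` does not expose), and the resolver is passed in as a function so the cache can be tested offline. In real use you might pass something like `async ip => (await Dns.GetHostEntryAsync(ip)).HostName`. Storing the `Task` instead of the finished result lets concurrent requests for the same IP share one lookup.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: caches reverse-lookup results per IP for a fixed TTL.
public sealed class ReverseDnsCache
{
    private readonly Func<string, Task<string>> _resolve;
    private readonly TimeSpan _ttl;
    private readonly ConcurrentDictionary<string, (Task<string> Lookup, DateTime Expires)> _entries =
        new ConcurrentDictionary<string, (Task<string> Lookup, DateTime Expires)>();

    public ReverseDnsCache(Func<string, Task<string>> resolve, TimeSpan ttl)
    {
        _resolve = resolve;
        _ttl = ttl;
    }

    // Returns the cached lookup for this IP, or starts a new one if it is
    // missing or expired. Under contention the factories may run more than
    // once, so an occasional duplicate lookup is possible. Failed lookups
    // (faulted tasks) are also cached until they expire.
    public Task<string> GetHostNameAsync(string ip)
    {
        var now = DateTime.UtcNow;
        var entry = _entries.AddOrUpdate(
            ip,
            key => (Lookup: _resolve(key), Expires: now + _ttl),
            (key, existing) => existing.Expires > now
                ? existing
                : (Lookup: _resolve(key), Expires: now + _ttl));
        return entry.Lookup;
    }
}
```

With a zero TTL every call re-resolves; with a TTL of a few hours, repeated runs within that window only hit the DNS server once per IP.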

I hope this helps.

Baltico
  • Caching definitely helps, actually already doing that (should have mentioned). Multiple DNS servers also a good idea. Thanks! – Max May 30 '14 at 14:43
  • As always, I would suggest using TPL Dataflow's ActionBlock instead of firing all requests at once and waiting for all of them to complete. An ActionBlock with a high MaxDegreeOfParallelism puts an upper bound on how many calls are in flight at once and starts the next lookup as soon as a slot frees up, which can lead to better utilization of resources:

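```csharp
using System.Collections.Concurrent;
using System.Net;
using System.Threading.Tasks.Dataflow; // System.Threading.Tasks.Dataflow NuGet package

// Thread-safe, since the block runs many lookups concurrently.
var hosts = new ConcurrentDictionary<string, string>();

var block = new ActionBlock<string>(
    async ip => 
    { 
        try
        {
            var host = (await Dns.GetHostEntryAsync(ip)).HostName;
            if (!string.IsNullOrWhiteSpace(host))
            {
                hosts[ip] = host;
            }
        }
        catch
        {
            // Skip IPs that fail to resolve.
        }
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5000 });

// ips: the IPs to resolve, as in the question.
foreach (var ip in ips)
{
    block.Post(ip);
}
block.Complete();        // no more input
await block.Completion;  // wait for all lookups to finish
```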
  • I would also suggest adding a cache and making sure you don't resolve the same IP more than once.

  • .NET's Dns class includes some fallbacks besides DNS (e.g. LLMNR), which makes it very slow. If all you need are DNS queries, you might want to use a dedicated library like ARSoft.Tools.Net.
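For context on what a dedicated DNS library actually sends: a reverse lookup is a DNS query for a PTR record on a name in the `in-addr.arpa` zone, formed from the IPv4 address's octets in reverse order. The helper below only builds that query name; `ReverseLookup.ReverseLookupName` is a hypothetical name, and it is a sketch limited to IPv4 (IPv6 uses `ip6.arpa` with reversed nibbles, which is not handled here). It does not call ARSoft.Tools.Net or any other library.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class ReverseLookup
{
    // Builds the PTR query name for an IPv4 address,
    // e.g. "173.194.121.9" -> "9.121.194.173.in-addr.arpa".
    public static string ReverseLookupName(string ip)
    {
        var address = IPAddress.Parse(ip);
        if (address.AddressFamily != AddressFamily.InterNetwork)
            throw new ArgumentException("Only IPv4 is handled in this sketch.", nameof(ip));

        // Octets come back in network order; reverse them for in-addr.arpa.
        byte[] octets = address.GetAddressBytes();
        Array.Reverse(octets);
        return string.Join(".", Array.ConvertAll(octets, b => b.ToString())) + ".in-addr.arpa";
    }
}
```

Each IP maps to its own query name, which is why reverse lookups are normally issued one query per address.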


P.S. Some remarks about your code sample:

  1. You should be using GetHostEntryAsync instead of FromAsync
  2. The continuation can potentially run on different threads so you should really be using ConcurrentDictionary.
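Applying both remarks to the question's code might look like the sketch below: `Dns.GetHostEntryAsync` replaces the `FromAsync` wrapper, and a `ConcurrentDictionary` collects the results. `ReverseDns.ResolveAllAsync` is a hypothetical name, and the optional `lookup` parameter is an assumption added only so the method can be exercised without network access; by default it calls `Dns.GetHostEntryAsync`. Like the original, it still starts every lookup at once; the ActionBlock approach above is one way to cap concurrency.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

public static class ReverseDns
{
    public static async Task<ConcurrentDictionary<string, string>> ResolveAllAsync(
        IEnumerable<string> ips,
        Func<string, Task<IPHostEntry>> lookup = null)
    {
        // Default to the built-in async API instead of wrapping Begin/End.
        lookup ??= ip => Dns.GetHostEntryAsync(ip);

        // Continuations may run concurrently on different threads.
        var hosts = new ConcurrentDictionary<string, string>();

        var tasks = ips.Select(async ip =>
        {
            try
            {
                var entry = await lookup(ip);
                hosts[ip] = entry?.HostName;
            }
            catch (Exception)
            {
                // Failed lookups (e.g. SocketException) are recorded as null,
                // as in the original code.
                hosts[ip] = null;
            }
        });

        await Task.WhenAll(tasks);
        return hosts;
    }
}
```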
i3arnon
  • Tried out ARSoft.Tools.Net, it's *a lot* faster than System.Net.Dns - at least 5x. – Max May 30 '14 at 19:10