It does sound like you're swamping your local DNS server (in the jargon, your local recursive DNS resolver).
When your program issues a DNS resolution request, it sends a port 53 datagram to the local resolver. That resolver responds either by replying from its cache or recursively resending the request to some other resolver that's been identified as possibly having the record you're looking for.
So, your multithreaded program is causing a lot of datagrams to fly around. Internet Protocol hosts and routers handle congestion and overload by dropping datagram packets. It's like handling a traffic jam on a bridge by bulldozing cars off the bridge. In an overload situation, some packets just disappear.
So, it's up to endpoint software using datagram protocols to try again if their packets get lost. That's the purpose of TCP, and that's how it can provide the illusion of an error-free stream of data even though it can only communicate with datagrams.
So, your program will need to try again when you get resolution failure on some of your DNS requests. You're a datagram endpoint so you own the responsibility of retry. I suspect the .net library is give you back failure when some of your requests time out because your datagrams got dropped.
Now, here's the important thing. It is also the responsibility of a datagram endpoint program, like yours, to implement congestion control. TCP does this automatically using its sliding window system, with an algorithm called slow-start / exponential backoff. If TCP didn't do this all internet routers would be congested all the time. This algorithm was dreamed up by Van Jacobson, and you should go read about it.
In the meantime you should implement a simple form of it in your bulk DNS lookup program. Here's how you might do that.
- Start with a batch size of, say, 5 lookups.
- Every time you get the whole batch back successfully, increase your batch size by one for your next batch. This is slow-start. As long as you're not getting congestion, you increase the network load.
- Every time you get a failure to resolve a name, reduce the size of the next batch by half. So, for example, if your batch size was 30 and you got a failure, your next batch size will be 15. This is exponential backoff. You respond to congestion by dramatically reducing the load you're putting on the network.
- Implement a maximum batch size of something like 100 just to avoid being too much of a pig and looking like a crude denial-of-service attack to the DNS system.
I had a similar project a while ago and this strategy worked well for me.