Environment: Windows Server 2012 R2 64-bit. C# .NET Framework version 4.5.1.

I am trying to use this program to download a bunch of files from a SharePoint 2013 site: https://github.com/nddipiazza/Sharepoint-Exporter

In this project there is a [FileDownloader.cs][1] file that takes download requests from a BlockingCollection and writes each file to disk.

When I run this I pretty quickly get hit with socket errors:

System.AggregateException: One or more errors occurred. ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network address/port) is normally permitted 10.5.50.2:443
   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult)
   at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at System.Net.Http.HttpClientHandler.GetResponseCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at SpPrefetchIndexBuilder.FileDownloader.attemptToDownload(FileToDownload toDownload, Int32 numRetry)
---> (Inner Exception #0) System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: Only one usage of each socket address (protocol/network address/port) is normally permitted 10.5.50.2:443
   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult)
   at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)
   --- End of inner exception stack trace ---
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at System.Net.Http.HttpClientHandler.GetResponseCallback(IAsyncResult ar)
   --- End of inner exception stack trace ---<---

Basically it seems like I am hitting this code too hard:

public void AttemptToDownload(FileToDownload toDownload, int numRetry)
{
    try
    {
        var responseResult = client.GetAsync(SpPrefetchIndexBuilder.topParentSite + toDownload.serverRelativeUrl);
        if (responseResult.Result != null && responseResult.Result.StatusCode == System.Net.HttpStatusCode.OK)
        {
            using (var memStream = responseResult.Result.Content.ReadAsStreamAsync().GetAwaiter().GetResult())
            {
                using (var fileStream = File.Create(toDownload.saveToPath))
                {
                    memStream.CopyTo(fileStream);
                }
            }
            Console.WriteLine("Thread {0} - Successfully downloaded {1} to {2}", Thread.CurrentThread.ManagedThreadId, toDownload.serverRelativeUrl, toDownload.saveToPath);
        }
        else
        {
            Console.WriteLine("Got non-OK status {0} when trying to download url {1}", responseResult.Result.StatusCode, SpPrefetchIndexBuilder.topParentSite + toDownload.serverRelativeUrl);
        }
    }
    catch (Exception e)
    {
        if (numRetry >= NUM_RETRIES)
        {
            Console.WriteLine("Gave up trying to download url {0} to file {1} after {2} retries due to error: {3}", SpPrefetchIndexBuilder.topParentSite + toDownload.serverRelativeUrl, toDownload.saveToPath, NUM_RETRIES, e);
        }
        else
        {
            AttemptToDownload(toDownload, numRetry + 1);
        }
    }
}

Is there something wrong with how I'm using HttpClient? Everything I have read says to reuse a single static instance, and I am doing that. All of my threads share the same static HttpClient reference.

Here is the output of netstat -a -o -n after the downloads had been running for a few minutes: https://pastebin.com/GTYmqwue (in this test I was using SharePoint on port 80). There are a lot of connections sitting in TIME_WAIT. Why is that?

When I run it again a couple of minutes later, the number of TIME_WAIT connections has grown by hundreds. It looks like there is a leak of some sort.

Why is HttpClient leaving thousands of connections in TIME_WAIT? How can I get them to close?
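To track the TIME_WAIT growth from inside the process rather than by re-running netstat, something like the following could be used. It is only a sketch: the class and method names are made up for illustration, and it counts every TIME_WAIT connection on the machine to the given remote port, not only those opened by this program.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

public static class TimeWaitCounter
{
    // Counts TCP connections on this machine that are in TIME_WAIT
    // and whose remote endpoint uses the given port.
    public static int CountTimeWait(int remotePort)
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Count(c => c.State == TcpState.TimeWait
                     && c.RemoteEndPoint.Port == remotePort);
    }

    public static void Main()
    {
        // Port 80 matches the test run described above; use 443 for the HTTPS site.
        Console.WriteLine("TIME_WAIT connections to port 80: {0}", CountTimeWait(80));
    }
}
```

Logging this number every few seconds while the downloads run would show whether the count levels off or keeps climbing.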

I tried setting ServicePointManager.DefaultConnectionLimit = numThreads, but it still grows to thousands of connections.
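One thing worth ruling out with this setting on .NET Framework: DefaultConnectionLimit only applies to ServicePoint objects created after it is assigned, so it has to be set before the first request to the host. A minimal sketch, assuming a hypothetical CreateClient helper (not part of the linked project):

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class ConnectionLimitSketch
{
    // Hypothetical helper: applies the limit first, then creates the shared client.
    public static HttpClient CreateClient(int numThreads)
    {
        // ServicePoint objects that already exist keep their old limit,
        // so this must run before any request is sent.
        ServicePointManager.DefaultConnectionLimit = numThreads;
        return new HttpClient();
    }

    public static void Main()
    {
        using (var client = CreateClient(8))
        {
            Console.WriteLine("Connection limit: {0}", ServicePointManager.DefaultConnectionLimit);
        }
    }
}
```

Note that this limit caps the number of concurrent open connections per host. It does not by itself remove sockets that are already closed and waiting in TIME_WAIT.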

Nicholas DiPiazza
  • What operating system? – Erik Philips Dec 06 '17 at 20:00
  • Seems like a duplicate - https://stackoverflow.com/questions/2960056/trying-to-run-multiple-http-requests-in-parallel-but-being-limited-by-windows – Erik Philips Dec 06 '17 at 20:01
  • Even if you're using a single HttpClient there's only so many simultaneous requests you can process at once. If you're using async in a loop it's not hard to easily fire off 1000s of such requests. – Dylan Nicholson Dec 06 '17 at 20:04
  • @ErikPhilips i updated the top of the ticket with that info. And https://stackoverflow.com/questions/2960056/trying-to-run-multiple-http-requests-in-parallel-but-being-limited-by-windows sounds different. that guy was just getting 2 tcp/ip connections active. I'm running out of sockets completely. seems different. – Nicholas DiPiazza Dec 06 '17 at 20:04
  • @DylanNicholson any chance you could answer the question with a way to maximize the amount my server can handle without having these socket timeouts? is there some way to make sure my code doesn't hit that maximum? – Nicholas DiPiazza Dec 06 '17 at 20:05
  • @DylanNicholson code by the link is completely sequential, downloads files one by one. – Evk Dec 06 '17 at 20:07
  • @DylanNicholson the code is actually in a blocking collection that is fetched in a `Parallel` method. It will have `N` threads downloading it. see https://github.com/nddipiazza/Sharepoint-Exporter/blob/master/SpCrawler/SpPrefetchIndexBuilder.cs#L255 – Nicholas DiPiazza Dec 06 '17 at 20:08
  • You can limit the number of async operations running simultaneously. Check out the accepted answer of this question: https://stackoverflow.com/questions/9290498/how-can-i-limit-parallel-foreach – Arash Motamedi Dec 06 '17 at 20:12
  • What values of N have you tried? To get the error you're seeing it must be extremely high, but I'd be surprised if making it higher than 10 would help much, depending on what sort of hardware you're running on and what the server is capable of. @Evk I was assuming the included code was being called from elsewhere that did parallel invocations. – Dylan Nicholson Dec 06 '17 at 20:12
  • I have it pretty huge but it's working great on my amazon AWS test instance. but i get socket timeouts like crazy on my company's staging server. perhaps that environment has a lower allowed concurrent sockets setting. – Nicholas DiPiazza Dec 06 '17 at 20:13
  • @ArashMotamedi thanks for the response. I am doing it like this `Parallel.For(0, numThreads, x => spib.DownloadFilesFromQueue());` which would probably have the same effect yes? Or am I wrong about that? – Nicholas DiPiazza Dec 06 '17 at 20:16
  • Port 443 is for SSL/TLS, which is secure. Firewalls and virus checkers often block port numbers under 1000, so I would check your firewall and virus checker to make sure port 443 is not blocked. – jdweng Dec 06 '17 at 20:18
  • Ahhh. That might very well be @jdweng. i will check that. – Nicholas DiPiazza Dec 06 '17 at 20:18
  • Define "pretty huge"? If it's less than several thousand then there's something else happening, and if you're getting timeouts like you describe it may well be that the problem is you're ending up with a lot of sockets in TIME_WAIT status. What does netstat show? – Dylan Nicholson Dec 06 '17 at 20:22
  • @DylanNicholson I've only tried 30 - 90 and i get the problem consistently. I'm not getting timeouts that I know of?... just the Socket bind error saying I can't re-use a socket. I can look into netstat. – Nicholas DiPiazza Dec 06 '17 at 20:25
  • Your link above suggests it's only 50, unless you've set the SP_NUM_THREADS environment variable. You definitely shouldn't run out of sockets trying to perform 50 simultaneous HttpClient requests at once. – Dylan Nicholson Dec 06 '17 at 20:25
  • let me run some netstat to get some idea of how many connections we are dealing with here. – Nicholas DiPiazza Dec 06 '17 at 20:26
  • @NicholasDiPiazza Nope, `Parallel.For` is different. I think you should follow the example in the answer I referenced. Use `Parallel.ForEach` and make sure to pass in a `new ParallelOptions { MaxDegreeOfParallelism = 4 }` so that it can limit the number of concurrently running operations. – Arash Motamedi Dec 06 '17 at 20:47
  • Will do @ArashMotamedi !! thanks!! – Nicholas DiPiazza Dec 06 '17 at 20:57
  • @DylanNicholson there are a ton of TIME_WAIT connections in the netstat output. See https://pastebin.com/GTYmqwue Note: I had to do my test against a SharePoint on port 80 at 172.16.11.15 that doesn't exhibit the problem. I don't have access to the production instance right now. – Nicholas DiPiazza Dec 06 '17 at 22:07
  • @DylanNicholson the number of TIME_WAIT connections goes up and up as the program proceeds. It seems like there is a leak here. – Nicholas DiPiazza Dec 06 '17 at 22:45
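
The suggestion in the comments to cap concurrency with `ParallelOptions` could look roughly like the sketch below. The class name, the `ProcessAll` helper, and the `Thread.Sleep` standing in for a download are all hypothetical. The sketch also assumes the queue is completed with `CompleteAdding()` so the consuming enumerable terminates.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelSketch
{
    // Processes every queued item with at most maxParallel concurrent workers
    // and returns the highest number of workers observed running at once.
    public static int ProcessAll(BlockingCollection<string> queue, int maxParallel, Action<string> work)
    {
        var gate = new object();
        int running = 0, peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(queue.GetConsumingEnumerable(), options, item =>
        {
            lock (gate) { running++; if (running > peak) peak = running; }
            try { work(item); }
            finally { lock (gate) { running--; } }
        });
        return peak;
    }

    public static void Main()
    {
        var queue = new BlockingCollection<string>();
        for (int i = 0; i < 20; i++) queue.Add("/file" + i);
        queue.CompleteAdding(); // lets the consuming enumerable finish

        // Thread.Sleep stands in for the real download call.
        int peak = ProcessAll(queue, 4, path => Thread.Sleep(20));
        Console.WriteLine("Peak concurrent workers: {0}", peak);
    }
}
```

`MaxDegreeOfParallelism` is an upper bound, so the observed peak can be lower than the configured value.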

1 Answer

I think I finally figured it out.

I had accidentally set a very long timeout: client.Timeout = TimeSpan.FromMinutes(5);

I changed this to client.Timeout = TimeSpan.FromSeconds(15);

Now I think the connections aren't leaking anymore.
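
For reference, a minimal sketch of a shared client configured this way. The class name is hypothetical, and whether the shorter timeout is the actual root cause has not been confirmed beyond the observation above.

```csharp
using System;
using System.Net.Http;

public static class SharedClientSketch
{
    // One HttpClient shared by all worker threads. Timeout must be set before
    // the first request; changing it later throws InvalidOperationException.
    public static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(15)
    };

    public static void Main()
    {
        Console.WriteLine("Timeout: {0}", Client.Timeout);
    }
}
```

If Timeout is left unset, HttpClient uses its default of 100 seconds.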

Nicholas DiPiazza