
I can't figure out what is causing this error.

I am using the function below with about 1,000 concurrent connections.

Each connection uses a different web proxy.

After roughly 15 minutes of running, the established TCP connection count starts to stack up and internet connectivity is lost.

When I don't use any web proxy, I don't encounter the error.

I am using the following to retrieve the active TCP connection count:

var properties = IPGlobalProperties.GetIPGlobalProperties();
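That line only fetches the properties object; a minimal sketch of how it can be used to count connections per state (the helper name `CountEstablishedConnections` is mine, not from the original code):

```csharp
using System.Linq;
using System.Net.NetworkInformation;

public static class TcpDiagnostics
{
    // Counts active TCP connections currently in the Established state.
    public static int CountEstablishedConnections()
    {
        var properties = IPGlobalProperties.GetIPGlobalProperties();
        return properties.GetActiveTcpConnections()
                         .Count(c => c.State == TcpState.Established);
    }
}
```

`GetActiveTcpConnections()` returns a `TcpConnectionInformation[]`, so the same query can be grouped by `State` to see how many connections are stuck in `TimeWait` or `CloseWait` when the stacking occurs.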

I don't see any leak in my function.

So I need your help to solve this annoying problem.

C# .NET 4.6.2

Here are the statuses of the active TCP connections when the problem occurs:

[Screenshot: statuses of active TCP connections when the problem occurs]

public static cs_HttpFetchResults func_fetch_Page(
    string srUrl, int irTimeOut = 60,
    string srRequestUserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0",
    string srProxy = null, int irCustomEncoding = 0, bool blAutoDecode = true, bool blKeepAlive = true)
{
    cs_HttpFetchResults mycs_HttpFetchResults = new cs_HttpFetchResults();
    mycs_HttpFetchResults.srFetchingFinalURL = srUrl;

    HttpWebRequest request = null;
    WebResponse response = null;

    try
    {
        request = (HttpWebRequest)WebRequest.Create(srUrl);
        request.CookieContainer = new System.Net.CookieContainer();

        if (srProxy != null)
        {
            string srProxyHost = srProxy.Split(':')[0];
            int irProxyPort = Int32.Parse(srProxy.Split(':')[1]);
            WebProxy my_awesomeproxy = new WebProxy(srProxyHost, irProxyPort);
            my_awesomeproxy.Credentials = new NetworkCredential();
            request.Proxy = my_awesomeproxy;
        }
        else
        {
            request.Proxy = null;
        }

        request.ContinueTimeout = irTimeOut * 1000;
        request.ReadWriteTimeout = irTimeOut * 1000;
        request.Timeout = irTimeOut * 1000;
        request.UserAgent = srRequestUserAgent;
        request.KeepAlive = blKeepAlive;
        request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

        WebHeaderCollection myWebHeaderCollection = request.Headers;
        myWebHeaderCollection.Add("Accept-Language", "en-gb,en;q=0.5");
        myWebHeaderCollection.Add("Accept-Encoding", "gzip, deflate");

        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;

        using (response = request.GetResponse())
        {
            using (Stream strumien = response.GetResponseStream())
            {
                Encoding myEncoding = Encoding.UTF8;
                string srContentType = "";

                if (response.ContentType != null)
                {
                    srContentType = response.ContentType;
                    if (srContentType.Contains(";"))
                    {
                        srContentType = srContentType.Split(';')[1];
                    }
                    srContentType = srContentType.Replace("charset=", "");
                    srContentType = func_Process_Html_Input(srContentType);
                }

                try
                {
                    myEncoding = Encoding.GetEncoding(srContentType);
                }
                catch
                {
                    myEncoding = irCustomEncoding == 0 ? Encoding.UTF8 : Encoding.GetEncoding(irCustomEncoding);
                }

                using (StreamReader sr = new StreamReader(strumien, myEncoding))
                {
                    mycs_HttpFetchResults.srFetchBody = sr.ReadToEnd();
                    if (blAutoDecode == true)
                    {
                        mycs_HttpFetchResults.srFetchBody = HttpUtility.HtmlDecode(mycs_HttpFetchResults.srFetchBody);
                    }
                    mycs_HttpFetchResults.srFetchingFinalURL = Return_Absolute_Url(response.ResponseUri.AbsoluteUri.ToString(), response.ResponseUri.AbsoluteUri.ToString());
                    mycs_HttpFetchResults.blResultSuccess = true;
                }
            }
        }

        if (request != null)
            request.Abort();
        request = null;
    }
    catch (Exception E)
    {
        if (E.Message.ToString().Contains("(404)"))
            mycs_HttpFetchResults.bl404 = true;

        csLogger.logCrawlingErrors("crawling failed url: " + srUrl, E);

    }
    finally
    {
        if (request != null)
            request.Abort();
        request = null;

        if (response != null)
            response.Close();
        response = null;
    }

    return mycs_HttpFetchResults;
}
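Following the comments' suggestion to move to `HttpClient` (which, as DavidG notes, does support proxies), a rough sketch of the equivalent proxy setup via `HttpClientHandler`, assuming the same "host:port" proxy string format parsed above (the factory name `CreateClient` is illustrative, not an existing API):

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class HttpClientFactory
{
    // Builds an HttpClient routed through the given "host:port" proxy.
    // Pass srProxy = null for a direct connection.
    public static HttpClient CreateClient(string srProxy, int irTimeOut = 60)
    {
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
            CookieContainer = new CookieContainer()
        };

        if (srProxy != null)
        {
            string[] parts = srProxy.Split(':');
            handler.Proxy = new WebProxy(parts[0], int.Parse(parts[1]));
            handler.UseProxy = true;
        }

        return new HttpClient(handler)
        {
            Timeout = TimeSpan.FromSeconds(irTimeOut)
        };
    }
}
```

Unlike `HttpWebRequest`, an `HttpClient` (and its handler) is intended to be created once per proxy and reused, which avoids re-establishing connections on every fetch.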
  • My main advice here would be to not use `WebRequest` and move to the newer `HttpClient` instead. If nothing else, it supports `IDisposable` – DavidG Mar 21 '17 at 00:41
  • (a) you'll never get high throughput using synchronous APIs. (b) when using async APIs, be aware that the DNS lookup phase is synchronous. This can be mitigated ( http://stackoverflow.com/a/26050285/14357 ) with a 3rd party DNS library ( maybe https://arsofttoolsnet.codeplex.com/ ) (c) from my experience with crawling, the most likely failure point (once you're absolutely sure your code is working) is at the router. Home routers tend to be woefully inadequate for high volume crawling, and when they run out of memory, connections just vanish. – spender Mar 21 '17 at 00:46
  • @spender it works perfectly fine for 15-20 minutes; after that, some weird errors occur. I am sure some errors happen because the number of established connections starts to stack up. I am clueless here :( My home router is pretty good; the bandwidth never reaches maximum. I will try providing the IP host; maybe it is about DNS. – Furkan Gözükara Mar 21 '17 at 00:48
  • @DavidG I have checked; it doesn't support proxies :( – Furkan Gözükara Mar 21 '17 at 00:48
  • It absolutely does support proxies. http://stackoverflow.com/a/24677189/1663001 – DavidG Mar 21 '17 at 00:50
  • @MonsterMMORPG Bandwidth wasn't the issue for me. My first attempt at this sort of thing, the router literally ran out of memory when servicing a large number of connections. Maybe it's the same for you. – spender Mar 21 '17 at 00:51
  • @spender is there any way I can determine whether that is the issue or not? – Furkan Gözükara Mar 21 '17 at 00:55
  • @DavidG ty very much, will check it. It seems like it supports all the features of HttpWebRequest, right? – Furkan Gözükara Mar 21 '17 at 00:55
  • It supports everything and more. – DavidG Mar 21 '17 at 00:56
  • HttpClient (in a normal Windows environment) is going to lean on HttpWebRequest for all the lifting, but I agree with @DavidG. Stacking HttpMessageHandlers/DelegatingHandlers in HttpClient is a great way to customize the HttpClient behaviour. It also simplifies the async code, which you'll want to switch to. – spender Mar 21 '17 at 00:58
  • Can you log into your router and view any stats? Does it offer an SSH/telnet login where you can issue any commands to view stats? – spender Mar 21 '17 at 00:59
  • @DavidG I have converted it to HttpClient; can you check please? http://codereview.stackexchange.com/questions/158359/trying-to-compose-best-web-page-fetcher-function-by-httpclienthandler-for-c – Furkan Gözükara Mar 21 '17 at 01:57
  • @spender my router is a Huawei HG253/S. I guess it doesn't support such stats. It is a fiber-optic router. – Furkan Gözükara Mar 21 '17 at 01:58
  • @spender I will further test whether the problem is the router or not, hopefully tomorrow; it is 6 AM here. Meanwhile, can you check this? http://stackoverflow.com/questions/42917528/how-to-prevent-dns-lookup-when-fetching-by-using-httpclient – Furkan Gözükara Mar 21 '17 at 02:48
  • @DavidG can you check this? I want to prevent DNS lookup: http://stackoverflow.com/questions/42917528/how-to-prevent-dns-lookup-when-fetching-by-using-httpclient – Furkan Gözükara Mar 21 '17 at 02:48
  • It looks like your code is always going to Abort rather than attempt to Close. Check the documentation at https://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.connection(v=vs.110).aspx – Mokubai Mar 21 '17 at 16:00
  • Even with an Abort you still need to actually close the connection: https://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.abort(v=vs.110).aspx – Mokubai Mar 21 '17 at 16:04
  • @Mokubai HttpWebRequest does not have anything like Close. Also, I am using a using clause for the response; the response does have Close, but since it is wrapped in a using clause it should be handled properly. Or not? – Furkan Gözükara Mar 21 '17 at 18:36

0 Answers