I'm using the following code to validate URLs:

private Boolean CheckURL(string url)
{
    using (MyClient myclient = new MyClient())
    {
        try
        {
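            myclient.HeadOnly = true;
            // HEAD request: only the headers are fetched, so no content is downloaded.
            string s1 = myclient.DownloadString(url);
            statusCode = null; // statusCode is a string member (declaration not shown)
            return true;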
        }
        catch (WebException error)
        {
            if (error.Response != null)
            {
                // HttpStatusCode is an enum (a value type), so it can never be null.
                HttpStatusCode scode = ((HttpWebResponse)error.Response).StatusCode;
                statusCode = scode.ToString();
            }
            else
            {
                statusCode = "Unknown Error";
            }
            return false;
        }
    }
}

class MyClient : WebClient
{
    public bool HeadOnly { get; set; }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest req = base.GetWebRequest(address);
        // req.Timeout = 3000;
        if (HeadOnly && req.Method == "GET")
        {
            req.Method = "HEAD";
        }
        return req;
    }
}

This works fine in most cases, but for some URLs it returns false positive results: for valid URLs (ones I can open in Chrome) the method returns Not Found. For some URLs the method also takes too long to complete.

What am I doing wrong? Please advise.
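
One thing that may be worth ruling out, as a minimal sketch rather than a confirmed diagnosis: some servers answer `HEAD` differently from `GET`, or reject requests that carry no `User-Agent` header, so a page that loads in Chrome can still come back as Not Found here. The sketch below parses the string with `Uri.TryCreate` instead of re-encoding it, sends `HEAD` with an explicit timeout, and repeats the request as `GET` only for a few status codes. The class `UrlChecker`, its method names, the 5000 ms timeout, the `User-Agent` string, and the list of retry-worthy codes are all assumptions made for illustration.

```csharp
using System;
using System.Net;

static class UrlChecker
{
    // Parses the string as an absolute http/https URL without re-encoding it.
    public static bool TryParseHttpUrl(string url, out Uri uri)
    {
        return Uri.TryCreate(url, UriKind.Absolute, out uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    // Assumption: these HEAD results are the ones worth re-checking with GET.
    public static bool ShouldRetryWithGet(HttpStatusCode code)
    {
        return code == HttpStatusCode.NotFound
            || code == HttpStatusCode.Forbidden
            || code == HttpStatusCode.MethodNotAllowed
            || code == HttpStatusCode.NotImplemented;
    }

    // Sends one request; returns the status code, or null when no HTTP response arrived.
    public static HttpStatusCode? Probe(Uri uri, string method, int timeoutMs)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        req.Method = method;
        req.Timeout = timeoutMs;
        req.UserAgent = "Mozilla/5.0 (compatible; UrlChecker/1.0)"; // hypothetical value
        try
        {
            using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
            {
                return resp.StatusCode;
            }
        }
        catch (WebException ex)
        {
            // 4xx/5xx responses surface as WebException with the response attached.
            HttpWebResponse resp = ex.Response as HttpWebResponse;
            if (resp == null) return null; // DNS failure, timeout, refused connection, ...
            using (resp) { return resp.StatusCode; }
        }
    }

    public static bool CheckUrl(string url, out string status)
    {
        Uri uri;
        if (!TryParseHttpUrl(url, out uri))
        {
            status = "Malformed URL";
            return false;
        }

        HttpStatusCode? code = Probe(uri, "HEAD", 5000);
        if (code.HasValue && ShouldRetryWithGet(code.Value))
        {
            // The response stream is never read, only the status line and headers.
            code = Probe(uri, "GET", 5000);
        }

        status = code.HasValue ? code.Value.ToString() : "Unknown Error";
        return code.HasValue && (int)code.Value >= 200 && (int)code.Value < 400;
    }
}
```

A call such as `string status; bool ok = UrlChecker.CheckUrl(url, out status);` would stand in for `CheckURL(url)`. Comparing the `HEAD` and `GET` status codes for one of the failing URLs should show quickly whether this is the cause at all.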

UPDATE:

I'm checking the URLs from multiple threads using `Parallel.ForEach`. Could this cause the problem?

public void StartThreads()
{
    Parallel.ForEach(urllist, ProcessUrl);
}

private void ProcessUrl(string url)
{
    Boolean valid = CheckURL(url);

    this.Invoke((MethodInvoker)delegate()
    {
        if (valid)
        {
            // URL is valid
        }
        else
        {
            // URL is invalid
        }
    });
}

I'm starting the threads from a BackgroundWorker to prevent the UI from freezing:

private void worker_DoWork(object sender, DoWorkEventArgs e)
{
    StartThreads();
}
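
On the question of whether `Parallel.ForEach` itself matters, a hedged sketch: on .NET Framework, client applications default to two simultaneous connections per host (`ServicePointManager.DefaultConnectionLimit`), so many parallel checks against one host may queue behind each other and look slow or time out. The helper below caps the degree of parallelism and raises that limit to match. `ParallelUrlRunner`, `Run`, and `maxParallel` are hypothetical names, and `check` stands in for a method such as `CheckURL`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks;

static class ParallelUrlRunner
{
    // Runs check(url) for every URL with at most maxParallel calls at a time.
    public static ConcurrentDictionary<string, bool> Run(
        IEnumerable<string> urls, Func<string, bool> check, int maxParallel)
    {
        // Allow as many connections per host as there are parallel workers.
        ServicePointManager.DefaultConnectionLimit =
            Math.Max(ServicePointManager.DefaultConnectionLimit, maxParallel);

        var results = new ConcurrentDictionary<string, bool>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // ConcurrentDictionary makes the writes from worker threads safe.
        Parallel.ForEach(urls, options, url => { results[url] = check(url); });
        return results;
    }
}
```

Collecting the results first and updating the UI once afterwards would also avoid one `Invoke` call per URL.
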
max
  • A valid URL returning "not found" would be a false negative... but: this is impossible to diagnose without a lot more context and ideally data from an HTTP capture (Fiddler, Wireshark, etc.) to see what actually got sent. Note that if you're running too many requests against the same provider, they might choose to block you *in any way they choose* as a defence against DoS attacks (or just misbehaving crawlers); it could also be a difference of opinion on how URLs are formed - especially with Unicode, what you see in a browser is *not* always the actual URL – Marc Gravell Jul 19 '17 at 08:57
  • @MarcGravell Thanks for your response. I'm running the code from a Desktop Application so IP Blocking(ie:multiple users >same resource ) is not an issue. When I type the URL in the browser I can view the contents of the page without any issue. – max Jul 19 '17 at 09:17
  • Instead of passing the URL directly as a string, try passing `HttpUtility.UrlEncode(url)` instead. – LocEngineer Jul 19 '17 at 09:21
  • "so IP Blocking(ie:multiple users >same resource ) is not an issue" - um, desktop applications aren't immune from that, and ... well, I can only speak for myself, but: I'd still automatically block you if you were hitting us too often - although I *might* use 429 if we felt kind... or more likely: 418 – Marc Gravell Jul 19 '17 at 09:22
  • @LocEngineer Do you mean `myclient.DownloadString(HttpUtility.UrlEncode(url))`? – max Jul 19 '17 at 09:26
  • Yes. See if that makes a difference (e.g. spaces or special characters in URL). – LocEngineer Jul 19 '17 at 09:28
  • @LocEngineer This results in all URLs being Invalid. – max Jul 19 '17 at 09:43
  • @MarcGravell Please see the update. – max Jul 19 '17 at 10:32
  • @LocEngineer Please see the update. – max Jul 19 '17 at 10:32
  • If you want to not lock the UI you'd be better off using `async`/`await` instead of `Parallel.ForEach` [Async, await and parallel in C#](https://stackoverflow.com/questions/14099520/async-await-and-parallel-in-c-sharp) – Liam Jul 19 '17 at 10:38
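
Following the last suggestion, a minimal sketch of an `async`/`await` version with a cap on in-flight requests, which also speaks to the earlier point about hitting one provider too often. It assumes .NET 4.5 or later for `HttpClient`. `AsyncUrlChecker`, `CheckAllAsync`, `HeadAsync`, the 10 second timeout, and the concurrency value are hypothetical choices, not something taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

static class AsyncUrlChecker
{
    // One shared HttpClient; the 10 second timeout is an assumed value.
    private static readonly HttpClient Http =
        new HttpClient { Timeout = TimeSpan.FromSeconds(10) };

    // Runs check(url) for each distinct URL with at most maxConcurrency in flight.
    public static async Task<Dictionary<string, bool>> CheckAllAsync(
        IEnumerable<string> urls, Func<string, Task<bool>> check, int maxConcurrency)
    {
        var gate = new SemaphoreSlim(maxConcurrency);
        var tasks = urls.Distinct().Select(async url =>
        {
            await gate.WaitAsync().ConfigureAwait(false);
            try
            {
                bool ok = await check(url).ConfigureAwait(false);
                return new KeyValuePair<string, bool>(url, ok);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        var pairs = await Task.WhenAll(tasks).ConfigureAwait(false);
        return pairs.ToDictionary(p => p.Key, p => p.Value);
    }

    // Sends a HEAD request; any failure to get a 2xx answer counts as invalid.
    public static async Task<bool> HeadAsync(string url)
    {
        try
        {
            using (var req = new HttpRequestMessage(HttpMethod.Head, url))
            using (var resp = await Http
                .SendAsync(req, HttpCompletionOption.ResponseHeadersRead)
                .ConfigureAwait(false))
            {
                return resp.IsSuccessStatusCode;
            }
        }
        catch (HttpRequestException) { return false; }      // network-level failure
        catch (TaskCanceledException) { return false; }     // HttpClient timeout
        catch (InvalidOperationException) { return false; } // relative URL
        catch (UriFormatException) { return false; }        // malformed URL
    }
}
```

From an `async` event handler this could be called as `var results = await AsyncUrlChecker.CheckAllAsync(urllist, AsyncUrlChecker.HeadAsync, 8);`. Execution resumes on the UI thread after the `await`, so the results can be shown without `Invoke` or a BackgroundWorker.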
