
I have a list of URLs in a text file that I want to visit using the C# WebBrowser class, saving the content of each page somewhere. The problem is that the program doesn't always navigate to the next URL.

Links 1 and 2 are visited correctly, then the browser window doesn't refresh on link 3. Link 4 works again, while 5, 6 and 7 fail. Link 8 works, 9 to 15 fail, 16 works, and so on.

Here is an example list of URLs:

http://www.example.com/somefile_7.html*SomeOtherText1*SomeAdditionalText1

http://www.example.com/somefile_12.html*SomeOtherText1*SomeAdditionalText2

static int counter_getURL = 0;

private void Form1_Load(object sender, EventArgs e)
{
    nextTurn();
}

void startBrowser(string url)
{
    webBrowser1.Navigate(new Uri(url), "_self");
    webBrowser1.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(get_browser_string);
}

void get_browser_string(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Display the content of the website in textBox1
    textBox1.Text = webBrowser1.Document.Body.InnerText;
    MessageBox.Show("Next");
    nextTurn();
}

public void nextTurn()
{
    startBrowser(getURL());
}

public string getURL()
{
    string url = "";
    string[] input = System.IO.File.ReadAllLines(@"C:\Users\WORKSTATION01\Desktop\url_list.txt", Encoding.Default);
    // Get the URL only
    string[] splitted = input[counter_getURL].Split(new char[] { '*' });
    url = splitted[0];
    counter_getURL++;
    return url;
} 
Syon
busch

1 Answer


DocumentCompleted also fires for each FRAME inside a webpage, not only for the top-level document. My guess is that some of the pages in your URL list contain FRAMEs, and the extra events interfere with your code.

nim
To be more specific, if a webpage contains a FRAME, DocumentCompleted fires twice: first for the outer webpage, then for the FRAME. The two events may come so close together that nextTurn is called twice within a very short time, so you may get the impression that the second URL is never shown. – nim Aug 22 '13 at 04:35
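
One way to act on this is to ignore completions that do not belong to the top-level document. The following is a minimal sketch under some assumptions, not a drop-in fix: `IsTopLevelDocument` and `savedPages` are hypothetical names, the relative path `url_list.txt` stands in for the real file, and comparing `e.Url` with `webBrowser1.Url` is a common heuristic rather than a guaranteed test. The sketch also subscribes to DocumentCompleted only once in the constructor, because the question's `startBrowser` adds another handler on every call, which may multiply the `nextTurn` calls as well.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Windows.Forms;

public class Form1 : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    private readonly List<string> savedPages = new List<string>(); // hypothetical storage
    private string[] lines = new string[0];
    private int counter_getURL = 0;

    public Form1()
    {
        Controls.Add(webBrowser1);
        // Subscribe exactly once, not on every navigation.
        webBrowser1.DocumentCompleted += get_browser_string;
        Load += (s, e) =>
        {
            // Assumed file name; blank lines are skipped.
            lines = File.ReadAllLines("url_list.txt")
                        .Where(l => l.Trim().Length > 0)
                        .ToArray();
            nextTurn();
        };
    }

    // Hypothetical helper: true when the completed URL is the browser's own URL,
    // i.e. the event is assumed to be for the outer page and not for a FRAME.
    public static bool IsTopLevelDocument(Uri completedUrl, Uri browserUrl)
    {
        return completedUrl != null && completedUrl.Equals(browserUrl);
    }

    private void get_browser_string(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Skip completions that belong to a FRAME.
        if (!IsTopLevelDocument(e.Url, webBrowser1.Url)) return;

        HtmlDocument doc = webBrowser1.Document;
        if (doc != null && doc.Body != null)
        {
            savedPages.Add(doc.Body.InnerText ?? "");
        }
        nextTurn();
    }

    private void nextTurn()
    {
        if (counter_getURL >= lines.Length) return; // list exhausted
        // Keep only the part before the first '*', as in the question.
        string url = lines[counter_getURL++].Split('*')[0];
        webBrowser1.Navigate(new Uri(url));
    }
}
```

With this shape, each top-level completion advances the list by exactly one URL, and FRAME completions are dropped before `nextTurn` is reached.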