0

I am interested in checking the content of a website. The content changes frequently, and when I view the website in any browser, it refreshes itself every 30 seconds. I want to know when the content has changed.

I am using WinForms, and I want to click a button once to start a loop that checks the page every 30 seconds. I don't want to hit the website too frequently; in fact, the web page's own refresh is more than enough for my needs.

My code works when I click a button (btnCheckWebsite): if I wait a minute and then click btnCheckWebsite again, my message box pops up because the web page has changed. This is great; however, I want to do this in a while loop. When I uncomment my while loop, the DocumentText never changes. I have debugged it, and for some reason it's the same text every time; even when the web page has changed in the real world, it stays the same in my code.

So my question is: why can't I use a loop, and what can I do instead to run this repeatedly without any input from me?

As a bonus, I would like to remove the .Refresh(). I only added it because the code won't work without it; however, as I understand it, this refreshes the whole page. When I use a browser, I see the page updating even though the whole page is not refreshed.

Just for background info, I did start by having a WebBrowser control on my form, and the page refreshed automatically. I used the same code and had the same problem. Interestingly, the WebBrowser control on my Windows form refreshes by itself with no problem until I click btnCheckWebsite, and then it stops refreshing! Also, I know about WebRequest, but I don't know how to use it for my purposes.

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Threading;

namespace Check_Website
{
    public partial class Form1 : Form
    {
        public WebBrowser _memoryWebBrowser = new WebBrowser();
        String _previousSource = "emptySource";

        public Form1()
        {
            InitializeComponent();

           _memoryWebBrowser.Navigate(new Uri("http://www.randomurl.com/"));

        }

        private void btnCheckWebsite_Click(object sender, EventArgs e)
        {
            //I want to un-comment this while loop and let my code run itself but it stops working
            //when I introduce my while loop.

            //while (1 < 2 )
            //{
                //Thread.Sleep(30000);

                checkWebsite();

            //}
        }

        private void checkWebsite()
        {
            //Why do I need this refresh? I would rather not have to hit the web page with a refresh.
            //When I view the webpage it refreshed with new data however when I use a WebBrowser
            //the refresh just doesn't happen unless I call Refresh.
            _memoryWebBrowser.Refresh();

            Thread.Sleep(500);

            while (((_memoryWebBrowser.ReadyState != WebBrowserReadyState.Complete) || (_memoryWebBrowser.DocumentText.Length < 3000)))
            {
                Thread.Sleep(1000);
            }


            String source = _memoryWebBrowser.DocumentText;

            if ((source != _previousSource) && (_previousSource != "emptySource"))
            {
                //Hey take a look at the interesting new stuff on this web page!!
                MessageBox.Show("Great news, there's new stuff on this web page www.randomurl.co.uk!!" );
            }

            _previousSource = source;

        }
    }
}
Ewan
  • Clarify whether your page uses AJAX or DHTML to update itself dynamically and whether you want to track these changes. – noseratio Aug 23 '13 at 09:30
  • **[UPDATE]** We clarified that the page indeed updates itself dynamically, and the solution was to use `Document.Body.OuterHtml` to track updates. – noseratio Aug 23 '13 at 10:34

4 Answers

1

You'd need to do your processing upon the DocumentCompleted event. This event is asynchronous, so if you want to do this in a loop, the execution thread must pump messages for this event to fire. A while loop with Thread.Sleep on the UI thread blocks message pumping, which is why the WebBrowser never gets to update while your loop runs. In a WinForms app, your UI thread is already pumping messages in Application.Run, and the only other endorsed way to enter a nested message loop on the same thread is via a modal form (here's how it can be done, see in the comments). Another (IMO, better) way of doing such Navigate/DocumentCompleted logic without a nested message loop is by using async/await, here's how. In the classic sense, this is not exactly a loop, but conceptually and syntactically it might be exactly what you're looking for.
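The async/await idea can be sketched as a minimal, hypothetical stand-alone form (`AsyncCheckForm`, `NavigateAsync` and `HasChanged` are illustrative names, not from the answer; the URL and 30-second interval come from the question; async/await needs C# 5 / .NET 4.5). It wraps `DocumentCompleted` in a `TaskCompletionSource`, so each `await` hands control back to the message loop run by `Application.Run` instead of blocking it with `Thread.Sleep`.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical sketch: URL and interval are taken from the question.
public class AsyncCheckForm : Form
{
    readonly WebBrowser _browser = new WebBrowser { ScriptErrorsSuppressed = true };
    readonly Button _btnCheck = new Button { Text = "Check website", Dock = DockStyle.Top };
    string _previousSource; // null until the first snapshot is taken

    public AsyncCheckForm()
    {
        Controls.Add(_btnCheck);
        _btnCheck.Click += BtnCheck_Click;
    }

    // Returns a task that completes once the whole page has finished loading.
    Task NavigateAsync(Uri url)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (s, e) =>
        {
            // DocumentCompleted can fire once per frame; wait for the whole page.
            if (_browser.ReadyState != WebBrowserReadyState.Complete) return;
            _browser.DocumentCompleted -= handler;
            tcs.TrySetResult(true);
        };
        _browser.DocumentCompleted += handler;
        _browser.Navigate(url);
        return tcs.Task;
    }

    // async void is acceptable here because this is a top-level event handler.
    async void BtnCheck_Click(object sender, EventArgs e)
    {
        _btnCheck.Enabled = false; // prevent starting several loops
        while (!IsDisposed)
        {
            // Each await returns to the message loop, so events can fire
            // and the UI stays responsive.
            await NavigateAsync(new Uri("http://www.randomurl.com/"));
            string source = _browser.DocumentText;
            bool changed = HasChanged(_previousSource, source);
            _previousSource = source;
            if (changed) MessageBox.Show("There's new stuff on this web page!");
            await Task.Delay(TimeSpan.FromSeconds(30));
        }
    }

    // Mirrors the question's check: ignore the first snapshot, report any difference after that.
    public static bool HasChanged(string previous, string current) =>
        previous != null && current != previous;

    [STAThread]
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new AsyncCheckForm());
    }
}
```

Because the loop awaits instead of sleeping, the browser can finish loading between iterations. Like the question's code, it still navigates to the page again on every pass.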

noseratio
  • This would work but as with the other 3 answers so far, it is really just a different way to load the web page repeatedly. Ideally I want to load the web page once and then check the small changes that happen within the web page. The web page has a control with changing content which I want to see changing. In a browser the content changes without reloading the web page, that's what I want to do programmatically. – Ewan Aug 23 '13 at 08:53
  • As I understand it now, your page uses AJAX or DHTML to update itself dynamically; correct me if I'm wrong. If so, you should have made that clear in your question. Anyway, in this case you only need to handle `DocumentCompleted` once. Then don't use `DocumentText`; use `Document.Body.OuterHtml` to track dynamic changes. There may be better ways of handling it, like [this](http://stackoverflow.com/questions/8733306/detecting-dom-change-events). – noseratio Aug 23 '13 at 09:28
  • PERFECT! All I needed to do was use `_memoryWebBrowser.Document.Body.OuterHtml` instead of `_memoryWebBrowser.DocumentText` in my solution! Yes, the page does use AJAX or similar to update only a small amount of content. When I said it refreshes, I meant that only one control refreshes a small amount of content; the whole page does not reload. – Ewan Aug 23 '13 at 09:38
  • I have now commented out my refresh (`//_memoryWebBrowser.Refresh();`) and it still works perfectly, which proves that this is loading the page once and then finding the changes without refreshing or reloading the page. Thanks again! – Ewan Aug 23 '13 at 10:28
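The approach the asker settled on (load the page once, then read `Document.Body.OuterHtml` instead of `DocumentText`, with no `Refresh()`) might look like this as a minimal sketch. It assumes, as the comments establish, that the page updates part of itself with script (AJAX or similar). `DomPollForm`, `ReadBodyHtml` and `HasChanged` are hypothetical names, and a 30-second `System.Windows.Forms.Timer` replaces the question's `while`/`Thread.Sleep` loop so the UI thread keeps pumping messages.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical sketch: load once, then poll the live DOM. URL and interval from the question.
public class DomPollForm : Form
{
    readonly WebBrowser _browser = new WebBrowser { Dock = DockStyle.Fill, ScriptErrorsSuppressed = true };
    // System.Windows.Forms.Timer raises Tick on the UI thread, so the message loop is never blocked.
    readonly System.Windows.Forms.Timer _timer = new System.Windows.Forms.Timer { Interval = 30000 };
    string _previousHtml; // null until the page has loaded once

    public DomPollForm()
    {
        Controls.Add(_browser);
        _browser.DocumentCompleted += Browser_DocumentCompleted;
        _timer.Tick += Timer_Tick;
        _browser.Navigate(new Uri("http://www.randomurl.com/"));
    }

    void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Wait for the whole page (including frames), then stop listening:
        // the page is never navigated or refreshed again.
        if (_browser.ReadyState != WebBrowserReadyState.Complete) return;
        _browser.DocumentCompleted -= Browser_DocumentCompleted;
        _previousHtml = ReadBodyHtml();
        _timer.Start();
    }

    void Timer_Tick(object sender, EventArgs e)
    {
        string html = ReadBodyHtml();
        if (html == null) return; // no document/body available yet
        bool changed = HasChanged(_previousHtml, html);
        _previousHtml = html; // update first so a tick during the MessageBox doesn't re-report
        if (changed) MessageBox.Show("There's new stuff on this web page!");
    }

    // Body.OuterHtml reads the current DOM, including changes made by the page's own scripts.
    string ReadBodyHtml() => _browser.Document?.Body?.OuterHtml;

    public static bool HasChanged(string previous, string current) =>
        previous != null && current != previous;

    [STAThread]
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new DomPollForm());
    }
}
```

Unsubscribing from `DocumentCompleted` after the first full load reflects the asker's goal: the page is downloaded once, and every later check only reads the DOM that the page's scripts keep updating.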
0

You can catch the WebBrowser.Navigated event to get notified when the page was reloaded, so you wouldn't need a loop for that. (I mean the ReadyState polling loop.)

Just navigate to the page every 30 seconds in a loop, and in the Navigated event you can check whether the site has changed or not.
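A minimal sketch of this timer-plus-event idea, under stated assumptions: a `System.Windows.Forms.Timer` stands in for the loop, and the check is done in `DocumentCompleted` rather than `Navigated`, because `Navigated` fires when the new document has begun loading, while `DocumentCompleted` fires once it has finished. `NavigateTimerForm` and `HasChanged` are illustrative names; the URL is the one from the question.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical sketch: re-navigate on a timer and compare in DocumentCompleted.
public class NavigateTimerForm : Form
{
    static readonly Uri PageUrl = new Uri("http://www.randomurl.com/"); // URL from the question
    readonly WebBrowser _browser = new WebBrowser { ScriptErrorsSuppressed = true };
    readonly System.Windows.Forms.Timer _timer = new System.Windows.Forms.Timer { Interval = 30000 };
    string _previousSource; // null until the first load completes

    public NavigateTimerForm()
    {
        _browser.DocumentCompleted += Browser_DocumentCompleted;
        // Each tick starts a new navigation; no loop or Thread.Sleep on the UI thread.
        _timer.Tick += (s, e) => _browser.Navigate(PageUrl);
        Load += (s, e) => { _browser.Navigate(PageUrl); _timer.Start(); };
    }

    void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (_browser.ReadyState != WebBrowserReadyState.Complete) return; // wait for all frames
        string source = _browser.DocumentText;
        bool changed = HasChanged(_previousSource, source);
        _previousSource = source;
        if (changed) MessageBox.Show("There's new stuff on this web page!");
    }

    public static bool HasChanged(string previous, string current) =>
        previous != null && current != previous;

    [STAThread]
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new NavigateTimerForm());
    }
}
```

Note that this downloads the whole page again on every tick, which is what the question hoped to avoid.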

Pixelmonster
  • Sounds good, but I'm not sure that the page actually reloads. The page mostly stays static, but there's a control in the page with changing content. I'm sure they have done it that way to show new content without a page reload. – Ewan Aug 22 '13 at 21:14
0

You'd better hook up the DocumentCompleted event and check the DocumentText property there!

nim
  • That would work when DocumentCompleted fires, but that only happens once. How could I repeatedly check for differences? – Ewan Aug 22 '13 at 21:10
  • After checking the differences in DocumentCompleted, call _memoryWebBrowser.Refresh(); – nim Aug 23 '13 at 08:03
  • How do I make it loop, and if I do make it loop, how do I know that it is not reloading the page? Perhaps I don't understand; do you have an example? – Ewan Aug 23 '13 at 09:18
0

The WebBrowser control is very buggy and has a lot of overhead for your needs. You should use WebRequest instead. Since you said you don't know how to use it, here's a working example for you.

using System;
using System.Windows.Forms;
using System.Net;
using System.IO;

namespace Check_Website
{
    public partial class Form1 : Form
    {
        String _previousSource = string.Empty;
        System.Windows.Forms.Timer timer;

        private System.Windows.Forms.CheckBox cbCheckWebsite;
        private System.Windows.Forms.TextBox tbOutput;

        public Form1()
        {
            InitializeComponent();

            this.cbCheckWebsite = new System.Windows.Forms.CheckBox();
            this.tbOutput = new System.Windows.Forms.TextBox();
            this.SuspendLayout();
            // 
            // cbCheckWebsite
            // 
            this.cbCheckWebsite.AutoSize = true;
            this.cbCheckWebsite.Location = new System.Drawing.Point(12, 12);
            this.cbCheckWebsite.Name = "cbCheckWebsite";
            this.cbCheckWebsite.Size = new System.Drawing.Size(80, 17);
            this.cbCheckWebsite.TabIndex = 0;
            this.cbCheckWebsite.Text = "Check website";
            this.cbCheckWebsite.UseVisualStyleBackColor = true;
            // 
            // tbOutput
            // 
            this.tbOutput.Location = new System.Drawing.Point(12, 35);
            this.tbOutput.Multiline = true;
            this.tbOutput.Name = "tbOutput";
            this.tbOutput.Size = new System.Drawing.Size(260, 215);
            this.tbOutput.TabIndex = 1;
            // 
            // Form1
            // 
            this.ClientSize = new System.Drawing.Size(284, 262);
            this.Controls.Add(this.tbOutput);
            this.Controls.Add(this.cbCheckWebsite);
            this.Name = "Form1";
            this.Load += new System.EventHandler(this.Form1_Load);
            this.ResumeLayout(false);
            this.PerformLayout();

            timer = new System.Windows.Forms.Timer();
            timer.Interval = 30000;
            timer.Tick += timer_Tick;
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            timer.Start();
        }

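        // async so the download doesn't block the UI thread (GetResponseAsync needs .NET 4.5+)
        async void timer_Tick(object sender, EventArgs e)
        {
            if (!cbCheckWebsite.Checked) return;

            WebRequest request = WebRequest.Create("http://localhost/check_website.html");
            request.Method = "GET";

            string newContent;
            try
            {
                // Dispose the response so the underlying connection is released
                using (WebResponse response = await request.GetResponseAsync())
                using (var sr = new StreamReader(response.GetResponseStream()))
                {
                    newContent = await sr.ReadToEndAsync();
                }
            }
            catch (WebException ex)
            {
                // An unhandled exception in an async void handler would crash the app
                tbOutput.AppendText("Request failed: " + ex.Message + "\r\n");
                return;
            }

            tbOutput.AppendText(newContent + "\r\n");

            if (_previousSource == string.Empty)
            {
                tbOutput.AppendText("Nah. It's empty\r\n");
            }
            else if (_previousSource == newContent)
            {
                tbOutput.AppendText("Nah. Equals the old content\r\n");
            }
            else
            {
                tbOutput.AppendText("Oh great. Something happened\r\n");
            }

            _previousSource = newContent;
        }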
    }
}
Pixelmonster
  • This works very well, and I agree there is an overhead in using WebBrowser. That overhead affects my Windows form, though, which doesn't worry me. The only small problem is that request.GetResponse() in this solution loads the whole web page again. It's the same as the .Refresh() in my solution, which I'd like to avoid doing every 30 seconds. Really, it is the content that I want to check, as it changes dynamically. 90% of the web page is static, so I don't want to reload it every time. Is there a way to load the web page once and then check only the changing content? – Ewan Aug 23 '13 at 08:28
  • Just to add, this is a great example, but the answer for me was to use _memoryWebBrowser.Document.Body.OuterHtml instead of _memoryWebBrowser.DocumentText. That was only a one-line change in my existing code, and then I was able to comment out my _memoryWebBrowser.Refresh(). This achieved my final goal perfectly, as I am loading the web page once and then, without refreshing the page, checking the content that changes dynamically. The WebBrowser overhead is not a concern, as it only happens once. The efficiency within my loop is more important, i.e. not using Refresh() or GetResponse() to reload. – Ewan Aug 23 '13 at 10:07
  • In your question you said the content of the website is changing frequently. This can be taken as a) somebody changes the code or b) you have AJAX (or similar) on the site and just a little bit of content changes. So I thought you would like to get the whole page every 30 seconds and compare it with the prior one. Not a bit of it! Nice to see that you have solved it yourself. – Pixelmonster Aug 24 '13 at 00:22