
A link to the first question can be found here:

Using VB.NET to Detect Changes in a Web Page

I edited that question, but was told to resubmit the edit as a new question, so I'm carrying it over below. The link above gives the general background. Thank you!

New twist on this question, sorry. I've had more time to think about what we want. Detecting ANY change on a web page would be kind of silly, since time-dependent elements of the page change every so often. Instead, what I would like to do is detect the documents linked or embedded in the page, for instance Excel files, Word documents, or PDFs, and notice when they change. So I'd run a hash on each of these documents, then on some sort of schedule check whether new documents have been added or existing documents have been modified. Any suggestions on how to detect the documents embedded on the page and run the hash? Thanks again!

New Guy

1 Answer


I'll start with a piece of meta-advice: when asking a question whose answer is likely to depend on .NET itself, or on programming more generally, use tags that say so, and don't rely on a tag like VB.NET. Most of the .NET community uses C#, and they will often not see a question tagged that way.

As for your actual question, the specifics depend on exactly what you need to check, but in general it sounds like you need to define regions of interest within the page, each identified by, say, a CSS selector. Suppose the page you're watching has a little list of documents, and that list is coded like this:

<p>New this week!</p>
<ul class="new-docs">
  <li><a href="...">Some Doc</a></li>
  <li><a href="...">Some Other Doc</a></li>
</ul>

You then write some code to download this page, extract the element matching the selector ul.new-docs, and test it for changes, either by running a hash/checksum over the whole block of HTML, or by explicitly recording each of the child items and comparing the new list with the old.
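To make that concrete, here is a rough sketch of both options. It is written in Python rather than VB.NET purely for brevity, it uses only the standard library, and every name in it (DocListParser, extract_items, fingerprint, diff_items) is made up for illustration. It also assumes the list really is marked up as ul.new-docs, as in the sample above; a real page may need a different selector or a proper HTML library.

```python
import hashlib
from html.parser import HTMLParser


class DocListParser(HTMLParser):
    """Collects (href, link text) pairs found inside <ul class="new-docs">."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # <ul> nesting depth inside the watched list; 0 = outside
        self.in_link = False  # True while between <a> and </a> inside the list
        self.items = []       # mutable [href, text] pairs, in page order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            classes = (attrs.get("class") or "").split()
            if self.depth > 0 or "new-docs" in classes:
                self.depth += 1
        elif tag == "a" and self.depth > 0:
            self.in_link = True
            self.items.append([attrs.get("href") or "", ""])

    def handle_data(self, data):
        if self.in_link:
            self.items[-1][1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag == "ul" and self.depth > 0:
            self.depth -= 1


def extract_items(html):
    """Return the list entries as (href, text) tuples."""
    parser = DocListParser()
    parser.feed(html)
    parser.close()
    return [(href, text.strip()) for href, text in parser.items]


def fingerprint(items):
    """Option 1: one SHA-256 checksum over the whole extracted list."""
    joined = "\n".join(f"{href}\t{text}" for href, text in items)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def diff_items(old, new):
    """Option 2: compare entry by entry; returns (added, removed)."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)
```

You would store either the fingerprint or the item list from one run and compare it with the next run's result. The checksum only tells you that something changed; the per-item comparison also tells you which entries appeared or disappeared.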

You might find this thread helpful for actually extracting given bits of HTML by selector once you've downloaded the page.
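For the document-level check you describe (hashing each linked Word, Excel, or PDF file and re-checking on a schedule), a minimal sketch might look like the following. Again this is Python rather than VB.NET, and the extension list, the function names, and the idea of passing in the list of URLs are all my assumptions, not anything your page requires. In .NET the same idea should map onto an HTTP download plus the SHA256 class in System.Security.Cryptography.

```python
import hashlib
from urllib.request import urlopen

# Assumed set of "document" extensions; adjust to whatever the page links to.
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx")


def is_document(url):
    """True if the URL path (ignoring query string and fragment) looks like a document."""
    path = url.split("?", 1)[0].split("#", 1)[0]
    return path.lower().endswith(DOC_EXTENSIONS)


def fetch_bytes(url):
    """Download the raw bytes of one document."""
    with urlopen(url) as response:
        return response.read()


def snapshot(urls, fetch=fetch_bytes):
    """Map each document URL to the SHA-256 hex digest of its content."""
    return {
        url: hashlib.sha256(fetch(url)).hexdigest()
        for url in urls
        if is_document(url)
    }


def compare(old, new):
    """Return (added, removed, modified) URL lists between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(u for u in set(old) & set(new) if old[u] != new[u])
    return added, removed, modified
```

On each scheduled run you would build a new snapshot from the links found on the page, compare it with the one saved from the previous run, and then save the new one. Hashing the file contents catches a document that was replaced under the same URL, which comparing link lists alone would miss.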

Joshua Frank
  • Thanks for the info. I figured this out a while back. I need to update it with the answer. Thanks again! – New Guy Jan 21 '14 at 13:32