I need to build a web scanner. This web application must scan any web page and save the result if some data has changed. It should search for keywords and check whether their values have been modified. I will build this application with ASP.NET MVC. What should I use to scan a web page? If I enter on my page the URL of a page I want to scan, what should happen? Are there robots that watch for content changes? Please help me understand what I need to build such a thing.
Change? Compared to when? Save? To Where? – spender Jan 20 '11 at 11:06
Changes between yesterday and today. Save them in some database. I mean, what should I do for that? Implement some robot as an application which reads the web page for specific user-given keys? – r.r Jan 20 '11 at 11:15
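A minimal sketch of the "compare yesterday with today" step for user-given keys. It assumes the page's text has already been extracted and that each key appears at the start of a line in a `key: value` form; that layout is an assumption, and real pages will need their own extraction rule. The helpers `extract_values` and `changed_values` are hypothetical names.

```python
def extract_values(text: str, keys: list[str]) -> dict[str, str]:
    """Return the value after 'key:' for each key found (assumes 'key: value' lines)."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        for key in keys:
            prefix = key + ":"
            # Keep only the first occurrence of each key.
            if stripped.startswith(prefix) and key not in values:
                values[key] = stripped[len(prefix):].strip()
    return values


def changed_values(old: dict[str, str], new: dict[str, str]) -> dict[str, tuple]:
    """Map each key whose value differs between two snapshots to (old, new)."""
    changes = {}
    for key in old.keys() | new.keys():
        if old.get(key) != new.get(key):
            # A key missing from one snapshot shows up as None on that side.
            changes[key] = (old.get(key), new.get(key))
    return changes


yesterday = extract_values("Price: 10 EUR\nStock: 5", ["Price", "Stock"])
today = extract_values("Price: 12 EUR\nStock: 5", ["Price", "Stock"])
print(changed_values(yesterday, today))  # {'Price': ('10 EUR', '12 EUR')}
```

In a real scanner, yesterday's values would be loaded from the database and today's values saved back after the comparison.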
1 Answer
You could load the page's markup, use it to generate a checksum and then store this away ready to compare with the next day's page.
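A minimal sketch of the checksum idea, written in Python rather than C# for brevity. It fetches the markup, hashes it with SHA-256, and compares the digest with the one stored on the previous run. SQLite and the `snapshots` table are assumptions standing in for "some database"; `fetch_markup`, `checksum` and `has_changed` are hypothetical names.

```python
import hashlib
import sqlite3
import urllib.request


def fetch_markup(url: str) -> str:
    """Download the page's raw markup (no retries or error handling in this sketch)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def checksum(markup: str) -> str:
    """SHA-256 hex digest of the markup; any difference in the text changes it."""
    return hashlib.sha256(markup.encode("utf-8")).hexdigest()


def has_changed(db: sqlite3.Connection, url: str, markup: str) -> bool:
    """Compare the new checksum with the stored one, then store the new one."""
    db.execute("CREATE TABLE IF NOT EXISTS snapshots (url TEXT PRIMARY KEY, digest TEXT)")
    row = db.execute("SELECT digest FROM snapshots WHERE url = ?", (url,)).fetchone()
    digest = checksum(markup)
    db.execute("INSERT OR REPLACE INTO snapshots (url, digest) VALUES (?, ?)", (url, digest))
    db.commit()
    # The first visit has nothing to compare against, so it is not reported as a change.
    return row is not None and row[0] != digest


db = sqlite3.connect(":memory:")
print(has_changed(db, "http://example.com/", "<p>v1</p>"))  # False: nothing stored yet
print(has_changed(db, "http://example.com/", "<p>v2</p>"))  # True: markup differs
```

A scheduled job (the "robot") would call `fetch_markup` once a day for each URL and pass the result to `has_changed`. Note that hashing the raw markup flags every difference, including dates or hidden fields that change on every request.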

immutabl
True, but what if some content ("today's date") changes naturally? What about things like ViewState that you don't care about? – Hans Kesting Jan 20 '11 at 13:59
Agreed. These are considerations for whoever is defining the requirements. I'm merely giving the OP an overview of a possible technical solution. @Ragim you need to define what exactly constitutes a 'change' and build this understanding into the logic used to load all or part of the markup. Some may suggest you use RegExp to parse the HTML and use it to include/exclude irrelevant parts of the page, like dates in headers. This is not recommended: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags Good luck. – immutabl Jan 20 '11 at 14:07
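Following the advice to use a real HTML parser instead of RegExp, a minimal sketch with Python's built-in `html.parser.HTMLParser` that reduces a page to the text that matters before hashing it. It skips `<script>`/`<style>` contents, and since only text nodes are collected, hidden inputs such as ASP.NET's `__VIEWSTATE` field drop out as well. Which other parts count as noise is an assumption: here the caller passes element `id`s to ignore (the `ignore_ids` parameter and the `"today"` id are hypothetical). The same approach carries over to an HTML parsing library in .NET.

```python
import hashlib
from html.parser import HTMLParser


class RelevantTextExtractor(HTMLParser):
    """Collects text, skipping <script>/<style> and elements whose id is ignored.

    Assumes ignored elements have closing tags (not void elements like <input>).
    """

    def __init__(self, ignore_ids=()):
        super().__init__()
        self.ignore_ids = set(ignore_ids)
        self.skip_tag = None  # tag name of the element currently being skipped
        self.skip_depth = 0   # nesting depth of that tag name while skipping
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.skip_tag is not None:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        if tag in ("script", "style") or dict(attrs).get("id") in self.ignore_ids:
            self.skip_tag, self.skip_depth = tag, 1

    def handle_endtag(self, tag):
        if tag == self.skip_tag:
            self.skip_depth -= 1
            if self.skip_depth == 0:
                self.skip_tag = None

    def handle_data(self, data):
        if self.skip_tag is None and data.strip():
            self.parts.append(data.strip())


def relevant_checksum(markup: str, ignore_ids=()) -> str:
    """SHA-256 of the relevant text only, so ignored parts cannot trigger a change."""
    parser = RelevantTextExtractor(ignore_ids)
    parser.feed(markup)
    parser.close()
    return hashlib.sha256("\n".join(parser.parts).encode("utf-8")).hexdigest()


page1 = '<div id="today">20 Jan 2011</div><p>Price: 10</p><input type="hidden" name="__VIEWSTATE" value="abc">'
page2 = '<div id="today">21 Jan 2011</div><p>Price: 10</p><input type="hidden" name="__VIEWSTATE" value="xyz">'
print(relevant_checksum(page1, ["today"]) == relevant_checksum(page2, ["today"]))  # True
```

Only the date and the ViewState value differ between the two pages, so their relevant checksums match; a change to the price would still be detected.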