1

I am using ASP.NET 4 and C#.

I would like to know which code and classes could be useful for creating a web application that could:

01 - Connect to an HTML file on the web.
02 - Parse its content (text content).
03 - Find specific content in a page (for example, by looking for specific keywords).

Also, how to implement:

04 - How to submit information programmatically to an HTML page (filling in forms).

I am interested in understanding the classes, general practice, and code needed to accomplish this task.

If you have any ideas, please let me know. Thanks guys once again for your support! :-)

GibboK

3 Answers

1

I'm not sure if you want all of the things that you mention to execute 'server-side', but assuming that this is the case:

01 - Connect to an HTML file on the web.

Check out the WebClient class, and the HttpWebRequest class for more advanced scenarios.
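As a minimal sketch of the simple case, `WebClient.DownloadString` fetches a page as a string. The `PageFetcher` class name and the choice of UTF-8 are assumptions for illustration; real pages may declare a different encoding.

```csharp
using System.Net;
using System.Text;

// Hypothetical helper name; not part of the .NET Framework.
public static class PageFetcher
{
    // Downloads the resource at the given URL and returns its body as text.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            // Assumption: the page is UTF-8 encoded.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }
}
```

`DownloadString` throws a `WebException` on network or HTTP errors, so callers will usually want to catch that.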

02 - Parse its content (text content).
03 - Find specific content in a page (for example, by looking for specific keywords).

You might want to look at the Html Agility Pack, or if Bobince doesn't notice, regular expressions.
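A rough sketch of the Html Agility Pack route, assuming the `HtmlAgilityPack` package is referenced in the project. The `KeywordFinder` class and its method names are made up for illustration; the approach is to load the markup, drop script and style elements, and search the remaining text case-insensitively.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper; only HtmlDocument, HtmlNode and HtmlEntity come from the library.
public static class KeywordFinder
{
    // Returns the visible text of an HTML string, without script/style content.
    public static string ExtractText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches.
        var hidden = doc.DocumentNode.SelectNodes("//script|//style");
        if (hidden != null)
        {
            // Copy first so removal does not interfere with enumeration.
            foreach (HtmlNode node in new List<HtmlNode>(hidden))
                node.Remove();
        }

        // Decode entities such as &amp; into plain characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
    }

    // Case-insensitive keyword check against the page text.
    public static bool ContainsKeyword(string html, string keyword)
    {
        return ExtractText(html).IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Searching the extracted text rather than the raw markup avoids false hits on tag names, attributes, and inline scripts.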

04 - How to submit information programmatically to an HTML page (filling in forms).

Typically, this will require sending an HTTP POST request, which can also be accomplished with the HttpWebRequest class.
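A minimal sketch of such a POST, assuming the target form expects `application/x-www-form-urlencoded` data. The `FormPoster` class is hypothetical, and the field names and URL must be taken from the actual form being submitted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical helper; not part of the .NET Framework.
public static class FormPoster
{
    // Encodes fields as name=value pairs joined by '&'.
    // Note: spaces become %20 here rather than '+'.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(
            f => Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Sends the fields as an HTTP POST and returns the response body.
    public static string Post(string url, IEnumerable<KeyValuePair<string, string>> fields)
    {
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(fields));

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;

        using (Stream stream = request.GetRequestStream())
            stream.Write(body, 0, body.Length);

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }
}
```

Forms that rely on cookies or hidden fields (for example ASP.NET view state) need those values fetched and sent back as well; this sketch does not handle that.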

Ani
  • @GIbboK: Sorry for not providing context, it was just a joke. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Ani Oct 22 '10 at 06:50
1

For parsing the web page, have a look at the Html Agility Pack.
For form submission, you need to see what is actually sent over the network, either with tools like Firebug or the Internet Explorer developer tools, or with a sniffer like Wireshark.
In your case, I would also consider splitting it into separate components so that you can easily test parts of the process.
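One possible way to split things up, sketched with made-up names (`IPageSource`, `KeywordChecker`, `FakePageSource`): hide the download step behind an interface so the keyword logic can be tested against canned HTML without touching the network.

```csharp
using System;

// Hypothetical abstraction over "get the HTML for a URL".
public interface IPageSource
{
    string GetHtml(string url);
}

// Keyword logic depends only on the interface, not on WebClient or HttpWebRequest.
public class KeywordChecker
{
    private readonly IPageSource source;

    public KeywordChecker(IPageSource source)
    {
        this.source = source;
    }

    // Case-insensitive search over the raw HTML returned by the source.
    public bool PageMentions(string url, string keyword)
    {
        string html = source.GetHtml(url);
        return html.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}

// Test double that returns fixed HTML regardless of the URL.
public class FakePageSource : IPageSource
{
    private readonly string html;

    public FakePageSource(string html)
    {
        this.html = html;
    }

    public string GetHtml(string url)
    {
        return html;
    }
}
```

In production, a second `IPageSource` implementation would wrap the real HTTP call; tests keep using the fake.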

weismat
  • 7,195
  • 3
  • 43
  • 58
0

Use an HttpWebRequest to send a request to a page on the web.

You can then parse the HTML response.

To programmatically submit a form, I think you'll need to do it client-side (JavaScript):

document.forms[0].submit();
RPM1984