Basically, I want to extract the keywords (words or tokens) that remain in a webpage after removing the stopwords. Does anybody know how to do this? Code in C# would be appreciated.
-
Maybe you should tag this with [c#] – brickner May 09 '10 at 20:05
-
If all you want is to get data from a web page, you can use jQuery, like the following: $('#testDIV').load('JQueryPage.aspx'); – Amr Badawy May 10 '10 at 05:59
2 Answers
Use an HTML parsing library like the HTML Agility Pack.
Once you load an HTML document with it, you can query it with XPath syntax; it exposes the HTML in a similar way to an XmlDocument.
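As a rough sketch of the loading and querying step, something like the following should work, assuming the HtmlAgilityPack NuGet package is referenced. The class name PageText and the method Extract are made up for illustration, and skipping script and style content is my own assumption about what counts as page text.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumes the "HtmlAgilityPack" NuGet package is installed

static class PageText // hypothetical helper, not part of the library
{
    // Returns the text content of an HTML string, skipping script and style blocks.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes takes an XPath expression and returns null when nothing matches.
        var textNodes = doc.DocumentNode.SelectNodes("//text()");
        if (textNodes == null) return string.Empty;

        var parts = textNodes
            .Where(n => n.ParentNode.Name != "script" && n.ParentNode.Name != "style")
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim()) // decode &amp; etc.
            .Where(t => t.Length > 0);

        return string.Join(" ", parts);
    }
}
```

To read a live page instead of a string, `new HtmlWeb().Load(url)` returns an HtmlDocument that can be queried the same way.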

Oded
The HTML Agility Pack that Oded mentions will help you get at the plain text inside the HTML, but to extract keywords from the webpage after removing the stopwords you'll need to do more work. There's a good informative answer from Joseph Turian to this question: How do I extract keywords used in text?
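For the "more work" part, here is a minimal sketch of one simple approach: split the plain text into words, drop stopwords, and rank what is left by frequency. This is plain word counting, not necessarily what the linked answer recommends. The class KeywordExtractor, the method TopKeywords, and the tiny stopword list are all assumptions made up for illustration; a real stopword list would be much longer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class KeywordExtractor // hypothetical helper
{
    // Tiny illustrative stopword list (assumption); swap in a fuller one.
    static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with" };

    // Returns the most frequent non-stopword tokens with their counts.
    public static List<KeyValuePair<string, int>> TopKeywords(string text, int count)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"[\p{L}\p{Nd}]+") // runs of letters/digits
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(w => w.Length > 1 && !StopWords.Contains(w))
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal) // stable order for ties
            .Take(count)
            .ToList();
    }
}
```

Calling `KeywordExtractor.TopKeywords(plainText, 10)` on the text pulled out of the page gives the ten most frequent remaining words. Stemming or weighting against a larger corpus would be the next refinements if raw counts turn out too noisy.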