3

I want to extract the keywords (the words or tokens) present in a web page after removing the stopwords. Does anybody know how to do this? Code in C# would be appreciated.

Wayne Koorts
jaskirat

2 Answers

0

Use an HTML parsing library like the HTML Agility Pack.

Once you load an HTML document with it, you can query it with XPath syntax; it exposes the HTML in much the same way as an XmlDocument.
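As a rough sketch of that approach, the snippet below loads an HTML string and collects its text nodes with an XPath query. It assumes the HtmlAgilityPack NuGet package is referenced; the `PageText` class and `Extract` method are hypothetical names chosen for illustration, and skipping `script` and `style` content is an assumption about what counts as page text.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper, not part of the HTML Agility Pack itself.
public static class PageText
{
    // Returns the text of an HTML string, skipping script and style content.
    public static string Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes(
            "//text()[not(ancestor::script) and not(ancestor::style)]");
        if (nodes == null) return string.Empty;

        // Decode entities such as &amp; and drop whitespace-only nodes.
        var parts = nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .Where(t => t.Length > 0);

        return string.Join(" ", parts);
    }
}
```

To work from a live page rather than a string, `new HtmlWeb().Load(url)` returns an `HtmlDocument` that can be queried the same way.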

Oded
0

The HTML Agility Pack that Oded mentions will help you get at the plain text inside the HTML, but extracting keywords from that text after removing the stopwords takes more work. There's a good, informative answer from Joseph Turian to this question: "How do I extract keywords used in text?"
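For the simplest version of that extra work, here is a minimal sketch that tokenizes plain text, drops stopwords, and ranks the remaining words by frequency. It is a plain frequency count, not necessarily the method described in the linked answer. The `KeywordExtractor` class, the `TopKeywords` method, and the very short stopword list are all assumptions made for illustration; a real stopword list would be much longer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class KeywordExtractor
{
    // A tiny sample stopword list; substitute a fuller one in practice.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
                "in", "is", "it", "of", "on", "or", "that", "the", "this",
                "to", "was", "with" },
        StringComparer.Ordinal);

    // Returns up to 'limit' non-stopword tokens, most frequent first.
    public static List<KeyValuePair<string, int>> TopKeywords(string text, int limit)
    {
        // A token is a letter followed by letters, digits, or apostrophes.
        return Regex.Matches(text.ToLowerInvariant(), @"\p{L}[\p{L}\p{Nd}']*")
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(w => w.Length > 1 && !StopWords.Contains(w))
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal) // stable order for ties
            .Take(limit)
            .ToList();
    }
}
```

Feeding it the text pulled out of the page gives a ranked word list. Raw counts favor long pages and common domain words, so stemming or a weighting scheme such as TF-IDF is a natural next step if the results are too noisy.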

Community
dumbledad