
I'm using the SyndicationFeed class to consume some RSS feeds for articles. How can I get only the text from an item's Summary field, without the HTML tags? Sometimes (not always) it contains HTML tags such as div, img, h, and p, for example: </a></div>, <img src='http

I want to get rid of all the tags. Also, I'm not sure whether the RSS feed carries the full description.

Should I use a regular expression for this, or is there another method?

XmlReader reader = XmlReader.Create(response.GetResponseStream());
SyndicationFeed feed = SyndicationFeed.Load(reader);

foreach (SyndicationItem item in feed.Items)
{
    // Summary is a TextSyndicationContent; its Text property holds the raw string,
    // which contains tags and not only the article text
    string description = item.Summary.Text;
}
SunShine

1 Answer

Yeah I suppose regexes are the easiest built-in way to achieve this...

// Get rid of the tags
description = Regex.Replace(description, @"<.+?>", String.Empty);

// Then decode the HTML entities
description = WebUtility.HtmlDecode(description);
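
For reference, here is a self-contained sketch of the same two steps wrapped in a helper. The class name HtmlText and the method name StripTags are made up for illustration and are not part of any library. It assumes a slightly different pattern, <[^>]+>, so that a tag broken across lines is still matched (by default '.' does not match a newline), and it replaces each tag with a space and then collapses whitespace so that words from adjacent elements do not run together. It is still a regex heuristic, not an HTML parser, so malformed markup or a literal '<' in the text may not be handled well.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper (name chosen for this example only).
public static class HtmlText
{
    // Anything that looks like a tag; [^>] also spans line breaks, unlike '.'.
    private static readonly Regex Tag = new Regex(@"<[^>]+>", RegexOptions.Compiled);

    // Runs of whitespace left behind after the tags are removed.
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // Replace each tag with a space so neighbouring words stay separated.
        string text = Tag.Replace(html, " ");

        // Decode entities such as &amp; only after the tags are gone.
        text = WebUtility.HtmlDecode(text);

        // Collapse the extra whitespace and trim the ends.
        return Whitespace.Replace(text, " ").Trim();
    }
}
```

Inside the foreach loop this could be called as HtmlText.StripTags(item.Summary?.Text), which also covers items that have no summary at all. One trade-off of inserting a space per tag is that inline markup in the middle of a word would split that word; use String.Empty as the replacement if that matters more for your feeds.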
Lucas Trzesniewski