Replacing a line of HTML. I was told to not use RegEx, what do I use?

Question

I'm just trying to do a simple deletion of an element in C#. If my HTML element contains the text [Store Logo], I want to remove it. Example:

<img src="http://src.sencha.io/300/80/http://images.company.com/[Store Logo]" />

Since it contains [Store Logo], I'd like to delete the whole image tag. I was trying to use RegEx to do it, but it's hard to understand how to use all the symbols together, and I read that I'm not supposed to use regex to parse HTML. What is the best way to remove it?

Assuming it's (valid) XML, you can load this into XDocument, then search for the attribute which contains [store logo] and then just remove that element. — Dave, Oct 09 '13 at 19:35
You sure are not supposed to use [regex to parse HTML](http://stackoverflow.com/a/1732454/2777674) — fvdalcin, Oct 09 '13 at 19:37
@DaveRook it is HTML, I wouldn't assume it is valid XML. Use an HTML parser instead. — Bart Friederichs, Oct 09 '13 at 19:42
@BartFriederichs, that assumption would generally be bad: some HTML tags don't have an end tag, among other things. — Harrison, Oct 09 '13 at 19:55

score 3 · Accepted Answer · answered Oct 09 '13 at 19:44

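You can use the Html Agility Pack.

Here's an example adapted from the one on their examples page (which shows how to find all the links in a page), changed to select img elements by their src attribute:

 HtmlWeb hw = new HtmlWeb();
 HtmlDocument doc = hw.Load(/* url */);
 // SelectNodes returns null when nothing matches, so check before looping.
 var images = doc.DocumentNode.SelectNodes("//img[@src]");
 if (images != null)
 {
    foreach (HtmlNode img in images)
    {
       // Remove the whole <img> element when its src contains the placeholder.
       if (img.Attributes["src"].Value.Contains("[Store Logo]"))
          img.ParentNode.RemoveChild(img, true);
    }
 }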

score 0 · Answer 2 · answered Oct 09 '13 at 19:42

Use HtmlAgilityPack. It's a library for parsing HTML that lets you access the DOM and modify it.
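As a rough illustration of what "access the DOM and modify it" could look like for the img element in the question, here is a minimal sketch. It is not part of the original answer, and it assumes the HtmlAgilityPack NuGet package is installed and that the HTML is already in a string. The class name `Program` and the helper `RemoveImagesContaining` are hypothetical names chosen for this example.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public class Program
{
    // Hypothetical helper: removes every <img> whose src attribute contains `marker`
    // and returns the remaining HTML.
    public static string RemoveImagesContaining(string html, string marker)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of loading a URL

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
        {
            foreach (HtmlNode img in images)
            {
                // GetAttributeValue falls back to "" if the attribute is missing.
                if (img.GetAttributeValue("src", "").Contains(marker))
                    img.Remove(); // detach the whole element from its parent
            }
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        string html =
            "<p>Header</p>" +
            "<img src=\"http://src.sencha.io/300/80/http://images.company.com/[Store Logo]\" />" +
            "<p>Body</p>";

        Console.WriteLine(RemoveImagesContaining(html, "[Store Logo]"));
    }
}
```

Because the parser tolerates markup that is not well-formed XML, this approach does not depend on the img tag being self-closed or on the rest of the page being valid.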

answered Oct 09 '13 at 19:42 by System Down
