1

I have the following XML file:

<ab>
 <![CDATA[ 

    <table>
        <tbody>
            <tr>
                     <th>abcdef</th>    
                           <th><a href="/1/2" target="_blank">Contact</a></th>          
            </tr>
            <tr>
             <p>
              <a href="/1/2" target="_blank">Home</a>
             </p>
            </tr>
        </tbody>
    </table>
 ]]>
</ab>

I am still learning LINQ. Is there an easy way to find all <a href="/1/2"> tags inside the CDATA section and remove them? In the example above, the result should keep just the text Contact and Home and drop the surrounding <a> tags.

NoviceMe
  • There is a large debate about using regex to parse HTML, summed up by this post: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 If you have a moment, take a read; it's quite interesting. – krystan honour Apr 03 '12 at 16:45

2 Answers

1
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        XDocument doc = XDocument.Load("C:\\test.xml");

        // The CDATA section is a single text node under <ab>.
        XCData cdata = doc.DescendantNodes().OfType<XCData>().Single();

        // Parse the CDATA text as its own XML fragment (it must be well formed).
        XElement fragment = XElement.Parse(cdata.Value.Trim());

        // Every element whose href attribute is exactly "/1/2".
        List<XElement> elements = fragment.Descendants()
            .Where(x => (string)x.Attribute("href") == "/1/2")
            .ToList();

        // do something here
    }
}

The contents of test.xml are:

<ab> 
 <![CDATA[  

    <table> 
        <tbody> 
            <tr> 
                <th>abcdef</th>     
                <th><a href="/1/2" target="_blank">Contact</a></th>           
            </tr> 
            <tr> 
             <p> 
              <a href="/1/2" target="_blank">Home</a> 
             </p> 
            </tr> 
        </tbody> 
    </table> 
 ]]> 
</ab>
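
The "do something here" step can be filled in along these lines. This is a minimal sketch, not the only way to do it: it assumes the CDATA content is well-formed XML, and the names `CDataLinkStripper` and `StripLinks` are made up for illustration. Each matching <a> element is replaced by its own child nodes, so only the link text remains, and the result is written back into the CDATA node.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class CDataLinkStripper
{
    // Hypothetical helper: unwraps every <a> with the given href inside
    // the first CDATA section of the document and returns the new XML.
    public static string StripLinks(string xml, string href)
    {
        XDocument doc = XDocument.Parse(xml);
        XCData cdata = doc.DescendantNodes().OfType<XCData>().First();

        // Assumption: the CDATA text is a single well-formed XML element.
        XElement fragment = XElement.Parse(cdata.Value.Trim());

        // Collect the matches first, because the tree is modified below.
        var links = fragment.Descendants("a")
            .Where(a => (string)a.Attribute("href") == href)
            .ToList();

        foreach (XElement link in links)
        {
            // Replace <a ...>text</a> with its child nodes (the text).
            link.ReplaceWith(link.Nodes());
        }

        // Write the cleaned fragment back into the CDATA node.
        cdata.Value = fragment.ToString(SaveOptions.DisableFormatting);
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        string xml = "<ab><![CDATA[<table><tr><th>"
                   + "<a href=\"/1/2\" target=\"_blank\">Contact</a>"
                   + "</th></tr></table>]]></ab>";
        Console.WriteLine(StripLinks(xml, "/1/2"));
    }
}
```

To apply this to a file instead, the same steps work on a document loaded with XDocument.Load, followed by a call to Save.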
krystan honour
  • 1
    Of course, if what's in the CDATA is badly formed it's not going to load into XDocument, but in this case it isn't, so it does :) – krystan honour Apr 03 '12 at 16:49
0

I don't think LINQ is the best way to go about this problem. Personally, I would use a regular expression. Here is an example of how this could be done:

Example: Scanning for HREFs

In general, if you are doing any more intensive HTML processing, using an HTML parser is probably the best way to go, such as HtmlAgilityPack.

Regex sample code:

// Removes every href="..." attribute from the input string.
Regex hrefRegex = new Regex(@"href=""([^""]*)""", RegexOptions.IgnoreCase | RegexOptions.Compiled);
string output = hrefRegex.Replace(input, string.Empty);
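
Since the question asks for the whole link tag to go while its text stays, a slightly different pattern is needed. The following is a minimal sketch under stated assumptions: the href is always double-quoted and exactly "/1/2", and the link contains no nested <a> tags. `RegexLinkStripper` and `StripLinks` are names made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexLinkStripper
{
    // Matches <a ...href="/1/2"...>inner</a> and captures the inner text.
    // Singleline lets the inner text span line breaks.
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*\bhref=""/1/2""[^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: replaces each matching link with its inner text.
    public static string StripLinks(string html)
    {
        return Anchor.Replace(html, "$1");
    }

    public static void Main()
    {
        string input = "<th><a href=\"/1/2\" target=\"_blank\">Contact</a></th>";
        Console.WriteLine(StripLinks(input)); // prints <th>Contact</th>
    }
}
```

Links with any other href are left untouched, because the pattern only matches the literal value "/1/2".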

Hope this helps,

Ivan

Ivan Osmak
  • I am not at all good with regular expressions. Can you please tell me how I can remove the a href tag in the above problem using regex? – NoviceMe Apr 03 '12 at 15:53
  • @NoviceMe the example provided by Ivan is pretty straightforward and is a good base to start researching your solution. – Slugart Apr 03 '12 at 16:01
  • @Ivan - I don't need to look for every href; I need to look for exactly href="/1/2". So is this the correct way to do it? (@"href=""([^/1/2])" – NoviceMe Apr 03 '12 at 17:21
  • href=""/1/2"" - here is a link to a great tool for working with regex so you can experiment http://www.ultrapico.com/Expresso.htm – Ivan Osmak Apr 03 '12 at 17:34
  • First of all, I have to say that Expresso is a great tool. If you want to use LINQ, I've done what you want above. – krystan honour Apr 03 '12 at 18:17
  • Yeah, completely agree with krystan. If that's all there is to the scenario, I think both approaches will work equally well. – Ivan Osmak Apr 03 '12 at 18:25