Removing an a href tag from CDATA

Question

I have the following XML file:

<ab>
 <![CDATA[ 

    <table>
        <tbody>
            <tr>
                <th>abcdef</th>             
            </tr>
            <tr>
             <p>
              <a href="/1/2" target="_blank">Home</a>
             </p>
            </tr>
        </tbody>
    </table>
 ]]>
</ab>

I want to remove the a tag whose href is /1/2. In the example above, I want to locate that link using XPath, remove the tag, and leave only its text: Home.

score 1 · Accepted Answer · answered Apr 03 '12 at 14:50


CDATA is just a string of arbitrary text until you process it.

So:

1. Extract the text node child of <ab>.
2. Run it through an HTML parser.
3. Run XPath on the output from the parser.
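
The three steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the answerer's code: it uses the standard library's html.parser (a tokenizer, not a full error-correcting HTML parser) and ElementTree's limited XPath subset, and it assumes the fragment inside the CDATA has every tag explicitly closed. The names FragmentParser, unwrap and strip_links are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XML_DOC = """<ab>
 <![CDATA[
    <table>
        <tbody>
            <tr>
                <th>abcdef</th>
            </tr>
            <tr>
             <p>
              <a href="/1/2" target="_blank">Home</a>
             </p>
            </tr>
        </tbody>
    </table>
 ]]>
</ab>"""


class FragmentParser(HTMLParser):
    """Hypothetical helper: turn html.parser events into an ElementTree.

    Assumes every start tag has a matching end tag (no bare <br>, etc.).
    """

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.builder.start("root", {})  # synthetic wrapper element

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as empty strings.
        self.builder.start(tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        self.builder.end(tag)

    def handle_data(self, data):
        self.builder.data(data)

    def finish(self):
        self.close()  # flush any buffered trailing text
        self.builder.end("root")
        return self.builder.close()


def unwrap(root, path):
    """Replace every element matching path with its text content."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for node in root.findall(path):
        parent = parent_of[node]
        text = "".join(node.itertext()) + (node.tail or "")
        index = list(parent).index(node)
        if index == 0:
            parent.text = (parent.text or "") + text
        else:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + text
        parent.remove(node)


def strip_links(xml_text, href):
    outer = ET.fromstring(xml_text)
    parser = FragmentParser()
    parser.feed(outer.text or "")            # step 1: CDATA is plain text
    tree = parser.finish()                   # step 2: parse the markup
    unwrap(tree, f".//a[@href='{href}']")    # step 3: XPath, then unwrap
    # Serialize the children of the synthetic root back to markup.
    return (tree.text or "") + "".join(
        ET.tostring(child, encoding="unicode", method="html") for child in tree
    )


if __name__ == "__main__":
    print(strip_links(XML_DOC, "/1/2"))
```

Because findall returns every match, the same call handles several links with the same href. Writing the cleaned markup back into the file as a CDATA section is not covered here; ElementTree would serialize it as escaped text instead.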

Quentin

Is it worth doing all this? Or should I just use a regular expression to look for that particular href tag? – NoviceMe Apr 03 '12 at 14:52
On [parsing HTML with regular expressions](http://stackoverflow.com/a/1732454/19068) – Quentin Apr 03 '12 at 14:53
@Quentin - What if there are more nodes in the XML that have the same href tag that I want? What is the best way to handle that? Sorry to keep bugging you. – NoviceMe Apr 03 '12 at 14:59
A loop. (It would probably be better to have the HTML as namespaced XHTML instead of text in the first place.) – Quentin Apr 03 '12 at 15:01
One last question: I am not sure how to follow the 3 steps you mentioned above. Do you have any example code? – NoviceMe Apr 03 '12 at 15:08
