I have the following XML file:

<ab>
 <![CDATA[ 

    <table>
        <tbody>
            <tr>
                <th>abcdef</th>             
            </tr>
            <tr>
             <p>
              <a href="/1/2" target="_blank">Home</a>
             </p>
            </tr>
        </tbody>
    </table>
 ]]>
</ab>

I want to remove the <a> tag whose href is /1/2. In the example above, I want to use XPath to remove the link and leave only its text: Home.

NoviceMe

1 Answer

CDATA is just a string of arbitrary text until you process it.

So:

  1. Extract the textNode child of <ab>
  2. Run it through an HTML parser
  3. Run XPath on the output from the parser
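
The three steps above can be sketched roughly as follows. This is only an illustration using Python's standard library, and it leans on an assumption: the markup inside the CDATA section happens to be well-formed, so `xml.etree.ElementTree` can stand in for the HTML parser in step 2. Real-world HTML usually is not well-formed and would need a genuine HTML parser (for example the third-party `lxml.html`). The function names `unwrap_links` and `strip_links` are hypothetical, chosen for this example.

```python
import xml.etree.ElementTree as ET

SOURCE = """<ab>
 <![CDATA[
    <table>
        <tbody>
            <tr>
                <th>abcdef</th>
            </tr>
            <tr>
             <p>
              <a href="/1/2" target="_blank">Home</a>
             </p>
            </tr>
        </tbody>
    </table>
 ]]>
</ab>"""


def unwrap_links(fragment_root, href):
    """Replace every <a> with the given href by its own text, in place.

    Child elements of the link (if any) are dropped in this sketch.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent
               for parent in fragment_root.iter()
               for child in parent}
    # The loop covers the "more than one matching link" case.
    # Assumption: href contains no single quote.
    for link in fragment_root.findall(f".//a[@href='{href}']"):
        parent = parents[link]
        index = list(parent).index(link)
        kept = (link.text or "") + (link.tail or "")
        if index == 0:
            # No earlier sibling: the text belongs to the parent itself.
            parent.text = (parent.text or "") + kept
        else:
            # Otherwise it follows the previous sibling.
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + kept
        parent.remove(link)


def strip_links(xml_source, href):
    outer = ET.fromstring(xml_source)
    # Step 1: the CDATA section arrives as plain text on <ab>.
    fragment_text = (outer.text or "").strip()
    # Step 2: parse the extracted text (assumed well-formed here).
    fragment = ET.fromstring(fragment_text)
    # Step 3: query the parsed tree and unwrap the matching links.
    unwrap_links(fragment, href)
    return ET.tostring(fragment, encoding="unicode")


result = strip_links(SOURCE, "/1/2")
print(result)
```

The printed fragment keeps the table structure and the text Home but no longer contains the `<a>` element. Writing the result back into `<ab>` as a CDATA section is a separate serialization concern that this sketch does not cover.
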
Quentin
  • Is it worth doing all this, or should I just use a regular expression to look for that particular href tag? – NoviceMe Apr 03 '12 at 14:52
  • On [parsing HTML with regular expressions](http://stackoverflow.com/a/1732454/19068) – Quentin Apr 03 '12 at 14:53
  • @Quentin What if more nodes in the XML have the same href tag that I want to remove? What is the best way to handle that? Sorry to keep bugging you. – NoviceMe Apr 03 '12 at 14:59
  • A loop. (It would probably be better to have the HTML as namespaced XHTML instead of text in the first place.) – Quentin Apr 03 '12 at 15:01
  • One last question: I am not sure how to follow the 3 steps you mentioned above. Do you have any example code? – NoviceMe Apr 03 '12 at 15:08