
I want to match the following markup:

<text link="no">
    ...
</text>

The only important thing is that I want to match the text element that has link="no" as an attribute, together with whatever is inside it.

I'm using Python, and currently I have the following regex, which is not working:

'<text [^<]*link="no"[^<]*>[.\t\n\r\xa0]*</text>[ \t\n\r\xa0]*'

The [^<]* parts are there because the text element could have other attributes as well.
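The pattern above fails mainly because `.` loses its special meaning inside a character class: `[.\t\n\r\xa0]*` matches only literal dots and whitespace, so any element whose content contains letters never matches. A minimal sketch of a corrected pattern, assuming attribute values use double quotes and `<text>` elements are never nested, replaces that class with a lazy `(.*?)` and adds `re.DOTALL` so `.` also crosses newlines. The name `UNLINKED_TEXT` and the sample markup are illustrative only.

```python
import re

# Hypothetical corrected pattern (assumes double-quoted attributes, no nested <text>):
#   <text\b           the tag name "text" (\b rejects e.g. <textarea>)
#   [^>]*\slink="no"  other attributes may precede link="no" inside the same tag
#   [^>]*>            ... or follow it, up to the end of the opening tag
#   (.*?)</text>      lazily capture everything up to the first closing tag
# re.DOTALL makes "." match newlines, so multi-line content is captured as well.
UNLINKED_TEXT = re.compile(r'<text\b[^>]*\slink="no"[^>]*>(.*?)</text>', re.DOTALL)

sample = (
    '<doc><text link="no">\n    hello\n</text>'
    '<text link="yes">x</text>'
    '<text class="c" link="no">a.b</text></doc>'
)

print(UNLINKED_TEXT.findall(sample))  # ['\n    hello\n', 'a.b']
```

Because the lazy group stops at the first `</text>`, this still breaks on nested `<text>` elements or on `link='no'` written with single quotes, which is why the replies recommend a real parser instead.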

Help would be much appreciated!

pypat
  • Why don't you use an XML parsing library? – fge Jan 09 '13 at 10:13
  • It is a SO tradition to put this link here: **You can't parse [X]HTML with regex:** http://stackoverflow.com/a/1732454/471214 – mmdemirbas Jan 09 '13 at 10:20
  • Okay, so I guess I'll have to use a parsing lib... let's see what pyparsing has to offer. Thanks for your responses. – pypat Jan 09 '13 at 10:22

1 Answer


Use an XML parser (like libxml2, lxml or py-dom-xpath) and an XPath expression like:

//text[@link="no"]
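A minimal sketch of this approach in Python, assuming the input is well-formed XML. It swaps in the standard library's `xml.etree.ElementTree` so it runs without extra packages; ElementTree supports only a subset of XPath, so the expression is written relative to the root as `.//text[@link='no']`. The function name `find_unlinked_text` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_unlinked_text(xml_source):
    """Return every <text> element (below the root) whose link attribute is "no"."""
    root = ET.fromstring(xml_source)
    # ElementTree's path language is a limited XPath subset:
    # ".//text" = any <text> descendant, "[@link='no']" = attribute filter.
    return root.findall(".//text[@link='no']")


doc = """<doc>
  <text link="no">first <b>bold</b> part</text>
  <text link="yes">skip me</text>
  <text id="t3" link="no">
      third
  </text>
</doc>"""

for element in find_unlinked_text(doc):
    # itertext() yields the element's own text plus all nested text,
    # i.e. "whatever is inside", without the element's trailing tail text.
    print("".join(element.itertext()).strip())
# prints:
# first bold part
# third
```

Other attributes (such as `id="t3"`) and attribute order do not matter to the parser, which is the case the regex struggles with. With lxml installed, `lxml.etree.fromstring(doc).xpath('//text[@link="no"]')` accepts the expression from the answer unchanged.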
Ria