Regex to match a specific XML section in Python

Question

I want to match the following markup:

<text link="no">
    ...
</text>

The only requirement is that the text element has link="no" as an attribute; I want to match that element and whatever is inside it.

I'm using Python, and currently I have the following regex, which is not working:

'<text [^<]*link="no"[^<]*>[.\t\n\r\xa0]*</text>[ \t\n\r\xa0]*'

As you can see, I'm allowing for the text element to have other attributes.

Help would be much appreciated!

It is an SO tradition to put this link here: **You can't parse [X]HTML with regex:** http://stackoverflow.com/a/1732454/471214 — mmdemirbas, Jan 09 '13 at 10:20
Okay, so I guess I'll have to use a parsing lib... let's see what pyparsing has to offer. Thanks for your responses — pypat, Jan 09 '13 at 10:22

Ria · Accepted Answer · 2013-01-09T10:29:09.923


Use an XML parser (such as libxml2, lxml, or py-dom-xpath) together with an XPath expression like:

//text[@link="no"]
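As a minimal sketch of how this could look in practice, here is the same idea using the standard library's xml.etree.ElementTree instead of the libraries named above (it supports only a subset of XPath, but attribute predicates like this one are covered). The sample document, the `id` and `lang` attributes, and the helper name `find_unlinked_text` are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only <text link="..."> comes from the question.
SAMPLE = """<doc>
  <text link="no" id="a">
    first <b>block</b>
  </text>
  <text link="yes">second block</text>
  <text lang="en" link="no">third block</text>
</doc>"""


def find_unlinked_text(xml_source):
    """Return all <text> descendants of the root whose link attribute is "no"."""
    root = ET.fromstring(xml_source)
    # ElementTree's XPath subset uses a leading "." for "relative to root".
    return root.findall(".//text[@link='no']")


matches = find_unlinked_text(SAMPLE)
for element in matches:
    # Serialize the matched element together with its contents.
    print(ET.tostring(element, encoding="unicode"))
```

Other attributes and their order don't matter here, which is the part the regex was struggling with. Incidentally, one reason the regex in the question cannot match is that inside a character class `.` is a literal dot, so `[.\t\n\r\xa0]*` only accepts dots and whitespace between the tags.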

edited Jan 09 '13 at 10:29

answered Jan 09 '13 at 10:23

