Regex: Skip/Ignore pattern

Question

Given that the following string is embedded in text, how can I extract the whole line without matching on the inner "<" and ">"?

<test type="yippie<innertext>" />

EDIT:
To be more specific, we need to handle both of the cases below, where the "type" value may or may not contain "<" and ">" characters.

<h:test type="yippie<innertext>" />
<h:test type="yippie">

Group 1: 'h:test'
Group 2: ' type="yippie<innertext>" ' or ' type="yippie"' (i.e., the remaining content before ">" or "/>")

So far I have something like the following, but it is a little off: Group 2 stops at the first ">". I have been tweaking the first part of Group 2's condition.

(<([a-zA-Z0-9_:-]+)([^>"]*|[^>]*?)\s*(/)?>)
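Running the pattern above on both sample tags shows where it goes wrong. The first alternative, [^>"]*, cannot step into the quoted value at all, so matching falls through to the lazy [^>]*?, which stops at the first ">" even when that ">" sits inside the quotes. A minimal Python sketch, using the question's pattern unchanged (note that in this pattern the tag name is actually capture group 2 and the remainder is group 3, because group 1 wraps the whole tag):

```python
import re

# The pattern from the question, unchanged (raw string so \s reaches the regex engine intact).
ASKER_RE = re.compile(r'(<([a-zA-Z0-9_:-]+)([^>"]*|[^>]*?)\s*(/)?>)')

samples = ['<h:test type="yippie<innertext>" />', '<h:test type="yippie">']
for s in samples:
    # groups() -> (whole tag, tag name, remainder, optional "/")
    print(ASKER_RE.search(s).groups())
```

On the first sample the whole match is only '<h:test type="yippie<innertext>' and the remainder group ends at "innertext"; the sample without an inner ">" comes out whole.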

Thanks for your help.

Mark Byers · Answer 1 · score 2 · 2010-04-23T05:56:29.220

Try this:

<([:\w]+)(\s(?:"[^"]*"|[^/>"])+)/?>

Example usage (Python):

>>> import re
>>> x = '<h:test type="yippie<innertext>" />'
>>> re.search(r'<([:\w]+)(\s(?:"[^"]*"|[^/>"])+)/?>', x).groups()
('h:test', ' type="yippie<innertext>" ')
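The same pattern can be spread out with re.VERBOSE so each piece carries a comment, and re.finditer then picks every tag out of surrounding text, which is the situation the question describes. This is a sketch of the answer's regex only, not a general tag parser; the text around the two tags is made up for illustration.

```python
import re

# Mark Byers' pattern, spread out with re.VERBOSE (unescaped whitespace and # comments are ignored).
TAG_RE = re.compile(r'''
    <
    ([:\w]+)            # group 1: tag name, e.g. h:test
    (                   # group 2: everything between the name and > or />
        \s              # at least one whitespace character after the name
        (?:
            "[^"]*"     # a double-quoted value, which may contain < and >
          | [^/>"]      # or any single character other than /, > or "
        )+
    )
    /?>                 # optional / for self-closing tags, then >
''', re.VERBOSE)

# Hypothetical surrounding text with both tag variants from the question embedded in it.
text = '''Some text before
<h:test type="yippie<innertext>" />
<h:test type="yippie">
and some text after'''

for m in TAG_RE.finditer(text):
    print(m.groups())
```

Each match yields the tag name and the remainder, for example ('h:test', ' type="yippie"') for the second tag.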

Also note that if your document is HTML or XML then you should use an HTML or XML parser instead of trying to do this with regular expressions.
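As a rough sketch of the parser route for the HTML case, Python's standard-library html.parser already respects quoted attribute values, so the "<" and ">" inside type="..." do not end the tag. StartTagCollector is a hypothetical helper name; get_starttag_text() returns the raw source text of the most recently opened tag, which is close to the "whole match" the question asks for. HTMLParser lowercases tag names.

```python
from html.parser import HTMLParser

class StartTagCollector(HTMLParser):
    """Hypothetical helper: records every start tag, including self-closing ones."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already stripped;
        # get_starttag_text() gives the tag exactly as it appeared in the input.
        self.start_tags.append((tag, attrs, self.get_starttag_text()))

text = '''Some text before
<h:test type="yippie<innertext>" />
<h:test type="yippie">
and some text after'''

collector = StartTagCollector()
collector.feed(text)
collector.close()
for tag, attrs, raw in collector.start_tags:
    print(tag, attrs, raw)
```

The default handle_startendtag forwards self-closing tags like the first one to handle_starttag, so both tags are collected.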

edited Apr 23 '10 at 05:56

answered Apr 22 '10 at 23:18

Mark Byers


Yep, you're on it. I should have been clearer and more complete: I need the match grouped so that the tag name is split from the rest of the tag. See the edit above. – cwall Apr 23 '10 at 04:03

score 0 · Answer 2 · edited May 23 '17 at 11:48


It looks like you are trying to parse XML/HTML with a regex. I would say that your approach is fundamentally wrong. A sufficiently advanced regex is not indistinguishable from an XML parser. After all, what if you needed to parse:

<test type="yippie<inner\"text\"_with_quotes,_literal_slash_and_quote\\\">" />
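Running Answer 1's pattern on this string shows the problem concretely. The backslashes are kept as literal characters here (a Python raw string); neither XML nor that regex treats \" as an escaped quote, so the regex closes its quoted section at every " and ends the match at the first ">" that falls outside a quote pair.

```python
import re

# Answer 1's pattern, unchanged.
TAG_RE = re.compile(r'<([:\w]+)(\s(?:"[^"]*"|[^/>"])+)/?>')

# The example above, with every backslash kept as a literal character.
tricky = r'<test type="yippie<inner\"text\"_with_quotes,_literal_slash_and_quote\\\">" />'

m = TAG_RE.search(tricky)
print(m.group(0))          # the match stops at the > right after the last \"
print(tricky[m.end():])    # text left unconsumed: " />
```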

Furthermore, you probably need to escape the inner < and > as &lt; and &gt;. In well-formed XML a literal < is not allowed inside an attribute value at all.
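This can be checked with Python's built-in xml.etree.ElementTree, used here only as an example of a conforming XML parser: the unescaped tag is rejected as not well-formed, while the escaped one parses and the attribute value comes back with the entities decoded. xml.sax.saxutils.quoteattr does the escaping and adds the surrounding quotes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

raw = '<test type="yippie<innertext>" />'

# A literal < inside an attribute value is a well-formedness error, so parsing fails.
try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError as err:
    raw_ok = False
    print("rejected:", err)

# quoteattr escapes &, < and > and wraps the value in double quotes.
escaped = '<test type=' + quoteattr('yippie<innertext>') + ' />'
elem = ET.fromstring(escaped)
print(escaped)                # <test type="yippie&lt;innertext&gt;" />
print(elem.tag, elem.attrib)  # test {'type': 'yippie<innertext>'}
```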

For further reasons why you should not parse XML with a regex, I can only yield to this superior answer:

RegEx match open tags except XHTML self-contained tags


answered Apr 23 '10 at 04:20

eaolson


I wish I could. The existing implementation forces my hand. – cwall Apr 23 '10 at 04:32
