1

I have the following text

<pattern name="pattern1"/>
<success>success case 1</success>
<failed> failure 1</failed>
<failed> failure 2</failed>
<unknown> unknown </unknown>
<pattern name="pattern4"/>
<pattern name="pattern5"/>        
<success>success case 3</success> 
<pattern name="pattern2"/>        
<success>success case 2</success>
<otherTag>There are many other tags.</otherTag>
<failed> failure 3</failed>
<pattern name="pattern3"/> 
<unknown>unkown</unknown> 

The regular expression <failed>[\w|\W]*?</failed> matches all the lines that contain a failed tag.

What do I need to do if I want all failed tags, plus the pattern tag above each group of failed tags? If there is no failed tag underneath a pattern tag, that pattern tag should not be matched. Basically, I want the following output:

<pattern name="pattern1"/>
<failed> failure 1</failed>
<failed> failure 2</failed>
<pattern name="pattern2"/>
<failed> failure 3</failed>

I am doing this in JavaScript, and I do not mind doing some intermediate steps.

edit start Almost all repliers suggest that I take a different approach, and I am unsure which one to choose: jQuery, regex, or something else. I am giving more information here to help with the decision. The data format may change, but not often. The data comes from a Schematron validation report, a file of type ".SVRL". The structure of the file follows this schema, defined in RELAX NG compact syntax:

schematron-output   = element schematron-output {
attribute title { text }?,
attribute phase { xsd:NMTOKEN }?,
attribute schemaVersion { text }?,
    human-text*,
    ns-prefix-in-attribute-values*,
    (active-pattern,
    (fired-rule, (failed-assert | successful-report)*)+)+
}

The <pattern> tag maps to active-pattern, and the <failed> and <success> tags map to failed-assert and successful-report respectively.

Now with additional information, which approach should I be taking? Thanks very much for helping out. :)

edit end

Michael Z
  • See [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) and [XML parsing in JavaScript](http://stackoverflow.com/questions/649614/xml-parsing-in-javascript). Most JavaScript environments have good support for XML parsing. You don't want to use regex. – Matthew Flaschen Jun 15 '10 at 03:37
  • On a side note, `|` doesn't mean "or" in a character class, it just matches `|`. "Or" is implicit in character classes anyway; `[\w\W]` means "a word character or a non-word character." – Alan Moore Jun 15 '10 at 05:06
  • @Matthew: Thanks a lot for your suggestion, I will evaluate that option. @Alan: Thanks so much for pointing out that "|" doesn't mean "or" in a character class. :) – Michael Z Jun 15 '10 at 21:17
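
The point about "|" inside a character class can be checked with a quick sketch. The variable names below are only illustrative; the patterns are the ones discussed in the comments above.

```javascript
// Inside a character class, | is a literal character, not alternation.
const literalPipe = /^[a|b]$/.test('|');

// [\w|\W] and [\w\W] both accept any character, including | and newlines,
// so the extra | in the class is redundant rather than harmful.
const withPipe = /^[\w|\W]*$/.test('a|b\n');
const withoutPipe = /^[\w\W]*$/.test('a|b\n');

console.log(literalPipe, withPipe, withoutPipe);
```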

3 Answers

1

You should look into methods other than regular expressions to parse XML, particularly if:

  • your requirements are likely to change in future, making your regular expression increasingly unwieldy
  • you are parsing data from a third-party source, which may contain just about anything, including strings that look like XML tags embedded in XML comments, CDATA sections or attributes.

See this answer for information about XML parsing in Javascript.

The easy solution is "use jQuery". If for some reason you don't want to load jQuery to do this, then start here.

Community
thomasrutter
1

You can use the regex "|" operator (meaning "or") to create a regex that matches any one of several expressions. For example ...

/^<failed>[\w\W]*?<\/failed>|^<pattern[^>]*>/gm

... should do what you're asking (based on the example you've given above).

But, as other commenters have said, parsing XML with regexes is a slippery slope. You'll probably want to look into other options, like using the DocumentFragment class to parse your string for you.
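
The alternation on its own picks up every pattern tag, including those with no failed tag beneath them. Since the question allows intermediate steps, one possible way to get the requested output is to filter the matches afterwards. This is only a sketch under stated assumptions: the g and m flags are added so that every line is considered, the input has one tag per line with no leading whitespace, and `report`, `tags` and `dropPatternsWithoutFailures` are hypothetical names chosen for illustration.

```javascript
// Sample input taken from the question.
const report = [
  '<pattern name="pattern1"/>',
  '<success>success case 1</success>',
  '<failed> failure 1</failed>',
  '<failed> failure 2</failed>',
  '<unknown> unknown </unknown>',
  '<pattern name="pattern4"/>',
  '<pattern name="pattern5"/>',
  '<success>success case 3</success>',
  '<pattern name="pattern2"/>',
  '<success>success case 2</success>',
  '<otherTag>There are many other tags.</otherTag>',
  '<failed> failure 3</failed>',
  '<pattern name="pattern3"/>',
  '<unknown>unkown</unknown>'
].join('\n');

// Step 1: collect every pattern tag and every failed tag, in document order.
// The g flag returns all matches; the m flag lets ^ match at each line start.
const tags = report.match(/^<failed>[\w\W]*?<\/failed>|^<pattern[^>]*>/gm) || [];

// Step 2: keep a pattern tag only if the next collected tag is a failed tag.
function dropPatternsWithoutFailures(matches) {
  return matches.filter((tag, i) => {
    if (!tag.startsWith('<pattern')) return true; // failed tags are always kept
    const next = matches[i + 1];
    return next !== undefined && next.startsWith('<failed');
  });
}

const result = dropPatternsWithoutFailures(tags);
console.log(result.join('\n'));
```

The filter relies on the matches arriving in document order, so a pattern tag immediately followed by another pattern tag (or by nothing) is known to have no failures of its own.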

broofa
  • Thanks broofa, your answer does exactly what I wanted. I understand others' concerns, but the file structure is unlikely to change in the future (I added more comments). I am inclined to use regex. – Michael Z Jun 15 '10 at 21:09
  • Hi broofa, sorry, I slightly edited my question. Your regular expression worked perfectly well for my original requirement; any chance you can have a look at the new text to be parsed? – Michael Z Jun 16 '10 at 00:04
1

Here is the RegExp you need:

<(pattern|failed)\b[^>]*(?:/>|>[^<]*</\1>)

Just escape the slashes when using it in JavaScript regular expression literal notation:

var regExp = /<(pattern|failed)\b[^>]*(?:\/>|>[^<]*<\/\1>)/gi;
var matchesArray = testString.match(regExp);

This regular expression will find whole <pattern> and <failed> tags, whether they are empty tags or not (<empty/> or <notEmpty></notEmpty>). It also allows for attributes on the elements.
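
As a rough usage sketch, here is the expression applied to a shortened version of the sample from the question (the shortened `testString` is an assumption made for brevity). Note that it returns every pattern tag, including one with no failed tag beneath it, so the edited requirement would still need a filtering step afterwards.

```javascript
// Shortened sample based on the text in the question.
const testString = [
  '<pattern name="pattern1"/>',
  '<success>success case 1</success>',
  '<failed> failure 1</failed>',
  '<pattern name="pattern4"/>',
  '<unknown> unknown </unknown>'
].join('\n');

// With the g flag, match() returns the full text of every match
// and ignores the capture group.
const regExp = /<(pattern|failed)\b[^>]*(?:\/>|>[^<]*<\/\1>)/gi;
const matchesArray = testString.match(regExp) || [];

console.log(matchesArray);
// [ '<pattern name="pattern1"/>',
//   '<failed> failure 1</failed>',
//   '<pattern name="pattern4"/>' ]
```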

smnh
  • Hi smnh, sorry, I slightly edited my question. Your regular expression worked perfectly well for my original requirement; any chance you can have a look at the new text to be parsed? – Michael Z Jun 16 '10 at 00:04