I'm using a JavaScript regular expression /(<mos>[\s\S]*?<\/mos>)/g to find XML blocks in a log file that looks roughly like this:

Entry 1: <mos>...</mos>
Entry 2: <mos>...</mos>

However, sometimes the logging process encounters an error and doesn't finish writing an entry to the file, in which case it looks like this:

Entry 1: <mos>Error!
Entry 2: <mos>...</mos>

When this happens, the regular expression matches everything from the opening <mos> tag in entry 1 to the closing </mos> tag in entry 2, which causes problems when the XML is processed later.
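
A minimal reproduction of the over-matching, as a sketch only: the sample log string and the variable names are made up for illustration, and the real entries are longer.

```javascript
// Hypothetical sample log: entry 1 was never closed, entry 2 is complete.
const log = [
  "Entry 1: <mos>Error!",
  "Entry 2: <mos><a>ok</a></mos>",
].join("\n");

// The pattern from the question: lazy, but free to run across a second <mos>.
const lazyPattern = /(<mos>[\s\S]*?<\/mos>)/g;

// Only one match comes back, and it starts at the broken entry 1.
const lazyMatches = log.match(lazyPattern);
console.log(lazyMatches);
```

The lazy quantifier only minimizes how far the match extends from a given start position; it does not stop the match from starting at the earliest <mos>, which here is the broken one.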

It seems that somehow matching the closing tags first and then looking back for their corresponding opening tags would avoid this, but I don't know how to do this or if it is possible with regular expressions.


Clarification: The ... in the blocks delimited by the start and end tags can include newlines.

Alex
  • Why would you match on `[\s\S]*` when what you want is "everything up to either `<` or `\n`"? – Mike 'Pomax' Kamermans Sep 29 '14 at 14:59
  • "You can't parse [X]HTML with regex" ..or XML for that matter: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – gion_13 Sep 29 '14 at 15:01
  • @Mike'Pomax'Kamermans The `...` in the entries can span multiple lines. I've updated my question to include that. – Alex Sep 29 '14 at 15:39
  • @gion_13 It's not XML, but a log file containing XML (http://meta.stackoverflow.com/questions/261561/please-stop-linking-to-the-zalgo-anti-cthulhu-regex-rant) `;)`. – sp00m Sep 29 '14 at 15:58

1 Answer

This one should suit your needs:

<mos>((?:[\s\S](?!<mos>))+?)</mos>


Don't forget to escape the forward slash in the closing tag (write <\/mos>) if you use a JS regex literal.
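
For reference, here is a sketch of how the pattern above might be used from JavaScript. The sample log and the names sampleLog, blockPattern and blocks are assumptions for illustration; the g flag is added so that every block is found.

```javascript
// Hypothetical sample log: one truncated entry, one multi-line entry, one short entry.
const sampleLog = [
  "Entry 1: <mos>Error!",
  "Entry 2: <mos><a>",
  "ok</a></mos>",
  "Entry 3: <mos><b/></mos>",
].join("\n");

// The pattern from this answer, with the slash escaped for a regex literal.
// Each consumed character must not be directly followed by another "<mos>".
const blockPattern = /<mos>((?:[\s\S](?!<mos>))+?)<\/mos>/g;

// Collect the full text of every complete block (m[1] would be the inner content).
const blocks = Array.from(sampleLog.matchAll(blockPattern), (m) => m[0]);
console.log(blocks);
// → [ '<mos><a>\nok</a></mos>', '<mos><b/></mos>' ]
```

The truncated entry 1 is skipped because the attempt starting there cannot get past the second <mos>. Note that the +? requires at least one character between the tags, so an empty <mos></mos> block would not be matched by this pattern.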

sp00m