
Sometimes I have to work with XML that is not well-formed (it has no root element). So I take the name of the first node (in this case "error") and build a regex pattern: `</error>$`. The problem is that it returns true for the following string, where `</error>` is only at the end of the first line, not at the end of the input:

<error>0</error>

<roles>
    <role rid="12" title="User" description="Hello world"></role>
    <role rid="11" title="Admin" description="Hello world2"></role></roles>

After looking at some references, I tried `</error>\z` and `</error>\Z`, but neither works. Please help me find a solution.

P.S. If there is a better solution, I'd be really happy to hear it. The target environment is JavaScript.

Mark Thomas
Oybek
  • Are you using the `m` flag in your regular expression? You should not be because that changes the meaning of `$` to match end of line instead of end of input. – Mike Samuel Jun 12 '11 at 15:19
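A small sketch of the difference the comment describes, using a shortened version of the question's sample (the variable names are illustrative). It shows that without the `m` flag, JavaScript's `$` already behaves like the `\z` anchor the question was looking for:

```javascript
// Shortened version of the question's sample: </error> ends the first line,
// but more elements follow it.
const response =
  '<error>0</error>\n\n<roles>\n    <role rid="11" title="Admin" description="Hello world2"></role></roles>';

// With the m (multiline) flag, $ also matches right before a line terminator,
// so this finds the </error> at the end of the first line.
const multiline = /<\/error>$/m;

// Without m, $ matches only at the very end of the input, which is what the
// question wanted from \z (JavaScript has no \z or \Z anchor).
const endOfInput = /<\/error>$/;

console.log(multiline.test(response));             // true
console.log(endOfInput.test(response));            // false
console.log(endOfInput.test('<error>0</error>'));  // true

// A trailing newline is not skipped, so trim the response first if needed.
console.log(endOfInput.test('<error>0</error>\n'));        // false
console.log(endOfInput.test('<error>0</error>\n'.trim())); // true
```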

2 Answers


If the only thing that makes your XML not well-formed is the missing root element, the fix is simple: wrap it in a root element, then parse the result and work with it as normal XML.

xml = '<root>' + xml + '</root>';

You really shouldn't try to parse XML with regular expressions.
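A rough sketch of this approach, assuming a browser environment where `DOMParser` is available (it is a browser API, not a Node.js global); the helper names `wrapInRoot` and `topLevelElements` are hypothetical. Wrapping also covers the case where the response already has a single root: that element simply becomes the only child of the synthetic root, so no regex check is needed.

```javascript
// Hypothetical helper: wrap a possibly rootless XML fragment in a synthetic <root>.
// If the fragment already has a single root, it just becomes the only child.
function wrapInRoot(xml) {
  return '<root>' + xml + '</root>';
}

// Hypothetical helper: parse the wrapped string and return the fragment's
// top-level elements. Assumes a browser, where DOMParser exists.
function topLevelElements(xml) {
  const doc = new DOMParser().parseFromString(wrapInRoot(xml), 'application/xml');
  // Browsers report XML syntax errors with a <parsererror> element instead of throwing
  if (doc.getElementsByTagName('parsererror').length > 0) {
    throw new Error('Response is not parseable even after wrapping');
  }
  return Array.from(doc.documentElement.children);
}

const sample = '<error>0</error>\n<roles><role rid="12" title="User"></role></roles>';

// Only run the parse where DOMParser is available (i.e. in a browser)
if (typeof DOMParser !== 'undefined') {
  console.log(topLevelElements(sample).map((el) => el.tagName));
}
```

In a browser the last line should print the element names `error` and `roles`. If a response can start with an `<?xml ...?>` declaration, it would have to be stripped before wrapping, since the declaration is only allowed at the very start of a document.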

Community
svick
  • Well, I don't want to parse it, I just want to check it. The thing is, I sometimes get a response WITH a root and sometimes WITHOUT one. When it comes with two roots, I don't want to add another root. Even if I do, I then have to check how many roots it has. – Oybek Jun 12 '11 at 13:19
  • Well, I think you should try to fix the server that's sending you invalid data. If that's not possible, then having to work with XML that may or may not have two “roots” is better than using regexes to work with XML. – svick Jun 12 '11 at 13:23

You are correct. Unfortunately, JavaScript does not support the `\A` and `\Z` anchors. One option is to run your regular expression on only the last 10 characters of the string (a substring), which would probably be more efficient.

Otherwise, try this expression, which matches if there is any non-whitespace content after the `</error>` end tag:

[\s\S]*</error>(?=\s*\S)
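Both suggestions can be sketched in JavaScript as follows. The 10-character window and the lookahead pattern come from the answer; the function name `endsWithErrorTag`, the sample strings, and the `\s*` allowance for trailing whitespace are illustrative assumptions. In a regex literal the `/` of the closing tag has to be escaped.

```javascript
// Rootless response, as in the question: more content follows </error>
const rootless = '<error>0</error>\n\n<roles>\n  <role rid="12" title="User" description="Hello world"></role>\n</roles>';
// Response consisting of a single <error> element
const single = '<error>0</error>\n';

// Approach 1 (sketch): test only the tail of the string.
// Hypothetical helper; a 10-char window leaves room for </error> (8 chars)
// plus at most 2 trailing whitespace characters.
function endsWithErrorTag(xml) {
  return /<\/error>\s*$/.test(xml.slice(-10));
}

// Approach 2: the answer's lookahead pattern as a JS regex literal.
// It matches when some </error> is followed by non-whitespace content.
const contentAfterError = /[\s\S]*<\/error>(?=\s*\S)/;

console.log(endsWithErrorTag(single));          // true
console.log(endsWithErrorTag(rootless));        // false
console.log(contentAfterError.test(single));    // false
console.log(contentAfterError.test(rootless));  // true
```

Note that the lookahead pattern detects the rootless case (it is true when something follows `</error>`), so its result is the inverse of the end-of-string check.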
agent-j