
Someone delivers me a large number of regexes in the XML flavor, and I need to use them for validation in both Java and JavaScript. What is the best way to handle these XML-flavored regexes, given that the XML flavor differs from both the Java flavor and the ECMAScript (JavaScript) flavor?

Example

Regex:

[A-z]

Java

"A" // true
"Ab" // false
"a" // true

Javascript

"A" // true
"Ab" // true
"a" // true
JordyOnrust

1 Answer


You have already linked to a comparison table between XML- and ECMAScript-style regexes, so the differences are easy to look up.

There are some relevant differences:

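  • \d and \w match only ASCII digits/alphanumerics in JavaScript (and, by default, in Java), whereas the XML flavor treats them as Unicode-aware.
  • JavaScript doesn't support Unicode character properties (\p{L} etc.) as XML and Java do; native support arrived only with ES2018 and requires the u flag.
  • Neither Java nor JavaScript supports the XML-specific escapes \i and \c or the XML syntax for character class subtraction ([a-z-[aeiou]]).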

So if your XML regexes were to use any of those features, you wouldn't be able to convert them easily.
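Some of these constructs can still be rewritten by hand on the Java side. Character class subtraction is one case: Java lacks the `-[...]` syntax, but its intersection operator `&&` combined with a negated class describes the same set. A minimal sketch, assuming a manual rewrite of the pattern (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

public class SubtractionDemo {
    // XML form:  [a-z-[aeiou]]    (lowercase ASCII letters minus vowels)
    // Java form: [a-z&&[^aeiou]]  (intersect [a-z] with "anything but a vowel")
    static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    // Hypothetical helper: true if the whole input is a single consonant.
    static boolean isConsonant(String s) {
        return CONSONANT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isConsonant("b")); // true
        System.out.println(isConsonant("e")); // false
    }
}
```

JavaScript has no comparable operator in its classic regex syntax, so there the set would have to be spelled out explicitly (for example `[b-df-hj-np-tv-z]`).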

You can fix at least part of the problem in JavaScript by using Steven Levithan's XRegExp library with its Unicode plug-ins, which takes care of the Unicode issues. In Java 7 and later, you can switch on Unicode matching for \d and \w (the Pattern.UNICODE_CHARACTER_CLASS flag), so that should cover most of your potential issues.
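To illustrate the Java side, here is a small sketch of how `Pattern.UNICODE_CHARACTER_CLASS` (available since Java 7) changes what `\d` accepts. The class and method names are illustrative only, and the Arabic-Indic digits are just one convenient non-ASCII test input:

```java
import java.util.regex.Pattern;

public class UnicodeDigitsDemo {
    // Default behaviour: \d is limited to ASCII 0-9.
    static final Pattern ASCII_DIGITS = Pattern.compile("\\d+");

    // With the flag, \d matches any Unicode decimal digit.
    static final Pattern UNICODE_DIGITS =
            Pattern.compile("\\d+", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean matchesDefault(String s) {
        return ASCII_DIGITS.matcher(s).matches();
    }

    static boolean matchesUnicode(String s) {
        return UNICODE_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String arabicIndic = "\u0661\u0662\u0663"; // Arabic-Indic digits 1, 2, 3
        System.out.println(matchesDefault(arabicIndic)); // false
        System.out.println(matchesUnicode(arabicIndic)); // true
    }
}
```

The same flag can also be enabled inside the pattern with the embedded form `(?U)`, which may be easier when the regex strings arrive from elsewhere and are compiled in one place.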

However, there may be subtle implementation differences that aren't so obvious, so you'd definitely need to do some testing.
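One such difference is probably what the example in the question shows: an XML pattern has to match the whole value, which corresponds to Java's `Matcher.matches()`, while a search-style call (Java's `Matcher.find()`, or `RegExp.prototype.test` in JavaScript) succeeds if any part of the input matches. The sketch below assumes that this is how the two results were obtained; the helper names are hypothetical:

```java
import java.util.regex.Pattern;

public class AnchoringDemo {
    // Whole-input match: the behaviour an XML pattern is expected to have.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Search-style match: succeeds if any substring of the input matches.
    static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "[A-z]";
        for (String input : new String[] {"A", "Ab", "a"}) {
            System.out.printf("%-2s whole=%b anywhere=%b%n",
                    input, matchesWhole(regex, input), matchesAnywhere(regex, input));
        }
    }
}
```

Here "Ab" fails the whole-input check but passes the search, matching the Java and JavaScript results in the question. On the JavaScript side, wrapping the delivered pattern as `^(?:...)$` before constructing the `RegExp` should give the whole-input behaviour. A table of sample inputs run through both engines like this is a cheap way to catch such mismatches early.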

Tim Pietzcker
  • Thanks for your answer. The problem is that the XML regexes come from a third party, so I don't know what kind of regexes I will have to deal with until they arrive. The software I am writing right now should be able to handle them. – JordyOnrust Dec 07 '12 at 12:59