2

Using Java

I don't work with regexes regularly. I came across the following regex while migrating springmodules-validation stuff to the latest version.

^[a-zA-Z0-9 &quot;&apos;&amp;!#$%()*+,-./:;?@[\\]^_`{|}~]+$

What exactly is this doing? I need to understand it in order to write a unit test for this validation. By the way, I'm using it in a Java project.

One more interesting thing, I tried this expression in hibernate-validator as follows:

@Pattern(regexp = "^[a-zA-Z0-9 &quot;&apos;&amp;!#$%()*+,-./:;?@[\\]^_`{|}~]+$")

IntelliJ IDEA then shows an error at the end of the line saying "Unclosed character class". Is the regex properly formed?

Update

It seems the expression is malformed; I see the following exception when I try to test it:

java.util.regex.PatternSyntaxException: Unclosed character class near index 57
^[a-zA-Z0-9 &quot;&apos;&amp;!#$%()*+,-./:;?@[\]^_`{|}~]+$

Here is the original expression from one of the XML files that I'm trying to migrate:

<regexp apply-if="creativeType == 'Text'" expression="^[a-zA-Z0-9 &quot;&apos;&amp;!#$%()*+,-./:;?@[\\]^_`{|}~]+$"/>

Am I missing anything?

Working Solution

regexp = "^[a-zA-Z0-9 \"'&!#$%()*+,-./:;?@\\[\\]^_`{|}~]+$"

Assigned to a string this way, it works perfectly for me. Thank you all!
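For the unit test, something along these lines may serve as a starting point. It is only a sketch built around the working pattern above: the class name `CreativeTextPattern`, the helper `isValid` and the sample inputs are made up for illustration and are not part of the original validation code.

```java
import java.util.regex.Pattern;

public class CreativeTextPattern {
    // The working pattern from above: letters, digits, space and the listed punctuation.
    static final Pattern ALLOWED =
            Pattern.compile("^[a-zA-Z0-9 \"'&!#$%()*+,-./:;?@\\[\\]^_`{|}~]+$");

    // Hypothetical helper: true if the whole input consists of allowed characters only.
    static boolean isValid(String input) {
        return ALLOWED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Hello, World! [50% off] #1")); // true
        System.out.println(isValid("a < b"));        // false: '<' is not in the class
        System.out.println(isValid("x=1"));          // false: '=' is not in the class
        System.out.println(isValid("back\\slash"));  // false: a literal backslash is not in the class
        System.out.println(isValid(""));             // false: '+' requires at least one character
    }
}
```

Note that `<`, `>`, `=` and the backslash are the printable ASCII characters the class leaves out, so they make convenient negative test cases.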

Diablo
  • Character entities like &quot; &apos; etc. are not interpreted in regex. Is this extracted from some XML or so? You have to translate them to the original character. – Uwe Allner Nov 12 '14 at 11:21
  • Looks like I should use `&quot;&apos;&amp;` in XML/XHTML and replace them in my expression. Is this the problem? If so, do I have to escape them with \ ? – Diablo Nov 12 '14 at 11:23
  • @UweAllner that's a good point. How can I replace them? Do all of these need to be escaped, or only the `&quot;`? – Diablo Nov 12 '14 at 11:24
  • You will have to use " instead of &quot; when you use the expression outside of XML files (e.g. in annotations). Otherwise the regex tries to match the characters &, q, u, o, t and ; literally. – Uwe Allner Nov 12 '14 at 11:28
  • @UweAllner Yup, sorry I forgot to update it. Your solution worked like a charm. Thanks! – Diablo Nov 12 '14 at 13:13

2 Answers

4

The translated expression would look something like

^[a-zA-Z0-9 "'&!#$%()*+,-./:;?@\[\]^_`{|}~]+$

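and matches a line made up of one or more letters, digits, spaces and the listed punctuation characters. The square brackets have to be escaped inside the class: ] so that it does not end the character class, and [ because Java would otherwise treat it as the start of a nested character class.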
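If more expressions have to be carried over from the XML config, the entity translation can be scripted. A minimal sketch, assuming only the five predefined XML entities occur; `EntityTranslation` and `decodePredefinedEntities` are hypothetical names, not part of any library:

```java
public class EntityTranslation {
    // Hypothetical helper: replaces the five predefined XML entities with their literal characters.
    static String decodePredefinedEntities(String s) {
        return s.replace("&quot;", "\"")
                .replace("&apos;", "'")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&"); // done last so that "&amp;quot;" is not decoded twice
    }

    public static void main(String[] args) {
        // A shortened character class as it would appear inside an XML attribute.
        String fromXml = "[a-z &quot;&apos;&amp;]";
        System.out.println(decodePredefinedEntities(fromXml)); // prints [a-z "'&]
    }
}
```

The result still has to be written as a Java string literal afterwards, which means escaping the double quote and doubling every backslash.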

Uwe Allner
  • 4
    Remember that you need to be careful if you want to include a hyphen in a character class, it generally needs to be at the beginning or end of the class as anywhere else it denotes a range. In this _specific_ case it will actually work - `",-."` is treated as a range expression (from comma to dot inclusive) but the only character whose code lies between those of comma and dot is... hyphen! – Ian Roberts Nov 12 '14 at 11:35
  • Seems to be some trick of the creator of this expression (or just pure coincidence). I just translated it, but thanks for the remark! It's easy to overlook. – Uwe Allner Nov 12 '14 at 11:39
  • @UweAllner! I tried it in my java program, as `pattern = Pattern.compile("^[a-zA-Z0-9 \"'&!#$%()*+,-./:;?@[\\]^_{|}~]+$"); boolean isValid = pattern.matcher(targetStringToValidate).matches();` but still having the same `java.util.regex.PatternSyntaxException: Unclosed character class` I think there is problem with string escape characters in java. pls help! – Diablo Nov 12 '14 at 12:04
  • @Diablo Oh sorry; of course the opening square bracket has to be escaped, too. I have fixed it in my answer. – Uwe Allner Nov 12 '14 at 12:05
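The remark about the comma-to-dot range can be checked with a few lines of Java. This is just an illustrative sketch (the class and method names are made up): it collects every ASCII character that the range `[,-.]` accepts.

```java
import java.util.regex.Pattern;

public class HyphenRange {
    // Returns all ASCII characters matched by the range from comma to dot.
    static String matchedByCommaToDot() {
        Pattern range = Pattern.compile("[,-.]");
        StringBuilder matched = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (range.matcher(String.valueOf(c)).matches()) {
                matched.append(c);
            }
        }
        return matched.toString();
    }

    public static void main(String[] args) {
        // Comma is 0x2C, hyphen 0x2D, dot 0x2E, so the range covers exactly these three.
        System.out.println(matchedByCommaToDot()); // prints ,-.
    }
}
```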
0

You can use something like YAPE::Regex::Explain in Perl or RegexBuddy to get a detailed description of your regular expression. A messy one-liner can be found below:

perl -MYAPE::Regex::Explain -e \
'$e=<>; print YAPE::Regex::Explain->new($e)->explain';

After providing the regexp from stdin:

The regular expression:

^[a-zA-Z0-9 "'&!#$%()*+,-./:;?@[\]^_`{|}~]+$

matches as follows:

NODE                       EXPLANATION
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  ^                        the beginning of the string
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  [a-zA-Z0-9               any character of: 'a' to 'z', 'A' to 'Z',
  "'&!#$%()*+,-             '0' to '9', ' ', '"', ''', '&', '!', '#',
  ./:;?@[\]^_`{|}~]+       '$', '%', '(', ')', '*', '+', ',' to '.',
                           '/', ':', ';', '?', '@', '[', '\]', '^',
                           '_', '`', '{', '|', '}', '~' (1 or more
                           times (matching the most amount possible))
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
  $                        before an optional \n, and the end of the
                           string
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Using something like RegexBuddy will let you select a Java flavor for your regular expression, but it should be pretty standard in this case.

Are you sure this is Java though? From all that escaping it looks a lot more like it's part of an XSD / XPath / XML thing.

Criveti Mihai
  • Yes I'm refactoring legacy xml config to java annotations, where I need this expression as a Java String – Diablo Nov 12 '14 at 12:17
  • Ah, makes sense - I guess you don't need the XML escapes in Java then. See: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML - so &amp; becomes & and so on. – Criveti Mihai Nov 12 '14 at 12:19
  • YAPE::Regex::Explain won't help here because the problem is specific to Java, which always treats an unescaped `[` as the beginning of a nested character class. RegexBuddy correctly flags the error, assuming you specify Java as the flavor and reduce the double backslash. (In fact, I wonder why it *was* doubled; backslash has no special meaning in XML.) – Alan Moore Nov 12 '14 at 13:25
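The Java-specific behaviour described in the last comment can be reproduced with a shortened character class. A small sketch, with made-up class and method names, that only checks whether `Pattern.compile` accepts the expression:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class NestedClassDemo {
    // Returns true if java.util.regex accepts the expression, false on a syntax error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unescaped '[' opens a nested class, so the outer class is never closed.
        System.out.println(compiles("[@[\\]^_]+"));   // false
        // With both brackets escaped the class is well-formed.
        System.out.println(compiles("[@\\[\\]^_]+")); // true
    }
}
```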