8

I would like to use a regular expression like this in Java: [[=a=][=e=][=i=]].

But Java doesn't support POSIX equivalence classes such as [=a=], [=e=], etc.

How can I do this? More precisely, is there a way to avoid being limited to US-ASCII?

tchrist
Stephan

3 Answers

15

Java does support POSIX character classes; the syntax is just different. For instance:

\p{Lower}
\p{Upper}
\p{ASCII}
\p{Alpha}
\p{Digit}
\p{Alnum}
\p{Punct}
\p{Graph}
\p{Print}
\p{Blank}
\p{Cntrl}
\p{XDigit}
\p{Space}
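
As a rough illustration of how the classes above are written in Java source, here is a minimal sketch. The class name `PosixClassDemo` and the helper `matches` are made up for this example; only `java.util.regex.Pattern` comes from the JDK. It assumes the default compile flags, under which these classes cover US-ASCII only.

```java
import java.util.regex.Pattern;

final class PosixClassDemo {
    // Hypothetical helper: true if the whole input matches the regex
    // (compiled with default flags, so POSIX classes are US-ASCII only).
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Backslashes must be doubled inside a Java string literal.
        System.out.println(matches("\\p{Lower}+", "abc"));
        // The classes can also be combined inside a bracket expression.
        System.out.println(matches("[\\p{Lower}\\p{Digit}]+", "abc123"));
        // U+00E9 (e with acute accent) is outside [a-z], so this does not match.
        System.out.println(matches("\\p{Lower}", "\u00e9"));
    }
}
```

The last call shows the limitation raised in the question: with default flags, an accented letter is not matched by `\p{Lower}`.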
Johan Sjöberg
  • US-ASCII only. Is there a way to use a locale? – Stephan Jul 07 '11 at 15:20
  • @Stephan, unfortunately there is no way that I know of. You can always match [Unicode characters](http://stackoverflow.com/questions/917774/java-regex-support-for-non-ascii-values) manually, though, to create your own character groups. – Johan Sjöberg Jul 07 '11 at 15:25
6

Quoting from http://download.oracle.com/javase/1.6.0/docs/api/java/util/regex/Pattern.html

POSIX character classes (US-ASCII only)

\p{Lower}   A lower-case alphabetic character: [a-z]
\p{Upper}   An upper-case alphabetic character: [A-Z]
\p{ASCII}   All ASCII: [\x00-\x7F]
\p{Alpha}   An alphabetic character: [\p{Lower}\p{Upper}]
\p{Digit}   A decimal digit: [0-9]
\p{Alnum}   An alphanumeric character: [\p{Alpha}\p{Digit}]
\p{Punct}   Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}   A visible character: [\p{Alnum}\p{Punct}]
\p{Print}   A printable character: [\p{Graph}\x20]
\p{Blank}   A space or a tab: [ \t]
\p{Cntrl}   A control character: [\x00-\x1F\x7F]
\p{XDigit}  A hexadecimal digit: [0-9a-fA-F]
\p{Space}   A whitespace character: [ \t\n\x0B\f\r]
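
The table above describes the default behaviour. On Java 7 or later (an assumption; the quoted page is for 1.6), the `Pattern.UNICODE_CHARACTER_CLASS` flag makes the same POSIX class names follow Unicode properties, and Unicode categories such as `\p{L}` work without any flag. A minimal sketch, with the class and method names invented for illustration:

```java
import java.util.regex.Pattern;

final class UnicodePosixDemo {
    // Default flags: \p{Lower} means [a-z] only.
    static boolean asciiLower(String s) {
        return Pattern.compile("\\p{Lower}+").matcher(s).matches();
    }

    // With UNICODE_CHARACTER_CLASS (Java 7+), \p{Lower} follows the
    // Unicode Lowercase property instead of the ASCII range.
    static boolean unicodeLower(String s) {
        return Pattern.compile("\\p{Lower}+", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(s).matches();
    }

    // \p{L} is the Unicode "letter" category and needs no flag.
    static boolean unicodeLetters(String s) {
        return Pattern.compile("\\p{L}+").matcher(s).matches();
    }

    public static void main(String[] args) {
        String ete = "\u00e9t\u00e9"; // "ete" with acute accents on both e's
        System.out.println(asciiLower(ete));
        System.out.println(unicodeLower(ete));
        System.out.println(unicodeLetters(ete));
    }
}
```

This widens the classes to Unicode but does not by itself give locale-dependent equivalence classes like `[=a=]`.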
ahmet alp balkan
  • I think POSIX also allows only ASCII, or am I wrong? That should be a side note for users expecting POSIX to handle Unicode. – ahmet alp balkan Jul 07 '11 at 15:24
  • Oracle implemented its regex flavor by following the POSIX spec, and it accepts the special class [= =]. I didn't verify whether the class adapts to the various locales Oracle supports, though. – Stephan Jul 07 '11 at 15:35
  • The POSIX specification does support different locales with *collation equivalence classes*, described under point seven of the POSIX specification for regular expressions: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03_05 – djhaskin987 Jul 25 '12 at 21:30
2

Copied from here

Java does not support POSIX bracket expressions, but does support POSIX character classes using the \p operator. Though the \p syntax is borrowed from the syntax for Unicode properties, the POSIX classes in Java only match ASCII characters as indicated below. The class names are case sensitive. Unlike the POSIX syntax which can only be used inside a bracket expression, Java's \p can be used inside and outside bracket expressions.
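
Since bracket expressions such as `[[=a=][=e=][=i=]]` are not available, one possible workaround is to approximate them: decompose the input with `java.text.Normalizer`, strip combining marks, and match the base letters. The sketch below assumes that "equivalent" means "same base letter after canonical decomposition", which is narrower than a locale-defined POSIX equivalence class (letters without a canonical decomposition, such as ø, are not folded). The class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

final class EquivalenceClassSketch {
    // Combining marks (accents) left over after NFD decomposition.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");
    // Stand-in for [[=a=][=e=][=i=]]+ once accents are removed.
    private static final Pattern BASE_LETTERS = Pattern.compile("[aei]+");

    // Decompose accented characters, then drop the combining marks.
    static String stripMarks(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    // True if every character is a, e or i, ignoring accents.
    static boolean isAccentInsensitiveAEI(String s) {
        return BASE_LETTERS.matcher(stripMarks(s)).matches();
    }

    public static void main(String[] args) {
        // e-acute, a-grave, i-circumflex
        System.out.println(isAccentInsensitiveAEI("\u00e9\u00e0\u00ee"));
        // e-acute followed by o, which is not in the class
        System.out.println(isAccentInsensitiveAEI("\u00e9o"));
    }
}
```

Normalizing the input rather than the pattern keeps the regex short; the trade-off is that match offsets then refer to the stripped string, not the original.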

Amir Raminfar