Regular expression for Unicode in Java 7

Question

I have a regular expression that validates letters, digits, _ and -. I now need to support multibyte characters as well, so I added the Unicode character class flag, but the pattern does not match. Can someone enlighten me on this?

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test123 {

    public static void main(String[] args) {

        String test = "熏肉еконcarácterbañlácaractères";
        // The literal ranges a-z, A-Z and 0-9 stay ASCII-only even with the flag set
        Pattern pattern = Pattern.compile("^[a-zA-Z0-9_-]*$", Pattern.UNICODE_CHARACTER_CLASS);

        Matcher matcher = pattern.matcher(test);
        if (matcher.matches()) {
            System.out.println("matched");
        } else {
            System.out.println("not matched");
        }
    }

}

See the documentation for Pattern which has a full supply of Unicode character classes. — bmargulies, Aug 10 '15 at 10:26

score 4 · Answer 1 · answered Aug 10 '15 at 10:26

You can use the POSIX character class \\p{Alpha} instead of the literal range [a-zA-Z]. With Pattern.UNICODE_CHARACTER_CLASS set, it matches Unicode letters, including accented characters.

Example

String test = "熏肉еконcarácterbañlácaractères";
Pattern pattern = Pattern.compile("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS);
Matcher m = pattern.matcher(test);
while (m.find()) {
    System.out.println(m.group());
}

Output

熏肉еконcarácterbañlácaractères
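
To see what the flag actually changes, here is a minimal sketch that runs \\p{Alpha} with and without Pattern.UNICODE_CHARACTER_CLASS. The class and method names are made up for illustration, and the sample text is written with Unicode escapes only to avoid source-encoding problems. It also shows \\p{IsAlphabetic}, a binary property that is Unicode-aware without the flag.

```java
import java.util.regex.Pattern;

class AlphaFlagDemo {
    // Without the flag, \p{Alpha} is the POSIX class: ASCII letters only.
    static boolean asciiAlpha(String s) {
        return Pattern.compile("\\p{Alpha}+").matcher(s).matches();
    }

    // With the flag, \p{Alpha} follows the Unicode Alphabetic property.
    static boolean unicodeAlpha(String s) {
        return Pattern.compile("\\p{Alpha}+", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(s).matches();
    }

    // \p{IsAlphabetic} is the Unicode binary property and needs no flag.
    static boolean isAlphabetic(String s) {
        return Pattern.compile("\\p{IsAlphabetic}+").matcher(s).matches();
    }

    public static void main(String[] args) {
        // Four Cyrillic letters, as in the question's sample string
        String cyrillic = "\u0435\u043A\u043E\u043D";
        System.out.println(asciiAlpha(cyrillic));   // false
        System.out.println(unicodeAlpha(cyrillic)); // true
        System.out.println(isAlphabetic(cyrillic)); // true
    }
}
```

If the flag cannot be set for some reason, \\p{IsAlphabetic} may be the simpler choice.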

score 1 · Answer 2 · Pshemo · 2015-08-10T11:04:55.927

The problem is that, even with that flag, a-z does not mean "all Unicode alphabetic characters"; it still means only "the characters between a and z".

The UNICODE_CHARACTER_CLASS flag adds Unicode support only to predefined character classes such as \w (which normally represents [a-zA-Z0-9_]) and to POSIX character classes.

So try:

Pattern.compile("^[\\w-]*$",Pattern.UNICODE_CHARACTER_CLASS);
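
A small side-by-side check of the two patterns, as a sketch: the class and method names are hypothetical, and the sample string (Han, Cyrillic and accented Latin letters plus a hyphen, an underscore and a digit) is written with Unicode escapes so the source file stays ASCII.

```java
import java.util.regex.Pattern;

class UnicodeWordDemo {
    // The question's pattern: the flag does not widen the literal ranges.
    static boolean literalRangeMatches(String s) {
        return Pattern.compile("^[a-zA-Z0-9_-]*$", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(s).matches();
    }

    // Same validation with \w, which the flag does make Unicode-aware.
    static boolean wordClassMatches(String s) {
        return Pattern.compile("^[\\w-]*$", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(s).matches();
    }

    public static void main(String[] args) {
        String test = "\u718F\u8089\u0435\u043A\u043E\u043Dcar\u00E1cter-ba\u00F1o_1";
        System.out.println(literalRangeMatches(test)); // false
        System.out.println(wordClassMatches(test));    // true
        // Characters outside \w and '-' are still rejected, e.g. a space
        System.out.println(wordClassMatches("a b"));   // false
    }
}
```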

edited Aug 10 '15 at 11:04

answered Aug 10 '15 at 10:34

Pshemo


score 0 · Answer 3 · answered Aug 10 '15 at 11:03

[\\p{L}\\p{M}]+

You can use this to match Unicode letters together with any combining marks attached to them.

\p{L} matches any kind of letter from any language.
\p{M} matches a character intended to be combined with another character (accents, umlauts, enclosing boxes, and so on).

See demo.

https://regex101.com/r/fM9lY3/30
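
The \p{M} part matters mainly for decomposed text, where an accent is stored as a separate combining code point. A minimal sketch of that case follows; the class and method names are invented for illustration, and no flag is passed because \p{L} and \p{M} are Unicode general categories in Java regardless of UNICODE_CHARACTER_CLASS.

```java
import java.util.regex.Pattern;

class LetterMarkDemo {
    // Letters only: a separate combining accent is not a letter.
    static boolean lettersOnly(String s) {
        return Pattern.compile("\\p{L}+").matcher(s).matches();
    }

    // Letters plus combining marks, as suggested in the answer above.
    static boolean lettersAndMarks(String s) {
        return Pattern.compile("[\\p{L}\\p{M}]+").matcher(s).matches();
    }

    public static void main(String[] args) {
        String precomposed = "caf\u00E9";  // e-acute as one code point
        String decomposed = "cafe\u0301";  // 'e' followed by a combining acute accent
        System.out.println(lettersOnly(precomposed));    // true
        System.out.println(lettersOnly(decomposed));     // false
        System.out.println(lettersAndMarks(decomposed)); // true
    }
}
```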
