I'm having trouble setting up a regex matcher on Android.

My String pattern:

private static final String INVALID_PATTERN = "/[^а-яa-z0-9\\s,!\\-_{\\}\\[\\];+]/ig";

Unescaped pattern (matches everything except Cyrillic and Latin letters, digits, whitespace, comma, exclamation mark, minus, underscore, square brackets, semicolon and plus, globally and ignoring case; I consider those "legal"):

/[^а-яa-z0-9\s,!\-_\[\];+]/ig

My code:

public static ErrorType createStory(@NonNull String name){
    Matcher m = Pattern.compile(INVALID_PATTERN).matcher(name);
    if(m.matches()){
        Log.e("Error", "Story name '" + name + "' contains illegal characters.");
        return ErrorType.ILLEGAL;
    }
    //...
}

This, however, neither throws an error nor works.

What I have tried so far without success (where string is a String variable):

  • string.matches(pattern)
  • Pattern.compile(pattern).matcher(string).matches()
user7401478

1 Answer

You need to use

private static final String INVALID_PATTERN = "(?i)[а-яёa-z0-9\\s,!_{}\\[\\];+-]+";

and use it as

public static ErrorType createStory(@NonNull String name){
    Matcher m = Pattern.compile(INVALID_PATTERN).matcher(name);
    if(!m.matches()){
        Log.e("Error", "Story name '" + name + "' contains illegal characters.");
        return ErrorType.ILLEGAL;
    }
    //...
}

Explanation:

  • The (?i)[а-яёa-z0-9\\s,!_{}\\[\\];+-]+ pattern matches the specified ranges and chars in a case-insensitive way (due to the embedded flag option (?i)), 1 or more occurrences
  • Since the regex matches a valid string, if (!m.matches()) is used to only show the error if the regex does not match the string
  • As .matches() requires a full string match, no ^ and $ anchors are necessary in the pattern
  • In Android regex, regex delimiters are not used, and regex options are passed either via Pattern.<FLAG> constants or via inline modifiers (e.g. (?i))
  • Judging by the range of Cyrillic letters, you want to match Russian letters, but а-я does not include ё, so I added it to the character class
  • Put the hyphen at the start or end of the character class, where it is always parsed as a literal - symbol. This is best practice and works in any regex flavor (when placed at the start, in every flavor I know).
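
The points above can be tried out with a small, self-contained sketch. The class and method names are hypothetical, not from the question. Two things differ from the pattern as posted and are assumptions on my part: the Cyrillic letters are written as Unicode escapes so the source stays ASCII-only, and the u flag (UNICODE_CASE) is added because on a desktop JVM (?i) alone only folds ASCII case. As far as I know, Android's case-insensitive matching is Unicode-aware anyway, so there the extra flag should be harmless.

```java
import java.util.regex.Pattern;

// Hypothetical helper; class and method names are illustrative only.
final class StoryNameValidator {
    // U+0430..U+044F is the Cyrillic range from the answer, U+0451 is the
    // extra letter it adds; escapes keep this source file ASCII-only.
    // Inline flags: i = CASE_INSENSITIVE, u = UNICODE_CASE (an addition here
    // so that uppercase Cyrillic also matches on a desktop JVM).
    private static final Pattern VALID_NAME =
            Pattern.compile("(?iu)[\u0430-\u044f\u0451a-z0-9\\s,!_{}\\[\\];+-]+");

    // True only if the whole name consists of allowed characters.
    static boolean isValid(String name) {
        return VALID_NAME.matcher(name).matches();
    }

    // Slashes and trailing letters from a JavaScript-style literal are
    // ordinary pattern characters in Java, not delimiters or flags.
    static boolean slashesAreLiteral() {
        Pattern jsStyle = Pattern.compile("/[a-z]/i");
        return jsStyle.matcher("/x/i").matches() && !jsStyle.matcher("x").find();
    }

    public static void main(String[] args) {
        System.out.println(isValid("My story, part 2!"));                    // true
        System.out.println(isValid("\u0421\u043a\u0430\u0437\u043a\u0430")); // true
        System.out.println(isValid("bad/name"));                             // false
        System.out.println(slashesAreLiteral());                             // true
    }
}
```

Compiling the pattern once into a static field, rather than on every call, is a design choice of this sketch and not something the question requires.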

If you want to use a negative approach, use

private static final String INVALID_PATTERN = "(?i)[^а-яёa-z0-9\\s,!_{}\\[\\];+-]";

and in the code, use if (m.find())

public static ErrorType createStory(@NonNull String name){
    Matcher m = Pattern.compile(INVALID_PATTERN).matcher(name);
    if(m.find()){
        Log.e("Error", "Story name '" + name + "' contains illegal characters.");
        return ErrorType.ILLEGAL;
    }
    //...
}

Then, the error will be shown if the string contains any character other than those listed in the negated character class. Unlike .matches(), .find() does not require a full string match; it allows partial matches.
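
A side benefit of the negative approach is that the Matcher can report where the first offending character is. Below is a minimal sketch of that idea. The class and method names are hypothetical, the Cyrillic letters are written as Unicode escapes to keep the source ASCII-only, and the u flag is my addition so that uppercase Cyrillic is not flagged on a desktop JVM.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; class and method names are illustrative only.
final class IllegalCharFinder {
    // Negated class: matches any single character that is NOT allowed.
    // U+0430..U+044F and U+0451 are the Cyrillic letters from the answer.
    private static final Pattern ILLEGAL_CHAR =
            Pattern.compile("(?iu)[^\u0430-\u044f\u0451a-z0-9\\s,!_{}\\[\\];+-]");

    // Returns the index of the first illegal character, or -1 if there is none.
    static int firstIllegalIndex(String name) {
        Matcher m = ILLEGAL_CHAR.matcher(name);
        // find() scans for a match anywhere in the input
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String name = "bad/name";
        int i = firstIllegalIndex(name);
        if (i >= 0) {
            // prints: Illegal character '/' at index 3
            System.out.println("Illegal character '" + name.charAt(i) + "' at index " + i);
        }
        System.out.println(firstIllegalIndex("My story, part 2!")); // -1
    }
}
```

In createStory, the index could be added to the log message before returning ErrorType.ILLEGAL.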

Wiktor Stribiżew
  • Is the Matcher/Pattern solution essentially the same as string.matches() but with `^...$`? – user7401478 Apr 30 '17 at 15:31
  • See my second part. You should declare the regex as a string, not as a regex literal (used in Ruby, JavaScript) and make sure you use appropriate methods: `Matcher.find()` searches for a match anywhere inside the string, and `Matcher.matches()` only tries to match the whole string against the pattern. – Wiktor Stribiżew Apr 30 '17 at 15:35
  • One more thing to note: `ё` does not belong to the `а-я` range, so I added it. – Wiktor Stribiżew Apr 30 '17 at 15:40
  • You should explain that using ``\\-`` is perfectly valid, but that *you* prefer to move it to the end so you can omit the escape characters, and that it is *important* that the `-` is at the end, and why that is so. – Andreas Apr 30 '17 at 16:02
  • Whenever one wants to use a literal `-` inside a character class, it is best practice to use it at the start or end of the character class. Here is [an illustration](http://stackoverflow.com/a/4068725/3832970). – Wiktor Stribiżew Apr 30 '17 at 16:05
  • The advantage of the negative `find()` over the positive `matches()`, is that `find()` will tell you the location and value of the offending character, while `matches()` won't. – Andreas Apr 30 '17 at 16:05
  • @WiktorStribiżew I meant that you should *edit* the answer and explain, without silently making the change, especially because it's potentially dangerous if someone who doesn't know about `-` adds extra characters to the end of the character class. – Andreas Apr 30 '17 at 16:07