2

I am trying to build a regex that disallows certain characters in a String in an Android application. That is, if any of the characters are not in a set of allowed characters, I should be able to know. My allowed characters are:

"a-zA-Z0-9æøåÆØÅ_ -"

The user types in a name which I want to check. My current take on this is:

filename.matches("^((?![a-zA-Z0-9æøåÆØÅ_ -]).)*$");

based on this answer. This regex returns false for all input, except if all characters are disallowed, which is not what I want. I also tried a simpler regex

filename.matches("([^a-zA-Z0-9æøåÆØÅ_ -])");

to try to match anything not in the allowed character set, but this did not work as intended either.

What am I missing? Are there any quirks or special things in the Java regex engine in this particular case?

Examples

None of the regexes I tried gives the desired result. Consider the examples below: when the string contains both accepted and unaccepted characters, the match does not produce the result I want. The result is the same in Python. When I paste the two regexes below into https://regex101.com/, the latter seems to work as expected, but it does not in my actual code. I also tried adding capturing groups (i.e. parentheses) to the regexes, but to no avail.

String foo1 = "this_is_a_filename";
String foo2 = "this%is%not%a%filename";
String foo3 = "%+!?";

String regex1 = "^[^a-zA-Z0-9æøåÆØÅ_ -]+$";
String regex2 = "[^a-zA-Z0-9æøåÆØÅ_ -]+";

boolean isMatch;

isMatch = foo1.matches(regex1); // false, ok
isMatch = foo2.matches(regex1); // false, should be true
isMatch = foo3.matches(regex1); // true, ok

isMatch = foo1.matches(regex2); // false, ok
isMatch = foo2.matches(regex2); // false, should be true
isMatch = foo3.matches(regex2); // true, ok
Krøllebølle

3 Answers

3

You need a regex to match a character that is not valid. That's the negation of the set of allowed chars: [^a-zA-Z0-9æøåÆØÅ_ -]

Then use the Matcher#find method, which succeeds as soon as any part of the input matches:

public boolean find()
Attempts to find the next subsequence of the input sequence that matches the pattern.

Returns: true if, and only if, a subsequence of the input sequence matches this matcher's pattern

Example:

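import java.util.regex.Pattern;

String foo1 = "this_is_a_filename";
String foo2 = "this%is%not%a%filename";
String foo3 = "%+!?";

// Negated character class: matches a single character that is not allowed.
String regex = "[^a-zA-Z0-9æøåÆØÅ_ -]";

Pattern p = Pattern.compile(regex);
// find() returns true if at least one disallowed character occurs anywhere.
System.out.println("Foo1: " + p.matcher(foo1).find());
System.out.println("Foo2: " + p.matcher(foo2).find());
System.out.println("Foo3: " + p.matcher(foo3).find());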

Output:

Foo1: false
Foo2: true
Foo3: true

Ideone demo: https://ideone.com/S36DYF
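
To make the difference between the two calls concrete, here is a minimal sketch that runs the same negated class through both String#matches and Matcher#find. The class name FilenameCheck and the helper hasDisallowedChar are hypothetical, made up for illustration. The letters æøåÆØÅ are written as Unicode escapes here, on the assumption that you may not want the result to depend on the source file's encoding.

```java
import java.util.regex.Pattern;

public class FilenameCheck {
    // Negated character class: matches one character outside the allowed set.
    // The six non-ASCII letters from the question are written as Unicode
    // escapes so the source compiles the same way under any file encoding.
    private static final Pattern DISALLOWED = Pattern.compile(
            "[^a-zA-Z0-9\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5_ -]");

    // Hypothetical helper: true if any character is outside the allowed set.
    static boolean hasDisallowedChar(String name) {
        return DISALLOWED.matcher(name).find();
    }

    public static void main(String[] args) {
        String mixed = "this%is%not%a%filename";

        // matches() must consume the entire input, so a string that mixes
        // allowed and disallowed characters does not match.
        System.out.println(mixed.matches(DISALLOWED.pattern() + "+")); // false

        // find() only needs one matching subsequence somewhere in the input.
        System.out.println(hasDisallowedChar(mixed)); // true
    }
}
```

Compiling the pattern once into a static field also avoids recompiling it for every name the user types.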

Tobías
1

Don't use String#matches, as it attempts to match the entire input. Use this regex with the Pattern and Matcher APIs instead:

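import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Note: this class matches an allowed character; use the negated form
// "[^\\wæøåÆØÅ -]" to detect a disallowed one instead.
String regex = "[\\wæøåÆØÅ -]";
Pattern p = Pattern.compile(regex);

// str is the input to check, e.g. the name typed by the user.
Matcher m = p.matcher(str);

if (m.find())
    System.out.println("Input has at least one valid char");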

RegEx Demo
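
If you would rather keep a single String#matches call, one option (a sketch, not taken from this answer) is to exploit the full-input behaviour: match the whole name against the allowed set and negate the result. The class name AllowedNameCheck and the helper isAllowed are hypothetical. The class spells out a-zA-Z0-9_ instead of \w, and the non-ASCII letters are written as Unicode escapes.

```java
public class AllowedNameCheck {
    // Zero or more characters, all taken from the allowed set. Because
    // matches() must cover the whole input, a single disallowed character
    // anywhere makes the match fail.
    static final String ALLOWED =
            "[a-zA-Z0-9\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5_ -]*";

    // Hypothetical helper: true if every character is in the allowed set.
    static boolean isAllowed(String name) {
        return name.matches(ALLOWED);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("this_is_a_filename"));     // true
        System.out.println(isAllowed("this%is%not%a%filename")); // false
        System.out.println(isAllowed("%+!?"));                   // false
    }
}
```

Note that with `*` the empty string counts as allowed; change it to `+` if an empty name should be rejected.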

anubhava
0

You have to use a full-string match, with ^ and $ at the two ends of the pattern, as follows: filename.matches("^[^a-zA-Z0-9æøåÆØÅ_ -]+$");

  • Well, I suppose you could use the `^` and `$` anchors, but it should not matter in this case as far as I am aware. I updated my post with some examples of what worked and what didn't. – Krøllebølle Oct 30 '15 at 07:42
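
The point made in the comment can be checked directly. The sketch below compares the anchored and unanchored patterns from the question on its three sample inputs. The class name AnchorCheck and the helper sameResult are hypothetical, and the non-ASCII letters are written as Unicode escapes. It only covers these three inputs and is not a general proof.

```java
public class AnchorCheck {
    // The question's negated class, with the non-ASCII letters escaped.
    static final String CLASS =
            "[^a-zA-Z0-9\u00e6\u00f8\u00e5\u00c6\u00d8\u00c5_ -]+";

    // Hypothetical helper: true if adding ^ and $ does not change
    // what String#matches returns for this input.
    static boolean sameResult(String input) {
        boolean anchored = input.matches("^" + CLASS + "$");
        boolean bare = input.matches(CLASS);
        return anchored == bare;
    }

    public static void main(String[] args) {
        // matches() already requires the pattern to cover the whole input,
        // so the explicit anchors make no difference for these samples.
        System.out.println(sameResult("this_is_a_filename"));     // true
        System.out.println(sameResult("this%is%not%a%filename")); // true
        System.out.println(sameResult("%+!?"));                   // true
    }
}
```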