I am having problems constructing a regex that will allow the full range of UTF-8 characters with the exception of 2 characters: _ and %

So the whitelist is: ^[\u0000-\uFFFF] and the blacklist is: ^[^_%]

I need to combine these into one expression.

I have tried the following code, but it does not work the way I had hoped:

var input = "this%";
var patrn = /[^\u0000-\uFFFF&&[^_%]]/g;
if (input.match(patrn) == "" || input.match(patrn) == null) {
    return true;
} else {
    return false;
}

input: this%

actual output: true

desired output: false

OneXer

3 Answers

Use negative lookahead:

(?!blacklist)whitelist

In this case:

^(?:(?![_%])[\u0000-\uFFFF])*$
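A minimal sketch of how this pattern could be wired into a check, assuming a helper named isAllowed (a hypothetical name, not from the question). The lookahead vetoes a blacklisted character before the whitelist class gets to consume it:

```javascript
// Hypothetical helper: true when no character of the input is blacklisted.
function isAllowed(input) {
    // (?![_%]) fails at a blacklisted character, so the repeated group
    // cannot advance past it and the end anchor $ is never reached.
    var pattern = /^(?:(?![_%])[\u0000-\uFFFF])*$/;
    return pattern.test(input);
}

console.log(isAllowed("this%")); // false
console.log(isAllowed("this"));  // true
```

If the blacklist grows, only the class inside the lookahead needs to change.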
ndnenkov

If I understand correctly, one of these should be enough:

/^[^_%]*$/.test(str);
!/[_%]/.test(str);
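A quick sketch comparing the two on a few sample strings (the samples and the function names are assumptions for illustration). Both should agree on whether a string is free of _ and %:

```javascript
// Variant 1: the whole string consists of non-blacklisted characters.
function wholeStringMatch(str) {
    return /^[^_%]*$/.test(str);
}

// Variant 2: no blacklisted character appears anywhere in the string.
function noBlacklistedChar(str) {
    return !/[_%]/.test(str);
}

var samples = ["this%", "snake_case", "plain text", ""];
samples.forEach(function (str) {
    console.log(JSON.stringify(str), wholeStringMatch(str), noBlacklistedChar(str));
});
```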
Oriol

Underscore is \u005F and percent is \u0025. You can simply alter the range to exclude these two characters:

^[\u0000-\u0024\u0026-\u005E\u0060-\uFFFF]

This will be just as fast as the original regex.
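Note that the pattern as written is anchored only at the start and has no quantifier, so on its own it checks just the first character. A minimal sketch, assuming you append *$ to validate the whole string (the helper name matchesRange is hypothetical):

```javascript
// Hypothetical helper using the same ranges, with *$ added (an assumption)
// so that every character of the input is checked.
function matchesRange(input) {
    return /^[\u0000-\u0024\u0026-\u005E\u0060-\uFFFF]*$/.test(input);
}

console.log(matchesRange("this%")); // false: % is \u0025
console.log(matchesRange("a_b"));   // false: _ is \u005F
console.log(matchesRange("this"));  // true
console.log(matchesRange("$&^`"));  // true: the code points adjacent to the excluded ones
```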


But I don't think that you are going to get the result you really want this way. A JS \u escape can only go up to \uFFFF; any character past that is stored as two code units (a surrogate pair), so technically it counts as two characters.

According to the linked article, the following test returns false when the string is a single character outside the \u0000-\uFFFF range:

/^.$/.test('')

You need to have a different way to see if you have characters outside that range. This answer gives the following code:

String.prototype.getCodePointLength = function () {
    // Each surrogate pair occupies two code units, so subtract one per pair found.
    return this.length - this.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length + 1;
};

Simply put, if the number it returns is not the same as the string's .length property, the input contains a surrogate pair (and thus you should return false).

If your input passes that test, you can then test it against another regex that rejects the characters between \u0000 and \uFFFF that you want to exclude.
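The two steps could be combined roughly like this. This is only a sketch: codePointLength and isValid are hypothetical names, the standalone function is my own variation to avoid extending String.prototype, and the blacklist [_%] is taken from the question.

```javascript
// Hypothetical standalone version of getCodePointLength.
function codePointLength(str) {
    return str.length - str.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g).length + 1;
}

// Hypothetical validator combining both steps.
function isValid(input) {
    // Step 1: a length mismatch means the input contains a surrogate pair.
    if (codePointLength(input) !== input.length) {
        return false;
    }
    // Step 2: reject the blacklisted characters.
    return !/[_%]/.test(input);
}

console.log(isValid("this"));         // true
console.log(isValid("this%"));        // false
console.log(isValid("\uD83D\uDE00")); // false: one code point stored as a surrogate pair
```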

Laurel
  • Thanks, this is a good quick fix, but it might not be maintainable if the blacklist grows. I really need to understand how to combine the whitelist and the blacklist expressions. It seems to work in Java, but does not behave the same in JavaScript – OneXer May 09 '16 at 15:27
  • @OneXer I have updated my answer. I think the problem is the approach you are taking. – Laurel May 09 '16 at 16:10